g++: Use ZIP files as input - c++

We have the Boost library on our side. It consists of a huge number of files that never change, and only a tiny portion of it is used. We swap out the whole Boost directory when we change versions. Currently we have the Boost sources in our SVN file by file, which makes checkout operations very slow, especially on Windows.
It would be nice if there were a notation / plugin to address C++ files inside ZIP files, something like:
// #ZIPFS ASSIGN 'boost' 'boost.zip/boost'
#include <boost/smart_ptr/shared_ptr.hpp>
Is there any support for compiler hooks in g++? Is there any effort underway regarding ZIP support? Other ideas?

I assume that make or a similar build system is involved in the process of building your software. I'd put the zip file in the repository and add a rule to the Makefile to extract it before the actual build starts.
For example, suppose your zip file is in the source tree at "external/boost.zip", that it is to be extracted to "external/boost", and that it contains a file "boost_version.h" at its top level.
# external/Makefile
unpack_boost: boost/boost_version.h

boost/boost_version.h: boost.zip
	unzip $<
I don't know the exact syntax of the unzip call off-hand; check its man page. (And remember that make recipe lines must start with a tab character.)
Then in other Makefiles, you can let your source files depend on the unpack_boost target in order to have make unpack Boost before a source file is compiled.
# src/Makefile (excerpt)
.PHONY: unpack_boost
unpack_boost:
	$(MAKE) -C ../external unpack_boost

source_file.cpp: unpack_boost
If you're using a Makefile generator (or an entirely different build system), please check the documentation of those programs for how to create something like the custom unpack_boost target. For example, in CMake you can use the add_custom_command directive.
The fine print: the boost/boost_version.h file is not strictly necessary for the Makefile to work. You could just put the unzip command into the unpack_boost target, but then the target would effectively be phony, that is: it would be executed during each build. The file in between (which of course you need to replace with a file that is actually present in the zip archive) ensures that unzip only runs when necessary.

A year ago I was in the same position as you. We kept our source in SVN and, even worse, included Boost in the same repository (same branch) as our own code. Trying to work on multiple branches was impossible, as it would take most of a day to check out a fresh working copy. Moving Boost into a separate vendor repository helped, but checkouts would still take hours.
I switched the team over to git. To give you an idea of how much better it is than SVN, I have just created a repository containing the boost 1.45.0 release, then cloned it over the network. (Cloning copies all of the repository history, which in this case is a single commit, and creates a working copy.)
That clone took six minutes.
In the first six seconds a compressed copy of the repository was copied to my machine. The rest of the time was spent writing all of those tiny files.
I heartily recommend that you try git. The learning curve is steep, but I doubt you'll get much pre-compiler hacking done in the time it would take to clone a copy of boost.

We've been facing similar issues in our company. Managing boost versions in build environments is never going to be easy. With 10+ developers, all coding on their own system(s), you will need some kind of automation.
First, I don't think it's a good idea to store copies of big libraries like Boost in SVN, or in any SCM system for that matter; that's not what those systems are designed for, unless you plan to modify the Boost code yourself. But let's assume you're not doing that.
Here's how we manage it now; after trying lots of different methods, this works best for us.
For every version of boost that we use, we put the whole tree (unzipped) on a file server and we add extra subdirectories, one for each architecture/compiler-combination, where we put the compiled libraries.
We keep copies of these trees on every build system and in the global system environment we add variables like:
BOOST_1_48=C:\boost\1.48 # Windows environment var
or
BOOST_1_48=/usr/local/boost/1.48 # Linux environment var, e.g. in /etc/profile.d/boost.sh
This directory contains the boost tree (boost/*.hpp) and the added precompiled libs (e.g. lib/win/x64/msvc2010/libboost_system*.lib, ...)
All build configurations (VS solutions, VS property files, GNU makefiles, ...) define an internal variable, importing the environment vars, like:
BOOSTROOT=$(BOOST_1_48) # e.g. in a Makefile, or an included Makefile
and further build rules all use the BOOSTROOT setting for defining include paths and library search paths, e.g.
CXXFLAGS += -I$(BOOSTROOT)
LFLAGS += -L$(BOOSTROOT)/lib/linux/x64/ubuntu/precise
LFLAGS += -lboost_date_time
The reason for keeping local copies of boost is compilation speed. It takes up quite a bit of disk space, especially the compiled libs, but storage is cheap and a developer losing lots of time compiling code is not. Plus, this only needs to be copied once.
The reason for using global environment vars is that build configurations are transferable from one system to another and can thus be safely checked in to your SCM system.
To smooth things out a bit, we've developed a little tool that takes care of the copying and of setting the global environment. With a CLI, it can even be included in the build process.
Different working environments mean different rules and cultures, but believe me, we've tried lots of things, and in the end we decided to define some kind of convention. Maybe ours can inspire you...

This is something you would not do in g++, because any other application that wants to do it would also have to be modified.
Store the files on a compressed filesystem. Then every application gets the benefit automatically.

It should be possible for an OS to allow transparent access to files inside a ZIP file. I put it in the design of my own OS a long time ago (2004 or so) but never got it to a point where it was usable. The downside is that seeking backwards in a file inside a ZIP is slower because it's compressed (you can't rewind the compressor state, so you have to seek from the start instead). This also makes a zip-inside-a-zip slow for rewinding and reading. Fortunately, most cases just read a file sequentially.
It should also be retrofittable to current OSes, at least in user space. You can hook the filesystem access functions used (fopen, open, ...) and add a set of virtual file descriptors that your own software would return for a given filename. If it's a real file, just pass it on; if it's not, open the underlying zip file (possibly again via this very function) and pass back a virtual handle. When accessing the file contents, read directly from the zip file without caching.
On Linux you would use LD_PRELOAD to inject it into existing software (at usage time); on Windows you can hook the system calls or inject a DLL into the program's address space to hook the same functions.
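To make the idea concrete, here is a minimal sketch of such a hook for Linux, assuming glibc's dlsym(RTLD_NEXT, ...); the file name, the ".zip/" path convention and the behaviour are hypothetical illustration, not an existing tool:

// zip_preload.cpp -- build: g++ -shared -fPIC zip_preload.cpp -o zip_preload.so -ldl
// use:   LD_PRELOAD=./zip_preload.so some_program
#include <cstdio>
#include <cstring>
#include <dlfcn.h>   // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default)

using fopen_fn = FILE *(*)(const char *, const char *);

extern "C" FILE *fopen(const char *path, const char *mode) {
    // Look up the real fopen once; plain files pass straight through.
    static fopen_fn real_fopen =
        reinterpret_cast<fopen_fn>(dlsym(RTLD_NEXT, "fopen"));
    if (std::strstr(path, ".zip/") == nullptr)
        return real_fopen(path, mode);   // a real file: just pass it on
    // Virtual path such as "boost.zip/boost/config.hpp": a real implementation
    // would decompress the archive member and return a FILE* backed by its
    // bytes (e.g. via fmemopen). This sketch only reports the interception.
    std::fprintf(stderr, "zipfs: would serve %s from the archive\n", path);
    return nullptr;   // placeholder: no actual ZIP support here
}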
Does anybody know if this already exists? I can't see any clear reason it wouldn't...

Related

set output path for cmake generated files

My question is the following:
Is there a way to tell CMake where to generate its makefiles and bookkeeping files, such as cmake_install.cmake, CMakeCache.txt etc.?
More specifically, is there a way to set some commands in the CMake files that specify where to output these generated files? I have tried to search around the web for answers, and most people say there's no explicit way of doing this, while others say I might be able to using custom commands. Sadly, I'm not very strong in CMake, so I couldn't figure this out.
I'm currently using the CLion IDE, and there you can set the output path through the settings, but for flexibility reasons I would like as much as possible to be done through the CMake files, so that compiling from different computers isn't that big of a hassle.
I would also like to avoid explicitly adding additional command line arguments etc.
I hope someone might have an answer for me, thanks in advance!
You can't (easily) do this and you shouldn't try to do it.
The build tree is CMake's territory. It allows you some tiny amount of customization there (for instance you can specify where the final build artifacts will be placed through the *_OUTPUT_DIRECTORY target properties), but it does not give you any direct control over where intermediate files, like object files or internal make scripts used for bookkeeping are being placed.
This is a feature. You have no idea how all the build systems supported by CMake work internally. Maybe you can move that internal file to a different location in your build process, which is based on Unix Makefiles. But maybe that will also horribly break my build process, which is using Visual Studio. The bottom line is: You shouldn't have to care about this. CMake should take care of it, and by taking some freedom away from you, it ensures that it can actually do that job on all supported build toolchains.
But this might still be an unsatisfactory answer to you. You're the developer, shouldn't you be in full control of the results produced by your build? Of course you should, which is why CMake again grants you full control over what goes into the install tree. That is, whatever ends up in the install directory when you call make install (or whatever is the equivalent of installing in your build toolchain) is again under your control.
So you do control everything that matters: The source tree, the install tree, and that tiny portion of the build tree where the final build artifacts go. The rest of the build tree is off-limits for you and for good reasons.

Is there a build system that does not use timestamps to check for file changes

Build systems like make use timestamps to check whether a dependency has changed between two builds. Here are a few common issues I run into with timestamps:
I open a file and make some changes, but later decide they are no good and revert them, for example with git checkout -- file if I am using git for the project.
I open a file and just accidentally hit the editor's keyboard shortcut for save.
Either way, the file's timestamp is changed. If I now want to build the project, everything depending on that file needs to be rebuilt. This often means the whole project.
Is there any way around these issues? For example, a build system that uses version control, preferably git, to detect file changes. Any other solutions to the above issues are welcome.
Many thanks in advance.
By default, SCons uses checksums rather than timestamps. However, checksumming requires reading the entire contents of all the source files on disk, and that is much slower than simply reading directory entries, which is why most build systems use timestamps.
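The core of content-based change detection is easy to sketch outside any particular build system. In the following C++ fragment (function names hypothetical, std::hash standing in for a real digest such as MD5), a reverted or merely re-saved file hashes the same as before and therefore triggers no rebuild:

#include <fstream>
#include <functional>
#include <iterator>
#include <string>

// Hash the entire file content; note that this reads the whole file,
// which is exactly the cost mentioned above.
std::size_t fileHash(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    std::string contents{std::istreambuf_iterator<char>(in),
                         std::istreambuf_iterator<char>()};
    return std::hash<std::string>{}(contents);
}

// Rebuild only when the content differs from the hash recorded last build.
bool needsRebuild(const std::string &source, std::size_t recordedHash) {
    return fileHash(source) != recordedHash;
}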
Software Build Systems gives a good overview of these issues.

Organizing solutions, projects and SVN

I would like some help in setting up a project in SVN with regard to directory structure. I have read several answers about this on SO, but as I am new to this, most of them are difficult to understand.
I am building a single library on which several other distinct projects depend:
I need the ability to export MyLibrary (headers and .lib only) easily for use by third parties
MyLibrary1
Depends on external libraries, should be able to manage different versions of these libraries!
MyLibrary2
Depends on External Libraries fmod, glew, ...
Project 1, 2, 4, 5, 6 ...
Depends on MyLibrary1, 2, or both
Each project could need versions for multiple platforms (osx, windows ...)
I would like to know of a good way to organize this; do keep in mind that I am rather new to this, so a more pedantic answer would be helpful. For example, if you write something like /src, do explain what is supposed to go into it! I would be able to guess, but I won't be sure =)
////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Edit
I can't put this into a comment, so here goes:
@J.N., thanks for the extensive reply. I would like to clarify some things; I hope I understood what you meant properly:
root
    library foo
        /branches             // old versions of foo
        /tags                 // releases of foo
        /trunk                // current version
            /build            // stuff required by makefiles
            /tools            // scripts to launch tests etc.
            /data             // test data needed when running
            /output           // binaries, .exe files
            /dependencies     // libraries that foo needs
                /<lib name>
                    include
                    lib
            /docs             // documentation
            /releases         // generated archives
            /sample           // sample project that shows how to use foo
            /source           // *.h, *.cpp
    program bar
        /branches             // old versions of bar
        /tags                 // releases of bar
        /trunk                // current version
            /build            // stuff required by makefiles
            /tools            // scripts to launch tests etc.
            /data             // test data needed when running
            /output           // binaries, .exe files
            /dependencies     // libraries that bar needs
                /<lib name>
                    include
                    lib
            /docs             // documentation
            /releases         // generated archives
            /sample           // sample project that shows how to use bar
            /source           // *.h, *.cpp
1) Where do the *.sln files go? In /build?
2) Do I need to copy foo/source into bar/dependencies/foo/include? After all, bar depends on foo.
3) Where do *.dll files go? If foo depends on dll files, then all programs using foo need access to the same dll files. Should these go into root/dlls?
There are several levels to your questions: how to organize a single project's source tree, how to maintain the different projects together, how to maintain the dependencies of those projects, how to maintain different variants of each project, and how to package them.
Please keep in mind that whatever you do, your project will eventually grow large enough that the structure no longer fits. It's normal to change the structure several times in the lifetime of a project. You'll get the feeling that it isn't right anymore when that happens: it's usually when the setup is bothering you more than it helps.
1 - Maintaining the different variants of each project
Don't have variants for each project; you won't handle several variants by maintaining parallel versions or branches. Have a single source tree for every project/library that can be used for all variants. Don't manage different "OSes", manage different features. That is, have variants on things like "support POSIX sockets" or "support UI". That means that if a new OS comes along, you just need to choose the set of features it supports rather than starting a new version.
When specific code is needed, create an interface (an abstract class in C++) and implement the behaviour against it. That will isolate the problematic code and help you add new variants in the future. Use a macro to choose the proper implementation at compile time.
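A minimal sketch of that pattern in C++, with a hypothetical "sockets" feature and a hypothetical HAVE_POSIX_SOCKETS macro supplied by the build system:

#include <memory>

struct Socket {                        // the feature's abstract interface
    virtual ~Socket() = default;
    virtual void connect(const char *host, int port) = 0;
};

#if defined(HAVE_POSIX_SOCKETS)        // feature macro, not an OS check
struct PosixSocket : Socket {
    void connect(const char *host, int port) override { /* ::connect(...) */ }
};
using PlatformSocket = PosixSocket;
#else
struct NullSocket : Socket {           // variant for platforms without sockets
    void connect(const char *, int) override {}
};
using PlatformSocket = NullSocket;
#endif

// Client code only ever sees the interface, never the chosen variant.
std::unique_ptr<Socket> makeSocket() {
    return std::make_unique<PlatformSocket>();
}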
2 - Maintaining the dependencies of each project
Have a specific "dependencies" folder in which each subfolder contains everything needed for one dependency (that is, includes and sub-dependencies). At the beginning, when the codebase is not too large, don't worry too much about automatically ensuring that all the dependencies are compatible with each other; save that for later.
Don't try to merge the dependencies from their root location higher in the SVN hierarchy. Formally deliver each new version to the teams needing it; it's up to them to update their own part of the SVN with it.
Don't try to use several versions of the same dependency at once. That will end badly. If you really need to (but try avoiding it as much as you can), branch your project for each version.
3 - Maintaining the different projects
I'd advise maintaining each project's repository independently (with SVN they could still be the same repo, but in separate folders). Branches and tags should be specific to one project, not all of them. Try to keep the number of branches to a minimum; they don't solve problems (even with git). Use branches when you have to maintain different chronological versions in parallel (not variants), and fight back as much as you can before you actually do it; everybody will benefit from the use of the newer code.
That will also allow you to impose security restrictions (not sure if that's feasible with vanilla SVN, but there are some freely available servers that support it).
I'd recommend sending email notifications to everybody potentially interested whenever someone commits on a project.
4 - Project source tree organization
Each project should have the following SVN structure:
trunk (current version)
branches (older versions, still in use)
tags (releases, used to create branches without thinking too much when patches are required)
When the project gets bigger, organize branches and tags in sub folders (for instance branches/V1.0/V1.1 and branches/V2.0/V2.1).
Have a root folder with the following subfolders: (some of this may be created by VC itself)
Build system (stuff required by your makefiles or others)
Tools (if any, like an XSLT tool or SOAP compiler, scripts to launch the tests)
Data (test data you need while running)
Output (where the build system put the binaries)
Temp Output (temporary files created by the compilation, optional)
Dependencies
Docs (if any ;) or generated docs)
Releases (the generated archives, see later)
Sample (a small project that demonstrate how to use the project / library)
Source (I don't like to split headers and .cpp, but that's my way)
Avoid too many levels of subfolders; trees are hard to search, lists are easier
Define the build order of each folder properly (less necessary for VC, but still)
I make my namespaces match my folder names (an old Java habit, but it works)
Clearly define the "public" part that you need to export
If the project is large enough to hold several binaries / dlls each should have its own folder
Don't commit any binaries you generate, only the releases. Binaries like to conflict with each other and cause pain to the other people in the team.
5 - Packaging the projects
First, make sure to include a text file with the SVN revision and the date; there's an automated way to do that with auto-props.
You should have a script to generate releases (if time allows). It will check that everything is committed, generate a new version number, and create a zip/tar.gz archive, which you must commit/archive and whose name contains the SVN revision, branch and current date (the format should be normalized across projects). The archive should contain everything needed to run the app / use the library, laid out in a file structure. Create a tag so that you can start from it for emergency bug fixing.

What if your make (or build) target is the existence of certain contents in a file

The typical build target for make and other build systems is a file or directory. I am building a system similar to BSD ports for Emacs packages.
I just realized that my target for each package was not truly accurate: it's not only that the file edan.el needs to be newer than its prerequisites. It must be newer than the prerequisites and it must contain
(provide '[% pkg %])
Where pkg is the package that has just been downloaded, unpacked, and byte-compiled.
Is there a way to do this with make? Does any other build system handle this sort of thing?
It's always the case that you want the target to contain the right data. You don't want foo.o to have no relation to the contents of foo.c, for example. The timestamps are merely a proxy for this. You have to trust that the tools actually do the right thing.
It sounds like the real complication here is that the dependencies are dynamic. The file edan.el depends on the list of all currently installed and managed packages, which can change. But you can, of course, represent this data as a file. I expect you already have a file that lists the active packages, actually.
(You possibly also want to depend on the actual packages too. You should be able to write a script to munge this data into a depend file that is included in the main Makefile.)

Automatic build ID

We're looking for a way to include some sort of build ID automatically in our builds. This needs to be portable (VC++, g++ on Linux and Mac) and automatic. VC++ is what matters most, since in the other environments we use custom Python build scripts so I can do whatever I want.
We use SVN, so we were looking at using the output of svnversion to write the revision to a header and include it. This has problems: if we put the file in SVN, it will appear as modified every time, yet committing it would be superfluous and would in a sense generate an infinite loop of increasing revisions. If we don't put the file in SVN and just create it as a pre-build step, the sources wouldn't be complete, as they'd need the pre-build step or Makefile to generate that file.
We could also use __DATE__, but we can't guarantee that the file which uses __DATE__ (i.e. writes it to a log file) will be compiled when some other file is modified, unless we "touch" it, but then we'd cause the project to always be out of date. We could touch it as the pre-build step, so it would get touched only if the rest of the project is out of date, thus not causing a spurious compile; but if VC++ computes the dependencies before the pre-build step, this wouldn't work (the file with __DATE__ won't get compiled).
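To illustrate, the __DATE__ scheme needs only one small translation unit that the pre-build step touches or regenerates (file and variable names hypothetical):

// builddate.cpp -- the only file that uses __DATE__; touching just this file
// in the pre-build step refreshes the stamp without a spurious full rebuild.
extern const char *const kBuildDate = __DATE__ " " __TIME__;  // declared in a header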
Any interesting ideas?
We're using the output of svnversion, written to a header file and included. We omit the file from the repository and create it in a pre-build step; this has worked quite well for us. (I'm not sure why you object to using a pre-build step?)
We're currently using a Perl script to convert svnversion's output into a header file; I later found out that TortoiseSVN includes a subwcrev command (which has also been ported to Linux) that can do much of the same thing.
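For illustration, a sketch of what such a generated header might contain (the file name, macro name and revision value are all hypothetical):

// build_version.h -- recreated by the pre-build step from svnversion's
// output; deliberately NOT checked into SVN.
#ifndef BUILD_VERSION_H
#define BUILD_VERSION_H

#define BUILD_REVISION "4168M"   // example value; "M" marks local modifications

#endif

Code that wants to log the build ID then simply includes this header and uses BUILD_REVISION.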
If you don't like the idea of an include file not in source control being required for the build, consider a batch file or other build step that programmatically creates a file/include, and call svnversion within your build process.
Basically, GENERATE the file so you don't have an unversioned yet required file.
EDIT
Josh's subwcrev is probably the best idea.
Before that was implemented I wrote my own hacky tool to do the same thing: do a replacement in a template file.
It could be as simple as:
% make CPPFLAGS="-DBUILD_NUMBER=`svnlook youngest /path/to/repo`"
I'd look at SvnRev. You can use it as a custom pre-build step in VS, or call it from a makefile, or whatever else you need to do, and it generates a header file that you can include in your other files that will give you what you need. There's good documentation on the site.
SubWCRev is another option, though the Linux port is newer, and I don't know that a Mac version exists. It's very useful on Windows for .NET (which I'm guessing isn't an issue for you, but I'm adding this for future reference), because it allows you to create a template file that can be used to generate, for example, the Properties file for a .NET assembly.
Automatic builds can typically be full, clean builds. In that case, you start in a clean directory and there would be no issue with __DATE__ in any case. Otherwise, see Paul Beckinham's idea.
Why not tie a GUID to it? Almost every language has support for generating one, and if yours doesn't, there are a lot of algorithms for that around.
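For instance, a random (version 4) UUID can be generated with nothing but the C++ standard library; a sketch, with a hypothetical function name:

#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

std::string makeBuildGuid() {
    std::mt19937_64 gen{std::random_device{}()};
    std::uniform_int_distribution<std::uint64_t> dist;
    std::uint64_t hi = dist(gen);   // becomes time_low, time_mid, time_hi
    std::uint64_t lo = dist(gen);   // becomes clock_seq and node
    hi = (hi & 0xFFFFFFFFFFFF0FFFull) | 0x0000000000004000ull;  // version 4
    lo = (lo & 0x3FFFFFFFFFFFFFFFull) | 0x8000000000000000ull;  // RFC 4122 variant
    char buf[37];                   // 36 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%08llx-%04llx-%04llx-%04llx-%012llx",
                  (unsigned long long)(hi >> 32),
                  (unsigned long long)((hi >> 16) & 0xFFFF),
                  (unsigned long long)(hi & 0xFFFF),
                  (unsigned long long)(lo >> 48),
                  (unsigned long long)(lo & 0xFFFFFFFFFFFFull));
    return buf;
}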
(Although, if you do use subversion, I personally like Josh's idea better!)