Version Control: multiple version hell, file synchronization - c++

I would like to know how you normally deal with this situation:
I have a set of utility functions, say 5 to 10 files. Technically they form a static library, cross-platform - SConscript/SConstruct plus a Visual Studio project (not a solution).
Those utility functions are used in multiple small projects (15+, and the number increases over time). Each project has a copy of a few files or of the entire library, not a link to one central place. Some projects use one file, some use two, some use everything. Normally the utility functions are included as a copy of every file plus the SConscript/SConstruct or Visual Studio project (depending on the situation). Each project has a separate git repository. Sometimes one project is derived from another, sometimes it isn't.
You work on every one of them, in random order. There are no other people (to make things simpler).
The problem arises when, while working on one project, you modify those utility function files.
Because each project has its own copy of a file, a change introduces a new version, which leads to a mess when you later try (a week later, for example) to work out which copy has the most complete functionality (i.e., you added a function to a.cpp in one project and another function to a.cpp in another project, creating a version fork).
How would you handle this situation to avoid "version hell"?
One way I can think of is using symbolic links/hard links, but it isn't perfect: if you delete the one central storage, everything falls apart. And hard links won't work on a dual-boot system (although symbolic links will).
It looks like what I need is something like an advanced git repository, where the code for a project is stored in one local repository but is synchronized with multiple external repositories. But I'm not sure how to do that, or whether it is possible with git.
So, what do you think?

The normal, simple way would be to keep the library as its own project in version control, and when a modification is needed, to edit only that project.
Other projects that need the library can then get the needed files from the library project.

It is not completely clear to me what you want, but git submodules might help: http://git-scm.com/docs/git-submodule
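For example, a minimal sketch (repository URLs and paths are placeholders):

cd myproject
git submodule add git://server/utils.git utils   # mount the shared library
git commit -m "Add utils as a submodule"

# after changing the utility code from inside a project:
cd utils
git commit -am "Add new helper to a.cpp"
git push                    # publish to the central utils repository
cd ..
git add utils
git commit -m "Track the new utils revision"

Each project then records exactly which revision of the shared library it builds against, while the library's history lives in a single repository.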

In Subversion you can use externals (it's not Git, I know, but these tips might still help). This is how it works:
Split the application-specific code (\MYAPP) from the common code (\COMMON)
Remove all duplicates from the applications; they should use only the common code
Bring the common code into the applications by adding \COMMON as an external in \MYAPP (see the sketch below)
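A minimal sketch (the repository URL is a placeholder):

svn propset svn:externals "common http://server/repo/COMMON/TRUNK" MYAPP/TRUNK
svn commit MYAPP/TRUNK -m "Pull in the common code as an external"
svn update MYAPP/TRUNK   # MYAPP/TRUNK/common now mirrors \COMMON\TRUNK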
You will also probably have versions of your application. Introduce versions in the common code as well. So your application will have the following folders in the repository:
\MYAPP\TRUNK
\MYAPP\V1
\MYAPP\V2
Similarly, add versions to the common-code, either using version numbers, like this:
\COMMON\TRUNK
\COMMON\V1
\COMMON\V2
Or using dates, like this:
\COMMON\TRUNK
\COMMON\2010JAN01
\COMMON\2010MAR28
The externals of \MYAPP\TRUNK should point to \COMMON\TRUNK; that's obvious.
Try to synchronize the versions of the common code with the versions of the applications, so that every time an application version is frozen, the common code is frozen as well, and the application version points to the corresponding common-code external.
E.g., the externals of \MYAPP\V1 may point to \COMMON\2010JAN01.
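Pinning a released application version to its common code would then look like this (the repository URL is a placeholder):

svn propset svn:externals "common http://server/repo/COMMON/2010JAN01" MYAPP/V1
svn commit MYAPP/V1 -m "V1 is frozen against the 2010JAN01 common code"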
The advantage of this approach is that every developer can now extend, improve, and debug the common code. The disadvantage is that the compilation time of the applications will increase as the common code grows.
The alternative (putting built libraries in your version system) has the disadvantage that the management (extending, improving, debugging) of the common code is always done separately from the management of the applications, which may discourage developers from writing generic common code at all (and everyone starts writing their own versions of 'generic' classes).
On the other hand, if you have a dedicated and flexible team solely responsible for the common code, the common code will be under much better control with that alternative.

In general Configuration Management terms, a solution is to have multiple trunks (branches):
Release
Integration
Development
Release
This trunk/branch contains software that has passed Quality Assurance and can be released to a customer. After release, all files are marked as "read-only" and are given a label identifying them with the release number.
Periodically, or on demand, the testing gurus take the latest (tip) version from the Integration trunk and submit it to grueling quality tests. This is how an integration version is promoted to a release version.
Integration
This trunk contains the latest working code. It contains bug fixes and new features. The files should be labeled after each bug fix or new feature.
Code is moved into the Integration branch after a bug fix has passed quality testing or a new feature is fully developed (and tested). A good idea here is to label the integration version with a temporary label before integrating a developer's code.
Development
These are branches made by developers for fixing bugs or developing new features. This can be a copy of all the files moved onto their local machine, or only the files that need to be modified (with links to the Integration trunk for all other files).
When moving between trunks, the code must pass qualification testing, and there must be permission to move to the trunk. For example, unwarranted new features should not be put into the Integration branch without authorization.
In your case, the files need either to be checked back into the Integration trunk after they have been modified, OR a whole new branch or trunk must be created if the code is too different from the previous version (such as when adding new features).
I've been studying Git and SourceSafe, trying to figure out how to implement this schema. The schema is easy to implement in the bigger Configuration Management applications like PVCS and ClearCase. For Git, the three trunks map naturally onto branches within a single repository (separate clones per trunk would also work), as sketched below. SourceSafe clearly states that it only allows one label per version, so files that have not changed will lose their label information.
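A sketch with branches (branch and tag names are illustrative):

git branch integration master           # latest working code
git branch release master               # QA-approved code only
git checkout -b fix-crash integration   # a Development branch for one bug

# after the fix passes qualification testing:
git checkout integration
git merge fix-crash

# after QA promotes the Integration tip:
git checkout release
git merge integration
git tag -a v1.2 -m "Release 1.2"        # the release label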

Related

How to check which projects are dependent on a .cpp file?

I've got a solution with many projects that are dependent on one another (a large program, about ~200 projects).
A lot of these are compiled as static libs and are linked into other projects that use link-time code generation.
Now, let's say I want to test something and change a single .cpp file somewhere, and I don't want to re-install the whole thing, so I just want to replace the DLLs that are affected by the change.
How do I find all the DLLs that were re-created and are affected by the change?
If you're using a version control system (which you probably are), and you check in DLLs before deployment (which you possibly don't), you can ask the VCS which DLLs have changed.
That's probably the place in your workflow for this intelligence: you want a compact deployment, so you need to create a checkpoint each time you deploy (in this case, by checking in your deployable objects). For example:
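If each deployment is checkpointed with a tag, something like this lists the DLLs that changed between two deployments (the tag names are placeholders):

git diff --name-only deploy-41 deploy-42 -- "*.dll"

The same idea works in Subversion with svn diff --summarize between two tags.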

When full build and when partial build?

Hi, I am trying to find out when a full build is required and when a partial build is sufficient.
There are many articles, but I am not able to find specific answers.
Below are my thoughts.
Full build is required when:
1. The build of dependent modules changes:
---a change in build options, or the use of optimization techniques.
2. The object layout changes (see the illustration after this list):
---any change in a header file, such as adding or deleting methods in a class;
---changing the object size by adding or removing variables or virtual functions;
---data alignment changes using pragma pack.
3. Any change in global variables.
Partial build is sufficient when:
1. Any change in the logic, as long as it does not alter the specified interface.
2. A change in a stack variable.
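A quick, hypothetical illustration of the object-layout point (2): adding a virtual function changes an object's size and layout, so every translation unit that includes the header must be recompiled.

// layout.cpp - adding a virtual function introduces a hidden vtable pointer
#include <cstdio>

struct Widget {                // original layout: just an int
    int id;
};

struct WidgetV2 {              // the "same" class after a header edit
    virtual ~WidgetV2() {}     // now carries a vtable pointer as well
    int id;
};

int main() {
    // typically prints something like "4 vs 16" on a 64-bit build; objects
    // compiled against the old layout would corrupt memory if mixed in
    std::printf("%u vs %u\n", (unsigned)sizeof(Widget), (unsigned)sizeof(WidgetV2));
    return 0;
}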
In an ideal world a full build would never be necessary, because all the build tools would automatically detect whether one of their dependencies has changed.
But this is true only in the ideal world. In practice, build tools are written by humans, and humans:
make mistakes, so the tools may not take every possible change into account;
are lazy, so the tools may not take any change into account.
For you this means you need some experience with your build tools. A well-written makefile may take everything into account, so you rarely have to do a full build (a sketch of that idea follows below). But in the 21st century a makefile is not really state of the art any more, and makefiles become complex very quickly. Today's development environments do a fairly good job of finding dependencies, but for larger projects you may have dependencies that are hard to fit into your development environment's concepts, and you will end up writing a script.
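A minimal sketch of a makefile that lets the compiler generate the header dependencies itself (file names are illustrative):

# Makefile - auto-generated dependencies keep partial builds honest
SRCS := $(wildcard *.cpp)
OBJS := $(SRCS:.cpp=.o)

app: $(OBJS)
	$(CXX) -o $@ $^

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -MMD -MP -c $< -o $@   # -MMD writes a .d file per object

-include $(OBJS:.o=.d)   # each .d lists the headers that object depends on

With this in place, touching a header rebuilds exactly the objects that include it; the classic failure mode is a hand-maintained dependency list that silently goes stale.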
So there is no real answer to your question. In practice it is good to do a full rebuild for every release, and this rebuild should be doable by pressing just one button. Do a partial build for daily work, since nobody wants to wait two hours to see whether the code compiles. But even in daily work a full rebuild is sometimes necessary, because the linker/compiler/(your choice of tool here) has failed to recognize even the simplest change.

TDD - Creating a new class in an empty project to make dependencies explicit as they are added

Using TDD, I'm considering creating a (throw-away) empty project as a test harness/container for each new class I create, so that the class exists in a little private bubble.
When I have a dependency and need to get something else from the wider project, I have to do some work to add it into my clean project file, and I'm forced to think about that dependency. Assuming my class has a single responsibility, I ought not to have to do this very much.
Another benefit is an almost instant compile / test / edit cycle.
Once I'm happy with the class, I can then add it to the main project/solution.
Has anyone done anything similar before or is this crazy?
I have not done this in general (creating an empty project to test a new class), although it could happen if I don't want to modify the current projects in my editor.
The advantages could be:
you are sure not to modify the main project, or to commit by accident
you know with certainty that there are no dependencies
The drawbacks could be:
it costs some time ...
as soon as you want to add one dependency on your main project, you instantly get all the classes in that project ... not what you want
thinking about dependencies is usual; we normally don't need an empty project to do so
some tools check your project dependencies to verify they follow a set of rules; it could be better to use one of those (as they can be used not only when starting a class, but also later on)
the private-bubble concept can also be found in import statements
current development environments on current machines already give you extra-fast operations ... if not, you could do something about it (tell us more ...)
when done, you would need to copy your main class and your test class into your regular project. This can cost you time, especially as the package might not be adequate (the simplest possible in your early case, because your project is empty, but it must fit your regular project later).
Overall, I'm afraid this would not be a timesaver... :-(
I have been to a presentation on Endeavour. One of the concepts it depends highly upon is decoupling, as you suggest:
each service in a separate solution with its own testing harness
Endeavour is, in a nutshell, a powerful development environment/plugin for VS which helps achieve these things. Among a lot of other things, it also hooks into/creates a nightly build from SourceSafe to determine which DLLs build, and places those in a shared folder.
When you create code which depends on another service, you don't reference the VS project but the compiled DLL in the shared folder.
By doing this, a few of the drawbacks suggested by KLE are resolved:
Projects depending on your code reference the DLL instead of your project (a build-time win)
When your project fails to build, it will not break integration; the other projects depend upon a DLL which is still available from the last working build
All classes visible - nope
Middle ground:
You REALLY have to think about dependencies, more than in 'simple' setups
It still costs time
But of course there is also a downside:
it's not easy to detect circular dependencies
I am currently thinking about how to achieve the benefits of this setup without the full-blown install of Endeavour, because it's a pretty massive product that does a great deal (not all of which you will need).

Do internal artifacts belong in a repository?

Our team is struggling with issues around the idea of a library repository. I have used Maven, and I am familiar with Ivy. We have little doubt that, for third-party jars, a common repository that is integrated into the build system has tremendous advantages.
The struggle is around how to handle artifacts that are strictly internal. We have many artifacts that use a handful of different build tools, including Maven, but essentially they are all part of one product (one team is responsible for all of it). Unfortunately, we are not currently producing a single artifact per project, but we're headed in that direction. Developers do and will check out all the projects.
We see two options:
1) Treat all artifacts, even internal ones, like any third-party jar. Each jar gets built and published to the repository, and other artifact projects refer to the repository for all projects.
2) Each project refers to other "sibling" projects directly. There is a "master project" that triggers the builds for all the other projects in an appropriate dependency order. In the IDE (Eclipse), each project refers to its dependent projects (source) directly. The build tools look into the sibling project to reference a .jar.
It's very clear that the open-source world is moving towards the repository model. But it seems to us that its needs may be different. Most such projects are very independent, and we strongly suspect users seldom make changes across projects. There are frequent upgrades that are now easier for clients to track and be aware of.
However, it does add the burden of having to publish changes separately. In our case, we just want to commit to source control (something we do 20-50 times a day).
I'm aware that Maven might solve all these problems, but the team is NOT going to convert everything to Maven. Other than Maven, what do you recommend (and why)?
It's not necessary to choose only one of your options. I successfully use both in combination. If a project consists of multiple modules, they are all built together, and then delivered to the repository. However, the upload only happens for "official" or "release" builds. The ongoing development builds are done at the developers' machines. You don't have to use Maven for this. Ivy would work or even a manual process. The "artifact repository" could be one of the excellent products available or just a filesystem mount point.
It's all about figuring out the appropriate component boundaries.
When you say "developers do and will check out all projects", that's fine. You never know when you might want to make a change; there's no harm in having a working copy ready. But do you really want to make every developer build every artifact locally, even if they don't need to change it? Is every developer really changing every single artifact?
Or maybe you just don't have a very big product, or a very big team. There's nothing wrong with developing in a single project with many sub-projects (which may themselves have sub-modules). Everybody works on everything together. A single "build all" from the top does the job. Simple, works, fast (enough). So what would you use a shared repository for in this case? Probably nothing.
Maybe you need to wait until your project/team is bigger before you see any benefit from splitting things up.
But I guarantee you this: you have some fundamental components, which are depended on (directly or indirectly) by many projects but do not themselves depend on any internal artifacts. Ideally, these components wouldn't change very much in a typical development cycle. My advice to you is twofold:
set up an internal repository since you already agree that you will benefit from doing so, if only for third-party jars.
consider separating the most fundamental project into a separate build, and delivering its artifact(s) to the repository, then having the rest of the system reference it as if it were a third-party artifact.
If a split like this works, you'll find that you're rebuilding that fundamental piece only when needed, and the build cycle for the rest of the project (for your developers) is faster by a corresponding amount: win-win. Also, you'll be able to use a different delivery schedule (if desired) for the fundamental component(s), making the changes to it more deliberate and controlled (which is as it should be for such a component). This helps facilitate growth.
If a project produces multiple jar files (and many do) I would carefully consider (case by case) whether or not each jar will ever be reused by other projects.
If yes, that jar should go into the repository as a library, because it facilitates reuse by allowing you to specify it as a dependency.
If no, it would be a so-called "multi-project", in which the whole project is built from its parts. The individual jars probably do not need to be stored individually in the repo; just the final completed artifact.
Gradle is definitely a candidate for you: it can use Ant tasks and call Ant scripts, understands Maven POM files, and handles multi-projects well (a sketch is below). Maven can do much of this as well, and, if you have lots of patience, Ant+Ivy can too.
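A minimal sketch of a Gradle multi-project (module names are hypothetical):

// settings.gradle
include 'core', 'app'

// app/build.gradle
dependencies {
    compile project(':core')             // direct sibling reference for day-to-day builds
    // compile 'com.example:util:1.0'    // or a published artifact from the internal repo
}

Day-to-day development can build against sibling source, while release builds publish artifacts to the repository for others to consume.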

C++ internal code reuse: compile everything or share the library / dynamic library?

General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome) and code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some have their own repository, some are part of a shared one. For each deliverable solution (.sln) (e.g., something that we deliver to the customer), we svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried that eventually the code size for each solution could get quite huge (right now our total code size is about 75K SLOC).
Also, one thing to note is that we prevent all transitive dependencies. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project, even if it depends on it. This is because you could have two projects (.vcproj) that depend on the same library (e.g., Boost) or project (.vcproj), and when you svn:externals both projects into a single solution, svn:externals would pull that dependency in twice. So we carefully document all dependencies for each project, and it's up to whoever creates the solution (.sln) to ensure all dependencies (including transitive ones) are svn:externals'd as part of the solution.
If we reused code via .lib/.dll files instead, this would obviously reduce the code size for each solution, as well as eliminate the transitive-dependency issue mentioned above where applicable (exceptions are, for example, third-party libraries/frameworks that use DLLs, like Intel TBB and the default Qt build).
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
On top of that, what C++ makes easy is not creation of reusable binary components; rather, C++ makes it relatively easy to reuse source code. Note that most major C++ libraries are shipped in source form, not compiled form. It's all too often necessary to look at that source in order to inherit correctly from an object—and it's all too easy (and often necessary) to rely on implementation details of the original library when you reuse it. As if that isn't bad enough, it's often tempting (or necessary) to modify the original source and do a private build of the library. (How many private builds of MFC are there? The world will never know . . .)
Maybe this is why, when you look at libraries like the Intel Math Kernel Library, their "lib" folder contains "vc7", "vc8", and "vc9" subfolders, one for each Visual Studio version. Scary stuff.
Or how about this assertion:
C++ is notoriously non-accommodating when it comes to plugins. C++ is extremely platform-specific and compiler-specific. The C++ standard doesn't specify an Application Binary Interface (ABI), which means that C++ libraries from different compilers or even different versions of the same compiler are incompatible. Add to that the fact that C++ has no concept of dynamic loading and each platform provides its own solution (incompatible with others), and you get the picture.
What are your thoughts on the above assertion? Do languages like Java or .NET face these kinds of problems? E.g., if I produce a JAR file from NetBeans, will it work if I import it into IntelliJ, as long as I ensure that both have compatible JREs/JDKs?
People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.
One good reason to share the source code: templates are one of C++'s best features, because they are an elegant way around the rigidity of static typing, but by their nature they are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
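For instance, a function template effectively has to ship as source, because the compiler needs the body to instantiate it for each type it is used with (a trivial, hypothetical example):

// clamp.cpp - the template body must be visible at the point of use
#include <iostream>

template <typename T>
T clamp_to(T value, T lo, T hi) {   // would normally live in a header
    return value < lo ? lo : (hi < value ? hi : value);
}

int main() {
    std::cout << clamp_to(42, 0, 10) << "\n";       // instantiated for int
    std::cout << clamp_to(3.7, 0.0, 10.0) << "\n";  // instantiated again for double
    return 0;
}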
We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms or in different build environments, or even if you need different build options, such as static vs. dynamic linking to the C runtime, different structure-packing settings, etc.
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.
I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and it's usually safer to tweak a project-specific version than to tweak a shared version that can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo::Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project we needed an optimization: the message had to consist of several blocks of data in different parts of memory, concatenated together. We couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo::SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
This would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own. E.g., the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
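Here is a sketch of that default (illustrative only; a count parameter is assumed, since the callee must know how many blocks there are):

// The base class's fallback: assemble the pieces, then reuse Send().
#include <cstddef>
#include <cstring>
#include <vector>

class Foo {
public:
    virtual ~Foo() {}
    virtual void Send(unsigned messageID, const void* buffer, size_t bufSize) = 0;

    virtual void SendMultiple(unsigned messageID, const void** buffers,
                              size_t* bufSizes, size_t count)   // count is assumed
    {
        size_t total = 0;
        for (size_t i = 0; i < count; ++i)
            total += bufSizes[i];

        std::vector<char> assembled(total);   // the temporary buffer
        size_t offset = 0;
        for (size_t i = 0; i < count; ++i) {
            std::memcpy(&assembled[offset], buffers[i], bufSizes[i]);
            offset += bufSizes[i];
        }
        Send(messageID, total ? &assembled[0] : 0, total);
    }
    // A socket-based subclass overrides SendMultiple() to call send()
    // once per block, skipping the copies entirely.
};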
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: the tool finds everything in the directory that looks like a source file, generates dependencies if a file has changed, and spits out the needed rules. Then it creates a rule to take everything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and the library are put in a subdirectory named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object-file subdirs). Then the top-level directory gets linked with all the subdirectories' libraries into the executable, and a symlink is created, again named after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
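The generated per-directory rules amount to something like this (a hypothetical sketch, not the actual tool's output):

# build everything in this directory into lib<dirname>.a
SRCS   := $(wildcard *.cpp)
OBJDIR := obj-$(PLATFORM)            # per-platform object directory
OBJS   := $(addprefix $(OBJDIR)/,$(SRCS:.cpp=.o))
LIB    := lib$(notdir $(CURDIR)).a   # library named after the directory

$(LIB): $(OBJS)
	$(AR) rc $@ $^
	ranlib $@

$(OBJDIR)/%.o: %.cpp | $(OBJDIR)
	$(CXX) $(CXXFLAGS) -MMD -c $< -o $@   # -MMD records header dependencies

$(OBJDIR):
	mkdir -p $@

-include $(OBJS:.o=.d)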
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism, so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can hg pull from multiple parents. The last option seems to be working pretty well.