How to check which projects are dependent on a .cpp file? - c++

I've got a solution with many projects that are dependent on one another (a large program, about 200 projects).
A lot of these projects are compiled as static libs, and are compiled into other projects that use link-time code generation.
Now, let's say I want to test something and change a single .cpp file somewhere, and I don't want to re-install the whole thing, so I just want to replace the DLLs that are affected by the change.
How do I find all the DLLs that were re-created and are affected by the change?

If you're using a version control system (which you probably are), and you check in DLLs before deployment (which you possibly don't), you can ask the VCS what DLLs have changed.
That's probably the right place in your workflow for this intelligence: if you want a compact deployment, you need to create a checkpoint each time you deploy (in this case, by checking in your deployable objects).
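If checking the DLLs into the VCS isn't an option, another low-tech possibility is to compare file timestamps in the build output directory after an incremental build. Below is a minimal sketch of that idea, assuming C++17's std::filesystem is available; the "build/bin" path and the one-hour cutoff are placeholders and would need to be adapted to your solution's layout.

#include <chrono>
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
    // Hypothetical output folder; adjust to your solution's build layout.
    const fs::path outputDir = "build/bin";
    // Reference point: anything rewritten since then counts as "re-created".
    const auto reference = fs::file_time_type::clock::now() - std::chrono::hours(1);

    for (const auto& entry : fs::recursive_directory_iterator(outputDir)) {
        if (entry.is_regular_file()
            && entry.path().extension() == ".dll"
            && entry.last_write_time() > reference) {
            std::cout << entry.path() << '\n';   // this DLL was rebuilt
        }
    }
}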

Related

Do internal artifacts belong in a repository?

Our team is struggling with issues around the idea of a library repository. I have used Maven, and I am familiar with Ivy. We have little doubt that for third-party jars, a common repository that is integrated into the build system has tremendous advantage.
The struggle is around how to handle artifacts that are strictly internal. We have many artifacts that use a handful of different build tools, including Maven, but essentially they are all part of one product (one team responsible for all of it). Unfortunately, we are not currently producing a single artifact per project, but we're headed in that direction. Developers do and will check out all the projects.
We see two options:
1) Treat all artifacts, even internal ones, like any third-party jar. Each jar gets built and published to the repository, and other projects refer to the repository for all of their dependencies.
2) Each project refers to other "sibling" projects directly. There is a "master project" that triggers the builds for all other projects in an appropriate dependency order. In the IDE (Eclipse), each project refers to its dependent projects (source) directly. The build tools look in the sibling project for the referenced .jar.
It's very clear that the open-source world is moving towards the repository model. But it seems to us that their needs may be different. Most such projects are very independent and we strongly suspect users are seldom making changes across projects. There are frequent upgrades that are now easier for clients to track and be aware of.
However, it does add a burden in that you have to separately publish changes. In our case, we just want to commit to source control (something we do 20-50 times a day).
I'm aware that Maven might solve all these problems, but the team is NOT going to convert everything to Maven. Other than Maven, what do you recommend (and why)?
It's not necessary to choose only one of your options. I successfully use both in combination. If a project consists of multiple modules, they are all built together, and then delivered to the repository. However, the upload only happens for "official" or "release" builds. The ongoing development builds are done at the developers' machines. You don't have to use Maven for this. Ivy would work or even a manual process. The "artifact repository" could be one of the excellent products available or just a filesystem mount point.
It's all about figuring out the appropriate component boundaries.
When you say "developers do and will check out all projects", that's fine. You never know when you might want to make a change; there's no harm in having a working copy ready. But do you really want to make every developer to build every artifact locally, even if they do not need to change it? Really, is every developer changing every single artifact?
Or maybe you just don't have a very big product, or a very big team. There's nothing wrong with developing in a single project with many sub-projects (which may themselves have sub-modules). Everybody works on everything together. A single "build all" from the top does the job. Simple, works, fast (enough). So what would you use a shared repository for in this case? Probably nothing.
Maybe you need to wait until your project/team is bigger before you see any benefit from splitting things up.
But I guarantee you this: you have some fundamental components, which are depended on (directly or indirectly) by many projects but do not themselves depend on any internal artifacts. Ideally, these components wouldn't change very much in a typical development cycle. My advice to you is twofold:
set up an internal repository since you already agree that you will benefit from doing so, if only for third-party jars.
consider separating the most fundamental project into a separate build, and delivering its artifact(s) to the repository, then having the rest of the system reference it as if it were a third-party artifact.
If a split like this works, you'll find that you're rebuilding that fundamental piece only when needed, and the build cycle for the rest of the project (for your developers) is faster by a corresponding amount: win-win. Also, you'll be able to use a different delivery schedule (if desired) for the fundamental component(s), making the changes to it more deliberate and controlled (which is as it should be for such a component). This helps facilitate growth.
If a project produces multiple jar files (and many do) I would carefully consider (case by case) whether or not each jar will ever be reused by other projects.
If yes, that jar should go into the repository as a library, because it facilitates reuse by allowing you to specify it as a dependency.
If no, it would be a so-called "multi-project" in which the whole project is built from its parts. The individual jars probably do not need to be stored individually in the repo, just the final completed artifact.
Gradle is definitely a candidate for you: It can use Ant tasks and call Ant scripts, understands Maven pom files, and handles multi-projects well. Maven can do much of this as well, and if you have lots of patience, Ant+Ivy can.

Sharing files across applications

We have a common functionality we need to share among several applications. We already have a few internal libraries, into which we put common code with a well-defined interface. Sometimes, though, there are problems with some code (typically a single or a few .cpp files) as it doesn't fit into an existing library and it is too small to make a new one.
Our current version control system supports file sharing, so usually such files are just shared between the applications that use them. I tend to consider this a bad thing, but it actually keeps things quite clear, as you can see exactly which applications use them.
Now, we are moving to svn, which does not have "real" file sharing, there is this svn:externals stuff, but will it still be simple to track the places where the files are shared when using it?
We could create a "garbage" library (or folder) and put such files there temporarily, but it's always the same problem: it complicates dependency tracking (which projects use this file?).
Otherwise, are there other good solutions? How does it work in your company?
Why don't you just create a folder in SVN called "Shared" and put your shared files into that? You can include the shared files into your projects from there.
Update:
Seems like you are looking for a 3rd party tool that tracks dependencies.
Subversion and dependencies
You can only find out where a file is used by looking at all repositories.

Distributing DLLs Inside an EXE (C++)

How can I include my program's dependency DLLs inside the EXE file (so I only have to distribute that one file)? I am using C++, so I can't use ILMerge like I usually do for C#, but is there an easier way to do this automatically in Visual Studio?
I know this is possible (that's why installers work), I just need to be pointed to the best way to do this.
Thank you for your time.
There are many problems with this approach. For one example, see this post from REAL Software. Their “REALbasic” product used to do this and had problems including:
When writing the DLLs out at run-time, it would trigger anti-virus warnings.
Problems with machines where the user doesn’t have write permissions or is low on disk space.
Their attempt to fix the problem caused more problems, including crashes. Eventually they relented and now distribute DLLs side-by-side with apps.
If you really need a single-EXE deployment, and can’t use an installer for some reason, the reliable way is to static-link all dependencies. This assumes that you have the correct .libs (and not just .libs that link in the DLL).
There exist two options, both of which are far from ideal:
write a temporary file somewhere
load the DLL into memory "by hand", i.e. create a memory block, put the DLL image into memory, then process relocations and external references.
The downside of the first approach is described above by Nate. The second approach is possible, but complicated (it requires deep knowledge of certain low-level things) and doesn't allow the DLL code to access DLL resources (this is obvious: there's no image of the DLL on disk, so the OS doesn't know where to take resources from).
One more option usable in some scenarios: create a virtual disk whose contents are stored in your EXE file resources, and load the DLL from there. This is possible using our SolFS product (OS edition), but creation of the virtual disk itself requires use of kernel-mode drivers which must be written to disk before use.
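For completeness, here is a minimal sketch of the first option: the DLL is embedded as a binary resource in the EXE, written out to a temporary file at startup, and then loaded with LoadLibrary. The resource ID IDR_EMBEDDED_DLL and the temp file name are hypothetical, and real code would need proper error handling plus the anti-virus and write-permission caveats mentioned above.

// Sketch only: extract an embedded DLL resource to a temp file and load it.
// IDR_EMBEDDED_DLL is a hypothetical resource ID added via the .rc file,
// e.g.  IDR_EMBEDDED_DLL  RCDATA  "helper.dll"
#include <windows.h>
#include <fstream>
#include <string>

HMODULE LoadEmbeddedDll(int resourceId) {
    HMODULE self = GetModuleHandle(nullptr);
    HRSRC res = FindResource(self, MAKEINTRESOURCE(resourceId), RT_RCDATA);
    if (!res) return nullptr;

    HGLOBAL data = LoadResource(self, res);
    DWORD size = SizeofResource(self, res);
    if (!data || size == 0) return nullptr;

    // Write the raw DLL image to a file in the user's temp directory.
    char tempDir[MAX_PATH];
    GetTempPathA(MAX_PATH, tempDir);
    std::string dllPath = std::string(tempDir) + "embedded_helper.dll"; // hypothetical name

    std::ofstream out(dllPath, std::ios::binary);
    out.write(static_cast<const char*>(LockResource(data)), size);
    out.close();

    // Load it like any other DLL; this is where the anti-virus warnings and
    // write-permission problems described above can bite.
    return LoadLibraryA(dllPath.c_str());
}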
Most installers use a zip file (or something similar) to hold whatever files are needed. When you run the installer, it decompresses the data and puts the individual files where needed (and typically adds registry entries, registers any COM controls it installed, etc.)

Version Control: multiple version hell, file synchronization

I would like to know how you normally deal with this situation:
I have a set of utility functions. Say, 5-10 files. Technically they are a static library, cross-platform: SConscript/SConstruct plus a Visual Studio project (not a solution).
Those utility functions are used in multiple small projects (15+, and the number increases over time). Each project has a copy of a few files or of the entire library, not a link into one central place. Sometimes a project uses one file or two; some use everything. Normally, the utility functions are included as a copy of each file plus the SConscript/SConstruct or Visual Studio project (depending on the situation). Each project has a separate git repository. Sometimes one project is derived from another, sometimes it isn't.
You work on every one of them, in random order. There are no other people (to make things simpler).
The problem arises when, while working on one project, you modify those utility-function files.
Because each project has a copy of the files, this introduces new versions, which leads to a mess when you later try (a week later, for example) to figure out which version has the most complete functionality (i.e. you added a function to a.cpp in one project, and another function to a.cpp in another project, which created a version fork).
How would you handle this situation to avoid "version hell"?
One way I can think of is using symbolic links/hard links, but it isn't perfect: if you delete the central storage, it will all go to hell. And hard links won't work on a dual-boot system (although symbolic links will).
It looks like what I need is something like advanced git repository, where code for the project is stored in one local repository, but is synchronized with multiple external repositories. But I'm not sure how to do it or if it is possible to do this with git.
So, what do you think?
The normal simple way would be to have the library as a project in your version control, and if there is a modification, to edit only this project.
Then, other projects that need the library can get the needed files from the library project.
It is not completely clear to me what you want, but git submodules might help: http://git-scm.com/docs/git-submodule
In Subversion you can use externals (it's not Git, I know, but these tips might still help). This is how it works:
Split the application specific code (\MYAPP) from the common code (\COMMON)
Remove all duplicates from the applications; they should only use the common-code
Bring in the common code in the applications by adding \COMMON as an external in \MYAPP
You will also probably have versions of your application. Also introduce versions in the common code. So your application will have the following folders in the repository:
\MYAPP\TRUNK
\MYAPP\V1
\MYAPP\V2
Similarly, add versions to the common-code, either using version numbers, like this:
\COMMON\TRUNK
\COMMON\V1
\COMMON\V2
Or using dates, like this:
\COMMON\TRUNK
\COMMON\2010JAN01
\COMMON\2010MAR28
The externals of \MYAPP\TRUNK should point to \COMMON\TRUNK, that's obvious.
Try to synchronize the versions of the common-code with the versions of the applications, so that every time an application version is fixed, the common-code version is fixed as well, and the application version points to the relevant common-code external.
E.g. the externals of \MYAPP\V1 may point to \COMMON\2010JAN01.
The advantage of this approach is that every developer can now extend, improve, debug the common-code. Disadvantage is that the compilation time of applications will increase as the common-code will increase.
The alternative (putting libraries in your version control system) has the disadvantage that the management (extending, improving, debugging) of the common code is always done separately from the management of the applications, which may prevent developers from writing generic common code at all (and everyone starts to write their own versions of 'generic' classes).
On the other hand, if you have a clear and flexible team solely responsible for the common code, the common code will be under much better control with the latter alternative.
In Configuration Management (general) terms, a solution is to have multiple trunks (branches):
Release
Integration
Development
Release
This trunk/branch contains software that has passed Quality Assurance and can be released to a customer. After release, all files are marked as "read-only". They are given a label to identify the files with the release number.
Periodically, or on demand, the testing gurus will take the latest (tip) version from the Integration trunk and submit it to grueling quality tests. This is how an integration version is promoted to a release version.
Integration
This trunk contains the latest working code. It contains bug fixes and new features. The files should be labeled after each bug fix or new feature.
Code is moved into the integration branch after the bug fix has passed quality testing or the new feature is fully developed (and tested). A good idea here is to label the integration version with a temporary label before integrating a developer's code.
Development
These are branches made by developers for fixing bugs or developing new features. This can be a copy of all the files moved onto their local machine or only the files that need to be modified (with links to the Integration trunk for all other files).
When moving between trunks, the code must pass qualification testing, and there must be permission to move to the trunk. For example, unwarranted new features should not be put into the integration branch without authorization.
In your case, the files need to be either checked back into the Integration trunk after they have been modified, OR moved into a whole new branch or trunk if the code is too different from the previous version (such as when adding new features).
I've been studying Git and SourceSafe trying to figure out how to implement this scheme. The scheme is easy to implement in the bigger Configuration Management applications like PVCS and ClearCase. It looks like for Git, duplicated repositories are necessary (one repository for each trunk). SourceSafe clearly states that it only allows one label per version, so files that have not changed will lose label information.

C++ internal code reuse: compile everything or share the library / dynamic library?

General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome), code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some have their own repository, some are part of a shared one. For each deliverable solution (.sln) (e.g. something that we deliver to the customer), it will svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried that eventually the code size for each solution could get quite huge (right now our total code size is about 75K SLOC).
Also, one thing to note is that we prevent all transitive dependencies. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project, even if it depends on it. This is because you could have 2 projects (.vcproj) that depend on the same library (e.g. Boost) or project (.vcproj), and when you svn:externals both projects into a single solution, svn:externals would pull it in twice. So we carefully document all dependencies for each project, and it's up to the guy who creates the solution (.sln) to ensure all dependencies (including transitive ones) are svn:externals'd as part of the solution.
If we reuse code via .lib/.dll files instead, this would obviously reduce the code size for each solution, as well as eliminate the transitive-dependency issue mentioned above where applicable (exceptions are, for example, third-party libraries/frameworks that use DLLs, like Intel TBB and the default Qt).
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
On top of that, what C++ makes easy is not creation of reusable binary components; rather, C++ makes it relatively easy to reuse source code. Note that most major C++ libraries are shipped in source form, not compiled form. It's all too often necessary to look at that source in order to inherit correctly from an object—and it's all too easy (and often necessary) to rely on implementation details of the original library when you reuse it. As if that isn't bad enough, it's often tempting (or necessary) to modify the original source and do a private build of the library. (How many private builds of MFC are there? The world will never know . . .)
Maybe this is why, when you look at libraries like the Intel Math Kernel Library, their "lib" folder has "vc7", "vc8", "vc9" subfolders, one for each Visual Studio version. Scary stuff.
Or how about this assertion:
C++ is notoriously non-accommodating when it comes to plugins. C++ is extremely platform-specific and compiler-specific. The C++ standard doesn't specify an Application Binary Interface (ABI), which means that C++ libraries from different compilers or even different versions of the same compiler are incompatible. Add to that the fact that C++ has no concept of dynamic loading and each platform provides its own solution (incompatible with others) and you get the picture.
What are your thoughts on the above assertion? Does something like Java or .NET face these kinds of problems? E.g. if I produce a JAR file from NetBeans, will it work if I import it into IntelliJ, as long as I ensure that both have a compatible JRE/JDK?
People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.
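As a side note on the ABI point (this is a common workaround, not something from the answers above): one way to keep compiler and version mismatches from biting at a plugin boundary is to expose the boundary in plain C, so that no name mangling or C++ ABI details cross the DLL/shared-object edge. The header sketch below is hypothetical; "plugin_api.h", create_plugin and the other names are made up for illustration.

// plugin_api.h -- hypothetical C-linkage plugin boundary.
// Only C types and C-linkage functions cross the boundary; the C++
// implementation stays hidden behind the opaque handle.
extern "C" {
    typedef struct PluginHandle PluginHandle;   // opaque to the host

    PluginHandle* create_plugin(void);
    int           plugin_do_work(PluginHandle* plugin, int input);
    void          destroy_plugin(PluginHandle* plugin);
}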
One good reason to share the source code: Templates are one of C++'s best features because they are an elegant way around the rigidity of static typing, but by their nature are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
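To make the template point concrete, here is a small, hypothetical illustration: a generic function like the one below has to be visible as source (typically in a header) at every call site, because the compiler instantiates it separately for each type it is used with; there is no single binary you could ship that covers all future instantiations.

// clamp_to_range.h -- hypothetical header-only utility.
// Because this is a template, users must compile it from source; the compiler
// generates a separate instantiation for each T (int, double, a user-defined
// type, ...), so it cannot be shipped as a prebuilt binary covering every case.
#pragma once

template <typename T>
T clamp_to_range(const T& value, const T& low, const T& high) {
    if (value < low)  return low;
    if (high < value) return high;
    return value;
}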
We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms, build environments, or even if you need different build options such as static vs. dynamic linking to the C runtime, different structure packing settings, etc..
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.
I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and it's usually safer to tweak a project-specific version than to tweak a shared version that can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo::Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project, we needed an optimization: the message needed to consist of several blocks of data in different parts of memory concatenated together, and we couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and the process of copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo::SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
Which would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own, e.g. the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
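As a rough sketch of the default behaviour described above (a reconstruction, not the original code): the count parameter is an assumption, since the post doesn't show how the number of buffers is conveyed, and Send is assumed to be virtual here.

#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical reconstruction of the "assemble then send" default described
// above; not the original class.
class Foo {
public:
    virtual ~Foo() = default;

    // The original single-buffer API (assumed pure virtual for the sketch).
    virtual void Send(unsigned messageID, const void* buffer, std::size_t bufSize) = 0;

    // Default implementation: assemble the pieces, then reuse Send().
    virtual void SendMultiple(unsigned messageID, const void** buffers,
                              const std::size_t* bufSizes, std::size_t count) {
        std::size_t total = 0;
        for (std::size_t i = 0; i < count; ++i)
            total += bufSizes[i];

        // Copy the scattered parts into one contiguous temporary buffer.
        std::vector<char> assembled(total);
        std::size_t offset = 0;
        for (std::size_t i = 0; i < count; ++i) {
            std::memcpy(assembled.data() + offset, buffers[i], bufSizes[i]);
            offset += bufSizes[i];
        }

        Send(messageID, assembled.data(), assembled.size());
        // A socket-based subclass would override this and call send() once
        // per buffer instead, avoiding the copies entirely.
    }
};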
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: the tool finds everything in the directory that looks like a source file, generates dependencies if the file changed, and spits out the needed rules. Then it makes a rule to take everything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and the library are put in a subdirectory that is named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object-file subdirs). Then the top-level directory gets linked with all the subdirectories' libraries into the executable, and a symlink is created, again named after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can have hg pull from multiple parents. The last option seems to be working pretty well.