Linux C++ Project Source File Directory Structure

I'm working on a fairly large C++ project on Linux. We are trying to come up with criteria for organizing our source file directory structure.
One thought we have is to have the directory structure reflect our architecture choices. For instance, we would have one root level for our domain classes and another for our boundary classes, and one for our domain-agnostic infrastructure classes.
So in a banking application, we might have directories called src/domain/accounts, src/domain/customerTransactions, src/boundary/customerInputViews, etc. We might then have further directories such as src/infra/collections, src/infra/threading, etc.
Also, within that structure, we'd isolate interface classes from implementation classes. We'd do that so clients of interfaces would not be dependent on the directory structure of the implementation classes.
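For example, the accounts area might end up looking like this (names purely illustrative):
src/domain/accounts/
|- Account.h              <- interface; the only header clients include
|- impl/
|  |- AccountImpl.h       <- implementation; clients never depend on these
|  |- AccountImpl.cpp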
Any thoughts?

Breaking code into independent parts sounds like a good idea. That would allow you to potentially break things into separate units (for autotools: you could have convenience libs for organization, and later even separate them completely into shared libs).
Of course each submodule should contain everything needed to build it: headers, sources and build infrastructure (maybe only missing a top-level build definition file which gets included). This will make sure that work can be done on small units (but test the whole thing).
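A minimal sketch of the convenience-lib idea with Automake/Libtool (file and target names are made up):
# src/infra/collections/Makefile.am -- build this submodule as a convenience lib
noinst_LTLIBRARIES = libcollections.la
libcollections_la_SOURCES = ring_buffer.cc ring_buffer.h

# src/Makefile.am -- the application links against it; promoting the unit to a
# real installed shared lib later is mostly a matter of renaming noinst_ to lib_
bin_PROGRAMS = bankapp
bankapp_SOURCES = main.cc
bankapp_LDADD = infra/collections/libcollections.la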

Related

Sharing files across applications

We have a common functionality we need to share among several applications. We already have a few internal libraries, into which we put common code with a well-defined interface. Sometimes, though, there are problems with some code (typically a single or a few .cpp files) as it doesn't fit into an existing library and it is too small to make a new one.
Our current version control system supports file sharing, so usually such files are just shared between the applications that use them. I tend to consider it a bad thing, but actually, it makes it quite clear, as you can see exactly in which applications they are used.
Now, we are moving to svn, which does not have "real" file sharing, there is this svn:externals stuff, but will it still be simple to track the places where the files are shared when using it?
We could create a "garbage" library (or folder) and put such files there temporarily, but it's always the same problem that it complicates dependency tracking (which projects use this file?).
Otherwise, are there other good solutions? How does it work in your company?
Why don't you just create a folder in SVN called "Shared" and put your shared files into that? You can include the shared files into your projects from there.
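For example (repository URL and paths are hypothetical):
svn propset svn:externals "Shared http://svn.example.com/repo/Shared" trunk/app1
svn commit -m "pull Shared into app1" trunk/app1
svn update trunk/app1
To find the places that reference Shared, you can search the externals definitions; with a reasonably recent client something like this works:
svn propget svn:externals -R http://svn.example.com/repo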
Update:
Seems like you are looking for a 3rd party tool that tracks dependencies.
Subversion and dependencies
You can only find out where a file is used by looking at all repositories.

src/ folder structure in C++?

I'm coming into C++ from Java/AS3-land, and I'm used to the package-cum-folder structure for my classes. And I like it.
I understand the very basics of namespaces in C++, and I'm happy to leave it at just the basics. But as my project gets more complex, I'd like to keep my folder structure organized in a way I can keep in my head, i.e. something similar to Java/AS3.
1) Is there any reason not to have a folder structure like:
src/
  model/
  view/
  controller/
possibly with subfolders? (This is just an MVC example; the folder structure could be whatever the project needs.) It just seems unruly to have a src/ folder with a huge pile of header and source files in it.
2) If the answer to 1) could be "go ahead and do what you want", would it be unwise/unnecessary to create a namespace for each folder, similar to Java/AS3's way of creating a package for each folder? My understanding is that namespaces are not usually used like this, nested deeply and folder-related.
I've always liked the namespace for each folder. Mostly because when I have to maintain somebody else's code, the namespace helps me find where the class was originally defined.
Well-named header files can also help with this, though. I also wouldn't suggest going more than 2-3 namespaces deep, as beyond that it just becomes obnoxious. You'll find yourself using "using namespace blah;" a lot, which I always find to be a red flag in C++ code. And you can't use "using namespace" inside a header file without some severe problems occurring.
It's all completely optional though in C++.
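To illustrate the folder-mirrors-namespace idea (paths and names are just an example):
// src/model/Account.h
namespace model {
class Account {
public:
    double balance() const { return balance_; }
private:
    double balance_ = 0.0;
};
}  // namespace model

// src/view/AccountView.h
#include "model/Account.h"  // the model:: prefix below points you straight to src/model/
namespace view {
void render(const model::Account& account);  // declaration only; defined in a .cpp
}  // namespace view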
You may want to have a look at John Lakos's Large-Scale C++ Software Design. Basically, you can do that, but your packages should (as in Java) have an acyclic dependency graph. Also, it may be opportune for each package to document which headers are exported and which aren't, maybe like so:
src/
|- package1/
|  |- exported_symbols_1.hh
|  |- exported_symbols_2.hh
|  |- src/
|     |- impl_1.hh
|     |- impl_1.cc
|- package2/
|  |- sub_package_2_1/
|  |  |- exported.hh
|  |  |- src/
|  |     ...
|  |- src/
|     ...
Each package is only allowed to #include the top-level headers of another package, never ones in src/ directories.
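In code, the rule looks like this (a sketch, using the layout above):
// somewhere in package2's sources
#include "package1/exported_symbols_1.hh"  // fine: a top-level, exported header
// #include "package1/src/impl_1.hh"       // forbidden: reaches into package1's src/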
Also, when you want to use Autotools in a large project and intend to distribute headers, it may prove to be prudent to call the top-level directory not src/ but by the PACKAGE_TARNAME of that project. This makes installing headers with the help of the Autotools easier.
(And, of course, the actual file names do not look as silly as illustrated above.)
There's no reason not to divide your source code into different directories; it makes sense if there are many files and clear logical groupings.
It is not necessary to create a distinct file for each small class though - in large projects, that tends to slow compilation (as the implementation files often have to include a lot of the same headers just to compile their couple dozen lines).
Beyond using namespaces to reflect the logical divisions in the code, the exact threshold at which code gets subdivided into further namespaces tends to be driven by other forces, for example:
factors suggesting use of more namespaces
very volatile code (often edited, identifiers constantly added or changed, names often short and/or common words)
more developers
factors reducing the need for namespaces
tight coordination by a central body
planned formal releases with thorough checks for conflicts
Namespaces can also be used as a way to allow easy switching between alternative implementations (e.g. different versions of a protocol, thread-safe versus unsafe support functions, OS-specific implementations), so sometimes catering for such needs involves use of distinct namespaces.
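For instance, a sketch of the alternative-implementations case (all names invented for illustration):
namespace win32_impl {
inline void sleep_ms(unsigned ms) { (void)ms; /* would call ::Sleep(ms) */ }
}
namespace posix_impl {
inline void sleep_ms(unsigned ms) { (void)ms; /* would call ::usleep(ms * 1000) */ }
}

// One alias selects the implementation; callers just write platform::sleep_ms(100).
#if defined(_WIN32)
namespace platform = win32_impl;
#else
namespace platform = posix_impl;
#endif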
It can definitely be painful digging through unintuitive and/or deeply nested namespaces to reach the variables you want. "using namespace" is less effective if you're likely to need several namespaces that define the same identifiers anyway, but it can suit more modal code that tends to use one namespace much more heavily than the others at a time.
So, you may want to consider these factors when deciding whether to put each folder's code (or some other logically distinct group) into distinct namespaces.
There's no reason not to, and it will really help people reading your code. Some things to watch out for:
Don't over-nest folders, this can be confusing for readers of your code.
Be consistent in the organization of your code, e.g. don't put any view code in the controllers sub-directory, or vice-versa.
Keep the layout clean.
src/ is the common place where C/C++ programmers put their sources within the project root.
For example:
doc/ <- documentation
libs/ <- additional libraries
po/ <- gettext translations
src/ <- sources
It's common to create subdirectories underneath src/ if you've got a lot of source files, but there are no restrictions on how to organize this substructure.
Keep in mind that the directory structure is completely optional in C++. There is no connection between C++ namespaces and the directory structure.
You can arrange your files however you like; you'll just need to adjust your build tools' include paths and source paths to match.
Giving each directory its own namespace is overkill and probably a bad idea, as it will make for confusing code. I'd recommend one namespace per project at most, or even just one namespace per company, since presumably within your company you have the power to rename things if necessary to resolve name collisions. Namespaces' main purpose is to handle the case where two codebases under the control of two different organizations both use the same name, and you as a third party want to use them both in the same project but don't have the ability to modify either codebase.

Linking issue while building different binaries

Our codebase has thousands of lines of legacy code. Over time, different developers have coded to their own preferences and standards. One of the wrongly implemented things is that a common class is declared and defined in several directories and linked into different binaries, with small differences between the copies. Ex:
dir1/xxx.h
class ABC {
public:
    int init();
};
dir1/xxx.cpp
int ABC::init() { /* dir1-specific implementation */ return 0; }
Similarly
dir2 has its own copy.
The issue was that developers wanted to keep different versions, primarily because they needed to know when to call the source code under dir1 versus dir2, with each copy independent of modifications to the other.
Now our issue is the order in which the code gets linked into our binaries. Both copies of the header in question use the same include guard (#ifndef .. #define .. #endif). The compiled objects get archived into lib1.a, lib2.a, and so on. Hence when we link, if we need the copy from lib3.a, we have to make sure it gets linked first:
ld .. lib1.a lib2.a lib3.a -- otherwise the wrong definition gets picked up. Note that each .a also has some additional interfaces compiled and linked into it.
The unfortunate part is that the duplicated header carries a common declaration (the copies define the same methods, but the implementations differ slightly).
How can we resolve the issue? Introducing namespaces would mean a lot of rework in our codebase. Is there a better way?
What would be the best design for such a codebase, so that from now on no developer can accidentally link against the wrong copy of these signatures?
Please help
There are several approaches to sharing code between developers:
Let everyone share the same code. Make a team responsible for the shared code, and if they make changes to the shared code, make sure that these changes (e.g. an extra argument to a method) are 'propagated' to all the applications using the shared code.
Alternatively, you can make 'everybody' responsible for the shared code, but even then, if the shared code is changed, the developer that made the change should propagate it to all other applications.
In this approach you can still choose to distribute the shared code as a LIB, or as a DLL.
Give everyone their own copy of the shared code. At the same time, make a 'central version' of the shared code, and make this 'central version' the 'trunk'. This means, whenever the shared code needs to be changed, it is this 'central version' that is changed. All the local copies of the shared code in the applications are not changed.
Additionally, assign the task of 'Integration Manager' to a member of every application team. He/she will be responsible for bringing in new versions of the shared code, from the central version to the local copy. He/she will have to make changes in the application to make sure that the application still works with the new copy, and make sure the application is re-tested with the new shared code version.
If at all possible, you really need to rethink the basic practice, and undo it if you can. If this same basic header and class (or set of classes) is used in all the different library projects, even with minor variations, there ought to be some way to harmonize those variations into a proper class hierarchy such that a single library, with subclasses implementing slightly different variations as necessary, replaces multiple copies.
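As a rough sketch of that harmonization (class and method names taken from the question, everything else assumed):
// shared/xxx.h -- the single shared interface, living in one library
class ABC {
public:
    virtual ~ABC() = default;
    virtual int init() = 0;
};

// The per-directory copies become subclasses capturing the minor variations.
class ABCDir1 : public ABC {
public:
    int init() override { /* behaviour formerly in dir1/xxx.cpp */ return 0; }
};

class ABCDir2 : public ABC {
public:
    int init() override { /* behaviour formerly in dir2/xxx.cpp */ return 0; }
};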

C++ internal code reuse: compile everything or share the library / dynamic library?

General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome) and code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some might have their own repository, some might be part of a shared one. For each deliverable solution (.sln) (e.g. something that we deliver to the customer), we svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried about eventually the code size for each solution could get quite huge (right now our total code size is about 75K SLOC).
Also, one thing to note is that we disallow all transitive dependencies. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project, even if it depends on it. This is because you could have 2 projects (.vcproj) that depend on the same library (e.g. Boost) or project (.vcproj); when you svn:externals both projects into a single solution, the shared dependency would be pulled in twice. So we carefully document all dependencies for each project, and it's up to the person who creates the solution (.sln) to ensure all dependencies (including transitive ones) are svn:externals'd as part of the solution.
If we instead reuse code as .lib/.dll files, this would obviously reduce the code size for each solution, as well as eliminate the transitive-dependency issue mentioned above where applicable (exceptions are, for example, third-party libraries/frameworks that use DLLs, like Intel TBB and the default Qt build).
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
On top of that, what C++ makes easy is not creation of reusable binary components; rather, C++ makes it relatively easy to reuse source code. Note that most major C++ libraries are shipped in source form, not compiled form. It's all too often necessary to look at that source in order to inherit correctly from an object—and it's all too easy (and often necessary) to rely on implementation details of the original library when you reuse it. As if that isn't bad enough, it's often tempting (or necessary) to modify the original source and do a private build of the library. (How many private builds of MFC are there? The world will never know . . .)
Maybe this is why, when you look at libraries like the Intel Math Kernel Library, their "lib" folder has "vc7", "vc8", "vc9" subfolders for each Visual Studio version. Scary stuff.
Or how about this assertion:
C++ is notoriously non-accommodating when it comes to plugins. C++ is extremely platform-specific and compiler-specific. The C++ standard doesn't specify an Application Binary Interface (ABI), which means that C++ libraries from different compilers or even different versions of the same compiler are incompatible. Add to that the fact that C++ has no concept of dynamic loading and each platform provides its own solution (incompatible with others) and you get the picture.
What's your thoughts on the above assertion? Does something like Java or .NET face these kinds of problems? e.g. if I produce a JAR file from Netbeans, will it work if I import it into IntelliJ as long as I ensure that both have compatible JRE/JDK?
People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.
One good reason to share the source code: Templates are one of C++'s best features because they are an elegant way around the rigidity of static typing, but by their nature are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
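For example (a trivial sketch, not from the original answer), this has to ship as source in a header, because the compiler must see the definition to instantiate it for each T:
template <typename T>
T clamp_to(T value, T lo, T hi) {
    // No binary-only distribution is possible: each instantiation is compiled
    // from this source at the point of use.
    return value < lo ? lo : (hi < value ? hi : value);
}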
We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms, build environments, or even if you need different build options such as static vs. dynamic linking to the C runtime, different structure-packing settings, etc.
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.
I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and it's usually safer to tweak a project-specific version than to tweak a shared version that can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo::Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project, we needed an optimization: the message needed to consist of several blocks of data in different parts of memory concatenated together, and we couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and the process of copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo::SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
Which would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own, e.g. the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
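A sketch of what that base-class default could look like (the null-terminated buffer array is my assumption, since the posted signature carries no explicit count):
#include <cstddef>
#include <cstring>
#include <vector>

class Foo {
public:
    void Send(unsigned messageID, const void* buffer, size_t bufSize);
    void SendMultiple(unsigned messageID, const void** buffers, size_t* bufSizes);
};

void Foo::SendMultiple(unsigned messageID, const void** buffers, size_t* bufSizes) {
    // Total up the scattered blocks (array assumed null-terminated).
    size_t total = 0, count = 0;
    for (; buffers[count] != nullptr; ++count) total += bufSizes[count];

    // Assemble them into the temporary buffer described above...
    std::vector<char> assembled(total);
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        std::memcpy(assembled.data() + offset, buffers[i], bufSizes[i]);
        offset += bufSizes[i];
    }
    // ...and delegate to the original single-buffer API.
    Send(messageID, assembled.data(), assembled.size());
}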
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: The tool finds everything in the directory that looks like a source file, generates dependencies if the file changed, and spits out the needed rules. Then it makes a rule to take everything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and the library are put in a subdirectory that is named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object-file subdirs). Then the top-level directory gets linked with all the subdirs' libraries into the executable, and a symlink is created, again named after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can have hg pull from multiple parents. The last option seems to be working pretty well.

How do you "refactor" ant build.xml files?

I'm working on a large C++ system built with ant+cpptasks. It works well enough, but the build.xml file is getting out of hand, due to standard operating procedure for adding a new library or executable target being to copy-and-paste another lib/exe's rules (which are already quite large). If this was "proper code", it'd be screaming out for refactoring, but being an ant newbie (more used to make or VisualStudio solutions) I'm not sure what the options are.
What are ant users' best practices for stopping ant build files exploding?
One obvious option would be to produce the build.xml via XSLT, defining our own tags for commonly recurring patterns. Does anyone do that, or are there better ways ?
You may be interested in:
<import>
<macrodef>
<subant>
Check also this article on "ant features for big projects".
If the rules are repetitive then you can factor them into an ant macro using macrodef and reuse that macro.
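A minimal macrodef sketch for a cpptasks-style build (the macro name, attributes and paths are all invented):
<macrodef name="build-lib">
  <attribute name="name"/>
  <attribute name="srcdir" default="src/@{name}"/>
  <sequential>
    <cc outtype="static" outfile="lib/@{name}" objdir="obj/@{name}">
      <fileset dir="@{srcdir}" includes="**/*.cpp"/>
    </cc>
  </sequential>
</macrodef>
<!-- Each library is then one line instead of a copy-pasted rule block: -->
<build-lib name="network"/>
<build-lib name="storage"/>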
If it is the sheer size of the file that is unmanageable, then you can perhaps break it into smaller files and have the main build.xml call targets within those files.
If it's neither of these, then you may want to consider a different build system. Even though I have not used Maven myself, I hear it can solve many of the issues of large and unmanageable build files.
Generally, if your build file is large and complex, that is a clear indication that the way you have your code laid out, in terms of folders and packages, is too complicated. I find a complex ant script to be a clear smell of a poorly laid-out code base.
To fix this, think about how your code is laid out. How many projects do you have? Do those projects know how to build themselves, with a master build script that knows how to bundle the individual projects/apps/components together into a larger whole?
When you are refactoring code, you are looking at ways of breaking things down so that they are easier to understand: smaller methods, smaller classes, methods and classes that do one thing. You need to apply these same principles to your code base as well.
Create smaller components that are functionally cohesive and very loosely coupled to the rest of the code. Use a build script to build each component into a library. Do this with the rest of your code. Now create a master build script that knows how to bundle up all of your libraries and build them into your application. If you have several applications, then create a build script for each app and a master one that knows how to bundle the apps into distributables.
You should be able to see and understand the layout and structure of your code base just by looking at your build scripts. If they/it is not clean and understandable then neither is your source code.
Use Antlib files. It's a very clean way to
remove copy/pasted code
define default values
If you want to see an example, you can take a look at some of the build scripts I'm writing for my sandbox projects.
I would try Ant-Ivy, the agile dependency manager. We have recently started using it for some of our more complex systems and it works like a charm. The advantage here is that you don't get the overhead and transition cost of Maven (it uses ant targets, so it will work with your current setup). Here is a comparison between the two.