How to organize sources of complex program?

How to organize sources of complex program? - c++

We're creating very complex embedded system and «sources» contains few projects of Visual C++, IAR, Code Composer Studio and Altium Designer schemes and pcbs. All of that possibly could be in few versions.
So, what practice could you advice me to arrange all that stuff?
Thank you

I have the same setup as you.
I use Altium Designer for the hardware schematics and PCB design. But I also have Firmware source files and related utilities. And I have mechanical design files.
Here's how I do it:
Project Name
Firmware
MainCpu
trunk
tags
branches
IoCpu
trunk
tags
branches
Hardware
MainPcb
trunk
tags
branches
IoPcb
trunk
tags
branches
PowerPcb
trunk
tags
branches
Mechanical
Chassis
trunk
tags
branches
Other
trunk
tags
branches
This way all the project files are stored together in the SVN repository. The only down side I've found is that you can't just check out the Project and get the latest FW/HW/MEK files. You have to check out each Head of FW/HW/MEK.
The reason for the separate sub-modules for FW/HW/MEK is that they will get separate version tags.

Everything that you consider as sources should be under a Source Control System, like SVN. This is the best way to handle versions, revisions, branches and tags. SVN can handle binary files, so you won't have problems with non-text files.

If your C++ source files are numerous and span multiple directories then the effort put into grokking Large Scale C++ Software Design by John Lakos may be very worth it. The main theme of the book is how your physical layout of the software, that is, the arrangement of source code files in directories, limit or extend your ability to modify the software.

I like to have a directory structure that at the top level reflects each of the programmable parts.(i.e. microcontroller, DSP1, FPGA1, FPGA2,...)
I also like to have a subdirectory(ies) that has all the generated files, so it is easy to make a clean source tree. Also make it easy to do a clean build straight from the source code configuration tool. (i.e. get and build from source to binary image(s) in as few steps as possible)
Also have each programmable part have it's own version number, and one version number that reflects each of the combination of the sub component version numbers.

Definitely use source control, if the program itself doesn't support it, just keep the parent folder you use under source control. SVN is my current fav.
As far as how to arrange your files, I noticed you had Altium Designer on your list, that program will a) play nice with source control, and b) arrange your files in an orderly manner, assuming you use their whole 'project' file structure. Look into using their 'PCB' (if that's what your doing) or 'embedded' projects, when you create one, it creates buckets for you to store all your different types of files into.
Even if you don't want to actually use Altium for your files, create a project and look at their directory structure to get an idea about all the files you'll need to keep track of.

(Aside from trivial helper classes) put one class in each cpp/h file, and name the cpp/h files the same as the class.
Group related classes files into folders (you can optionally use a hierarchy of namespaces that match the folder structure. The .net approach here is to use a CompanyName.ProductName namespace, with your files stored in a ProductName project/subfolder of your solution). So for example, you might group your Math, I/O, and Drawing classes into separate "subsystem" folders.
Ideally, make these separate sections into re-usable libraries (MyCompany.Math). You'll be glad of this later when you want to develop a new product that will share some of the code. In that case, the top level "folders" become separate projects in their own right, and you can start to work on minimising dependences between them to realise and then enforce a much better overall framework design in your code base.
The ideal within folders is to find a good balance between clutter and sparseness - try to balance the folders so that they have between 5-15 files in each. If fewer, consider merging the folders; if greater, consider adding sub-category folders to break down the complexity.
As long as your classes/files and namespaces/folders have good descriptive names, and your folders are logically structured, you can make an extremely large project very easy to navigate.
At the risk of starting a religious war, I prefer to put the headers and their source files in the same folder so that when you are editing a .cpp the .h is easily accessible rather than having to move up and fown by a folder all the time.

Reduce the complexity!
My first engineering professor had a famous first lecture. It consisted of a single equation written on the blackboard:
Perfection = Simplicity
The problem with Source Control Systems is that they manage complexity but also promote it.

Related

Should these auxiliary files be under Git version control?

I decided to start using the Git version control system for my C++ project. I'm new to version control. For the trunk things are simple, I just commit all the project versions I have. I kept each version as a separate folder because I knew I'd very soon use Git. But I encountered a problem with my branches.
At some stage of the development, I decided there's one class I want to develop in a branch. Without version control, I had to use make a "manual" branch. I copied the most recent header file and source file of that class to a separate folder and started working there. I made several versions there to work with simultaneously. One version was the first prototype of the class according to the plan (for which I made the "branch"). Then I added another file, in which I copied the first one but removed things that seemed to not be necessary. This way I have 2 versions, one with all my ideas and features, the the other one just with what I really use in my code, without what's not in use at the moment.
But then I added more. As development went on, I decided it may be a good idea to make that class a template. So I added a third version, which is just like the second one, but now some functionality implemented using polymorphism is implemented using a template. And I can't tell yet which version is the best, as it's too early to tell, so I want to have all 3 together.
Then I made another special file: A copy of the third version header file, in which each line can be marked or not marked. Marked means I use that specific method or I'm sure it's going to be in use very soon, otherwise the line isn't marked.
Then, some time later, I started a new branch. And for that branch I needed a new version of that class developed in the first branch. So I just copied one of the versions to the new branch's folder and started working there. Now again I had some kind of auxiliary file: I had 2 files, one from which I delete class methods I use, and one into which I write new methods I need to have.
Now I want to start using Git and I wonder: For all the project's text files, plans, diagrams, etc., it obvious - I keep them outside the Git repo. Whenever collaborative editing is needed I can set up a wiki or something like that. But for all those copies of the same header file, and for those auxiliary "marked" files, what do I do with them? I mean, it's fine by me to have them all in a branch, but what happens when I merge a branch into the trunk? I don't want to have all these copies and versions and lists, just the one final class file I've made.
On one hand, these are C++ source files used while coding. On the other hand they're not part of the pure source code of the software package, they just help me while I work but will never get compiled because in the end there's just the final version of the class which I chose to merge, and all other aux files, lists, etc. are kept just for reference.
What would be the best thing to do?
Thanks for reading my long story :)
EDIT: It's a local repo on my personal computer

Always keep documentation in same repository as source code. If you do not, your documentation will rot. It is becouse documentation is written agains some version of your software, so it has to develop the same way as software develops.
If your documentation is automaticaly generated or compiled into another format, commit only source data, makefile and configuration of generator, just like you do with source code.

What you describe is the normal use of branches: You have your master branch ("official", if it where) and a branch to develop a new feature (it doesn't really have to live in a separate directory, if I understand you correctly). Periodically you synchronize the feature branch with the master, either by rebasing it on the master or merging its changes in. In its turn, you can well have subordinate branches in which you try out approaches to develop the feature, handled with respect to the feature branch just like that one respect to the master. But in that case you have to be careful whenever you rebase.
You should keep any data that isn't easy to recreate in the repository, be it source code, documentation or even design sketches. Stuff that can be recreated (object code, automatically formated documentation, ...) should be kept out (any change there will create a difference to be checked in). Your repository (particularly not published branches) is your own workspace, it can be all the messy you like.
Take a look at the book mentioned at the git homepage.

Well, that’s clearly documentation and not source code, so you should separate it from your source code. As your documentation seems to be branch dependent, you should still check it into the repo, but in a separate doc directory.
About the merging: How a merge works is up to you in the end. Git just has a default merge strategy which is what most people want most of the time. But if you say that a merge into the main branch should just bring the code and not the docu, then that’s fine. Just merge that way:
git merge mybranch --no-commit
rm -rf **docu-dir**
git add -A
git commit

Visualizing a huge C++ project using Doxygen + Graphviz

I've inherited a large C++ project which I need to port to Linux. There are over 200,000 lines of source in this project spread across more than 300 files. It would be tremendously helpful to have a visual dependency/include tree to refer to for this project so that I can get a general feel for the application's internal structure. This would also help me to locate the "fault lines" between the core modules and Windows header files so that I can stub them out later.
The class viewer in Visual Studio simply isn't cutting it. I was reading around, and learned that Doxygen is a commonly used tool for listing dependencies. I'm much more of a visual person, and found that this wasn't so helpful. Fortunately, I learned about the Graphviz plugin, using something called "Dot" that has enabled me to generate dependency trees for parts. Unfortunately, hundreds of smaller dependency trees are generated for specific files, rather than having one large one as I'd hoped for. Here are a couple of examples:
As you can see (I hope), Doxygen/GraphViz seem to give up when the graph gets too large and gray out the child nodes. I then have to go to the graph for that specific node if I want to see what's further down the tree. Not only does this limit the visual helpfulness of the graph, but if the child node depends on any of the nodes from the original graph, these nodes will be shown again. This is leading to lots of duplicate connections that make it very hard to conceptually isolate the graph from any given file. As a result, I feel like I'm "zoomed in" and still can't see the whole picture.
I've tried playing around with the DOT_GRAPH_MAX_NODES setting in the Expert view in Doxygen, but this doesn't seem to affect the scope of the graphs that are being generated. From the output generated from any given run, it seems like Doxygen itself is generating hundreds of graph files, and Graphviz is just faithfully generating graphs for each one. Is there any known way to make Doxygen generate one large graph file instead of hundreds of smaller ones?
Alternatively, are there any free visual graphing solutions out there which know how to handle complicated C++ project files with nested pre-processor directives, MIDL interfaces, and manually defined include paths the way Doxygen does?
My searches are finding general graphing utilities (or questions about them), but nothing specific to large C++ projects. Surely with all the coding that's been done over the years somebody must have such a tool!
Thanks,
-Alex

You can use the XML files generated by doxygen, and merge them into a single giant dot-format graph file (using xml stylesheet or similar), then run graphviz on it.
Doxygen automatically invoking graphviz is most useful when the number of graphs is high. For a single graph, automatically creating the content is important, but automatically calling dot, not so much.

Avoiding unneccessry recompilations using "branchy" development model

I'm using Mercurial for development of quite a large C++ project which takes about 30 minutes to get built from the scratch(while incremental builds are very quick).
I'm usually trying to implement each new feature in the new branch(using "hg clone") and I may have several new features developed during the day and it's quickly getting very boring to wait for the new feature branch to get built.
Are there any recipes to somehow re-use object files from other already built branches?
P.S. in git there are named branches within the same repository which make re-usage of the existing object files possible for the build system, however I prefer the simpler Mercurial separate branches model...

I suggest using ccache as a way to speed up compilation of (mostly) the same code tree. The way it works is as following:
You define a place to be used as the cache (and the maximum cache size) by using the CCACHE_DIR environment variable
Your compiler should be set to ccache ${CC} or ccache ${CXX}
ccache takes the output of ${CC} -E and the compilation flags and uses that as a base for its hash. As long as the compiler flags, source file and the headers are all unchanged, the object file will be taken from cache, saving valuable compilation time.
Note that this method speeds up compilation of any source file that eventually produces the same hash. If you share source files across projects, ccache will handle them as well.
If you already use distcc and wish to use it with ccache, set the CCACHE_PREFIX environment variable to distcc.
Using ccache sped up our source tree compilation around tenfold.

A simple way to speed up your builds could be to use a local "build directory" on your disk. This way you can checkout into this directory and start the build. The first time it will take the full time, but after that it will (hopefully) only rebuild the files where the source code changed.

My Localbranch extension was designed partly around this use case. It uses a single working directory, but I think it's simpler than git. It's essentially a mechanism for maintaining multiple repository clones under one working directory, where only one is active at a given time.

Woops, I missed your P.S. where you don't like having multiple named branches in the same repo and that you prefer separate clones.. sorry about that.
I too have somewhat large C++ projects and the clone-per-feature workflow didn't work for me very well. Firstly, I had to close down my Vim session and then reopen (many of the same) files once I've created the clone. Secondly, like you said, a lot of code must be recompiled unnecessarily. Thirdly, I have to keep track of where I've pushed to and pulled from - gets confusing when you start a new feature and then get sidetracked onto a new one. Before you know it you have many clones and not sure which ones need to be pushed back to your main.
You definitely don't want to use named branches (as I'm sure you know) to handle this as they are quite permanent.
What you need are bookmarks: https://www.mercurial-scm.org/wiki/BookmarksExtension
Bookmarks allow you to create lightweight (and otherwise anonymous) branches per feature by facilitating the naming of heads in your repo. These heads would normally be unnamed and you would have to look at the output of 'hg log' or use some graphical tool to find the revision numbers for the tip of your feature-branch. With bookmarks you can name them descriptive names like 'my-cool-feature' or 'bugfix-392'.
If you like the idea of bookmarks, I'd also recommend my own extension called 'tasks': http://bitbucket.org/alu/hgtasks. This extension works like bookmarks but adds some more functionality. It allows you created feature-branches (now called tasks) and suppress the pushing of incomplete tasks. This is handy when you have a few feature-branches at once. You may not be ready to push your 'my-cool-feature' task, but 'bugfix-392' is ready to go. Because tasks track a set of changesets (and not just one 'tip' changeset) there are some things you can do with tasks that you can't with bookmarks. See an example workflow here: http://x.zpuppet.org/2009/03/09/mercurial-tasks-extension/.

Mercurial also has local named branches, see the hg branch command.
If you insist on using hg clone to do branchy development, I guess you could try creating a folder link (shortcut under windows) in your repo to a shared obj folder. This will work with hg clone, but I'm not sure your build tool will pick it up.
Otherwise, you probably keep all your repos in one folder - just put your obj folder there (it shouldn't be under source control anyways, imo). Use relative paths to refer to it.

A word of warning: many .o symbol tables (or equivalent) contain the full path name of the source file. If that other file changes (or if the path is not visible from the new directory) you may encounter weirdness when debugging.

C++ Header files - put them in one directory or merged in a tree structure?

I have a substantial body of source code (OOFILE) which I'm finally putting up on Sourceforge. I need to decide if I should go with a monolithic include directory or keep the header files with the source tree.
I want to make this decision before pushing to the svn repo on SourceForge. I expect a lot of people who use it after that move will keep a working copy checked out directly from SF so won't want to change their structure.
The full source tree has about 262 files in 25 folders. There are a lot more classes than that suggests as due to conforming to 8.3 character names (yes it dates back to Win3.1) many classes are in one file. As I used to develop with ObjectMaster, that never bothered me but I will be splitting it up to conform to more recent trends to minimise the number of classes per file. From a quick skim of the class list, there are about 600 classes.
OOFILE is a cross-platform product expected to be built on Mac, Windows and assorted Unix platforms. As it started life on Mac, with compilers that point to include trees rather than flat include dirs, headers were kept with the source.
Later, mainly to keep some Visual Studio users happy, a build was reorganised with a single include directory. I'm trying to choose between those models.
The entire OOFILE product covers quite a few domains:
database front-end
range of database backends
simple 2D graphing engine for Mac and Windows
simple character-mode report-writer for trivial html and text listing
very rich banding report-writer with Mac and Windows Preview and Printing and cross-platform generation of text, RTF, HTML and XML reports
forms integration engine for easy CRUD forms binding to the database, with implementations on PowerPlant and MFC
cross-platform utility classes
file and directory manipulation
strings
arrays
XML and tag generation
Many people only want to use it on a single platform and some of those code areas are pure legacy (eg: PowerPlant UI framework on classic Mac). It therefore seems people would appreciate not having headers from those unwanted areas dumped in their monolithic include directory.
I started thinking about having an include directory split up into a few of the domains above and then realised that was sounding more like the original structure.
In summary, the choices seem to be:
Keep original model, all headers adjacent to source - max flexibility at cost of some complex includes in projects.
one include directory with everything inside
split includes by domain, so there may be about 6 directories for someone using the lot but a pure database user would probably have a single directory.
From a Unix build aspect, the recommended structure has been 2. My situation is complicated by needing to keep Visual Studio and XCode users happy (sniff, CodeWarrior, how I doth miss thee!).
Edit - the chosen solution:
I went with four subdirectories in include. I started trying to divide them up further by platform but it just got very noisy very quickly.

Personally I would go with 2, or 3 if really pushed.
But whichever you choose, please make it crystal clear in the build instructions how to set up the include paths. Nothing dooms an open source project more than it being really difficult to build - developers want a quick out-of-the-box experience and if it involves faffing around with many undocumented environment variables (or whatever) most will simply go away.

C++ Directory Restructuring

I have a source code of about 500 files in about 10 directories. I need to refactor the directory structure - this includes changing the directory hierarchy or renaming some directories.
I am using svn version control. There are two ways to refactor: one preserving svn history (using svn move command) and the other without preserving. I think refactoring preserving svn history is a lot easier using eclipse CDT and SVN plugin (visual studio does not fit at all for directory restructuring).
But right now since the code is not released, we have the option to not preserve history.
Still there remains the task of changing the include directives of header files wherever they are included. I am thinking of writing a small script using python - receives a map from current filename to new filename, and makes the rename wherever needed (using something like sed). Has anyone done this kind of directory refactoring? Do you know of good related tools?

If you're having to rewrite the #includes to do this, you did it wrong. Change all your #includes to use a very simple directory structure, at mot two levels deep and only using a second level to organize around architecture or OS dependencies (like sys/types.h).
Then change your make files to use -I include paths.
Voila. You'll never have to hack the code again for this, and compiles will blow up instantly if something goes wrong.
As far as the history part, I personally find it easier to make a clean start when doing this sort of thing; archive the old one, make a new repository v2, go from there. The counterargument is when there is a whole lot of history of changes, or lots of open issues against the existing code.
Oh, and you do have good tests, and you're not doing this with a release coming right up, right?

I would preserve the history, even if it takes a small amount of extra time. There's a lot of value in being able to read through commit logs and understand why function X is written in a weird way, or that this really is an off-by-one error because it was written by Oliver, who always gets that wrong.
The argument against preserving the history can be made for the following users:
your code might have embarrassing things, like profanity and fighting among developers
you don't care about the commit history of your code, because it's not going to change or be maintained in the future
I did some directory refactoring like this last year on our code base. If your code is reasonable structured at the beginning, you can do about 75-90% of the work using scripts written in your language of choice (I used Perl). In my case, we were moving from set of files all in one big directory, to a series of nested directories depending on namespaces. So, a file that declared the class protocols::serialization::SerializerBase was located in src/protocols/serialization/SerializerBase. The mapping from the old name to the new name was trivial, so that doing a find and replace on #includes in every source file in the tree was trivial, although it was a big change. There were a couple of weird edge cases that we had to fix by hand, but that seemed a lot better than either having to do everything by hand or having to write our own C++ parser.

Hacking up a shell script to do the svn moves is trivial. In tcsh it's foreach F ( $FILES ) ... end to adjust a set of files. Perl & Python offer better utility.
It really is worth saving the history. Especially when trying to track down some exotic bug. Those who do not learn from history are doomed to repeat it, or some such junk...
As for altering all the files... There was a similar question just the other day over at:
https://stackoverflow.com/questions/573430/
c-include-header-path-change-windows-to-linux/573531#573531

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js