Visualizing a huge C++ project using Doxygen + Graphviz


I've inherited a large C++ project which I need to port to Linux. There are over 200,000 lines of source in this project spread across more than 300 files. It would be tremendously helpful to have a visual dependency/include tree to refer to for this project so that I can get a general feel for the application's internal structure. This would also help me to locate the "fault lines" between the core modules and Windows header files so that I can stub them out later.
The class viewer in Visual Studio simply isn't cutting it. I was reading around, and learned that Doxygen is a commonly used tool for listing dependencies. I'm much more of a visual person, and found that its listings weren't so helpful. Fortunately, I learned about the Graphviz integration, which uses a tool called "dot" and has enabled me to generate dependency trees for parts of the project. Unfortunately, hundreds of smaller dependency trees are generated for specific files, rather than one large one as I'd hoped for. Here are a couple of examples:
As you can see (I hope), Doxygen and Graphviz seem to give up when the graph gets too large and gray out the child nodes. I then have to go to the graph for that specific node if I want to see what's further down the tree. Not only does this limit the visual helpfulness of the graph, but if the child node depends on any of the nodes from the original graph, these nodes will be shown again. This leads to lots of duplicate connections that make it very hard to conceptually isolate the graph for any given file. As a result, I feel like I'm "zoomed in" and still can't see the whole picture.
I've tried playing around with the DOT_GRAPH_MAX_NODES setting in the Expert view in Doxygen, but this doesn't seem to affect the scope of the graphs that are being generated. From the output generated from any given run, it seems like Doxygen itself is generating hundreds of graph files, and Graphviz is just faithfully generating graphs for each one. Is there any known way to make Doxygen generate one large graph file instead of hundreds of smaller ones?
Alternatively, are there any free visual graphing solutions out there which know how to handle complicated C++ project files with nested pre-processor directives, MIDL interfaces, and manually defined include paths the way Doxygen does?
My searches are finding general graphing utilities (or questions about them), but nothing specific to large C++ projects. Surely with all the coding that's been done over the years somebody must have such a tool!
Thanks,
-Alex

You can use the XML files generated by Doxygen and merge them into a single giant dot-format graph file (using an XSLT stylesheet or similar), then run Graphviz on it.
Doxygen automatically invoking graphviz is most useful when the number of graphs is high. For a single graph, automatically creating the content is important, but automatically calling dot, not so much.
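As a minimal sketch of that merge, done with a short script instead of a stylesheet: the code below assumes Doxygen was run with GENERATE_XML = YES, and that each per-file XML document contains a compounddef element of kind "file" with a compoundname child and one includes child per #include. The function names and the idea of reading every *.xml file in one directory are illustrative choices, not part of Doxygen.

```python
import glob
import os
import xml.etree.ElementTree as ET


def include_edges(xml_data):
    """Yield (includer, included) pairs from one Doxygen XML document."""
    root = ET.fromstring(xml_data)
    for compound in root.iter("compounddef"):
        if compound.get("kind") != "file":
            continue
        source = compound.findtext("compoundname")
        for inc in compound.findall("includes"):
            if source and inc.text:
                yield source, inc.text


def to_dot(edges):
    """Render the edges as a single Graphviz digraph, dropping duplicates."""
    def quote(name):
        return '"%s"' % name.replace('"', '\\"')

    lines = ["digraph includes {", "  rankdir=LR;"]
    for src, dst in sorted(set(edges)):
        lines.append("  %s -> %s;" % (quote(src), quote(dst)))
    lines.append("}")
    return "\n".join(lines)


def merge_directory(xml_dir):
    """Collect include edges from every XML file in xml_dir into one graph."""
    edges = []
    for path in glob.glob(os.path.join(xml_dir, "*.xml")):
        with open(path, "rb") as handle:
            edges.extend(include_edges(handle.read()))
    return to_dot(edges)
```

Writing the result of merge_directory("xml") to includes.dot and running `dot -Tsvg includes.dot -o includes.svg` should then give one graph for the whole project. Nodes are keyed by the name Doxygen reports, so two files with the same name in different directories would collapse into one node; a real tool would probably key on the refid attribute instead.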

Related

Handling really large multi language projects

I am working on a really large multi-language project (1000+ classes, plus configs and scripts), with files distributed over network drives. I am having trouble working through the code, since the available tools are not helping. The main problem is finding things. For the C++ part: VS with VAX can only find files and symbols which are in the solution, and a lot of them are not. ReSharper has the same problem. Right now I am stuck doing unindexed string and file searches, which is highly inefficient on a network drive. I heard that SourceInsight would be an option, since it lets you specify the folders that are part of the project and then indexes them, but my company won't spend money on it.
So my question is: what tools are available for working through an incredibly large amount of code? If possible, they should be low cost or even free/open source.

Check out -
ctags
cscope
idutils
snavigator
In every one of these tools, you would have to invest(*) some time in reading the documentation, and then building your index. Consider switching to an editor that will work with these tools.
(*): I do mean invest, because it will reap dividends once you do.
Hope this helps.

If you need to maintain a large amount of code, you really should have a source code management system. A lot of them will help you find text by indexing all the files, and most of them work with a variety of languages.
Otherwise, you can install an indexer such as Apache Lucene and index all your files yourself.
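To make the idea of indexing concrete, here is a toy inverted index. It is not Lucene and does not use Lucene's API; it is only a sketch of what such an indexer does, and every name in it (add_file, build_index, lookup, the extension list) is made up for illustration. It walks a directory tree once and records where each identifier-like token occurs, so later lookups do not have to rescan the network drive.

```python
import os
import re
from collections import defaultdict

# Identifier-like tokens: a letter or underscore followed by word characters.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def add_file(index, path, lines):
    """Record every token in lines as occurring at (path, line number)."""
    for number, line in enumerate(lines, start=1):
        for token in TOKEN.findall(line):
            index[token].add((path, number))


def build_index(root, extensions=(".h", ".hpp", ".c", ".cpp")):
    """Index every file under root whose name ends with one of extensions."""
    index = defaultdict(set)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    add_file(index, path, handle)
    return index


def lookup(index, token):
    """Return the sorted (path, line number) locations of an exact token."""
    return sorted(index.get(token, ()))
```

The index lives in memory here; to get any benefit on a slow drive it would need to be saved locally (for example with pickle) and refreshed when files change, which is the kind of work a real indexer handles for you.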

You should take a look at LXR. This is used by many Linux kernel source listings.

Try ndexer (http://code.google.com/p/ndexer/). It promises to handle extremely large codebases.

The Perl program ack is also worth a look -- think of it as multi-file grep on steroids. The new version (in what I would call late beta) even lets you specify regexes for the files to process as well as regexes to search for -- a feature I've used extensively since it came out (I've got a subproject with 30k lines in 300+ classes, where this feature has been very helpful). You can even chain the new ack with itself so you can subselect the files to process.

"VS with VAX can only find files and symbols which are in the solution. A lot of them are not."
You can add all the files that are not in your solution and set them to not build in the settings. Your VS build will not be affected by this, but now VS knows about those files and you can search them along with your VS native files.

How to organize sources of complex program?

We're creating a very complex embedded system, and the "sources" consist of several projects in Visual C++, IAR and Code Composer Studio, plus Altium Designer schematics and PCBs. Any of these may exist in several versions.
So, what practice would you advise for arranging all of that?
Thank you

I have the same setup as you.
I use Altium Designer for the hardware schematics and PCB design. But I also have Firmware source files and related utilities. And I have mechanical design files.
Here's how I do it:
Project Name
    Firmware
        MainCpu
            trunk
            tags
            branches
        IoCpu
            trunk
            tags
            branches
    Hardware
        MainPcb
            trunk
            tags
            branches
        IoPcb
            trunk
            tags
            branches
        PowerPcb
            trunk
            tags
            branches
    Mechanical
        Chassis
            trunk
            tags
            branches
        Other
            trunk
            tags
            branches
This way all the project files are stored together in the SVN repository. The only downside I've found is that you can't just check out the Project and get the latest FW/HW/MEK files. You have to check out the head of each FW/HW/MEK module separately.
The reason for the separate sub-modules for FW/HW/MEK is that they will get separate version tags.

Everything that you consider as sources should be under a Source Control System, like SVN. This is the best way to handle versions, revisions, branches and tags. SVN can handle binary files, so you won't have problems with non-text files.

If your C++ source files are numerous and span multiple directories, then the effort put into grokking Large-Scale C++ Software Design by John Lakos may be well worth it. The main theme of the book is how the physical layout of your software, that is, the arrangement of source code files in directories, limits or extends your ability to modify the software.

I like to have a directory structure that, at the top level, reflects each of the programmable parts (i.e. microcontroller, DSP1, FPGA1, FPGA2, ...).
I also like to have one or more subdirectories that hold all the generated files, so it is easy to get back to a clean source tree. Also make it easy to do a clean build straight from the source code configuration tool (i.e. get from source to binary images in as few steps as possible).
Also give each programmable part its own version number, plus one overall version number that identifies the combination of the sub-component version numbers.

Definitely use source control. If the program itself doesn't support it, just keep the parent folder you use under source control. SVN is my current favorite.
As for how to arrange your files, I noticed you had Altium Designer on your list. That program will a) play nice with source control, and b) arrange your files in an orderly manner, assuming you use its whole "project" file structure. Look into using its "PCB" (if that's what you're doing) or "embedded" projects; when you create one, it creates buckets for storing all your different types of files.
Even if you don't want to actually use Altium for your files, create a project and look at its directory structure to get an idea of all the files you'll need to keep track of.

(Aside from trivial helper classes) put one class in each cpp/h file, and name the cpp/h files the same as the class.
Group related class files into folders (you can optionally use a hierarchy of namespaces that matches the folder structure. The .NET approach here is to use a CompanyName.ProductName namespace, with your files stored in a ProductName project/subfolder of your solution). So for example, you might group your Math, I/O, and Drawing classes into separate "subsystem" folders.
Ideally, make these separate sections into reusable libraries (MyCompany.Math). You'll be glad of this later when you want to develop a new product that will share some of the code. In that case, the top-level "folders" become separate projects in their own right, and you can start to work on minimising dependencies between them to realise and then enforce a much better overall framework design in your code base.
The ideal within folders is to find a good balance between clutter and sparseness: try to balance the folders so that they have between 5 and 15 files in each. If fewer, consider merging the folders; if more, consider adding sub-category folders to break down the complexity.
As long as your classes/files and namespaces/folders have good descriptive names, and your folders are logically structured, you can make an extremely large project very easy to navigate.
At the risk of starting a religious war, I prefer to put the headers and their source files in the same folder, so that when you are editing a .cpp the .h is easily accessible rather than having to move up and down by a folder all the time.

Reduce the complexity!
My first engineering professor had a famous first lecture. It consisted of a single equation written on the blackboard:
Perfection = Simplicity
The problem with Source Control Systems is that they manage complexity but also promote it.

xsd-based code generator to build xml?

I have a schema (xsd), and I want to create xml files that conform to it.
I've found code generators that generate classes which can be loaded from an xml file (CodeSynthesis). But I'm looking to go the other direction.
I want to generate code that will let me build an object which can easily be written out as an xml file. In C++. I might be able to use Java for this, but C++ would be preferable. I'm on solaris, so a VisualStudio plugin won't help me (such as xsd2code).
Is there a code generator that lets me do this?

To close this out: I did wind up using CodeSynthesis. It worked very well, as long as I used a single xsd as its source. Since I actually had two xsds (one imported the other), I had to manually merge them (they did some weird inheritance that needed manual massaging).
But yes, CodeSynthesis was the way to go.

library for doing diffs

I've been tasked with creating a tool that can diff and merge the configuration files for my company's product. The configurations are stored as either XML or URL-encoded strings. I'm looking for a library, preferably open source with a license compatible with commercial software, that can do these diffs. Our app is written in C++, so C++ libraries would be best, but I'm willing to look at libraries that are C#-specific since I can write a wrapper that exposes it to C++ via COM. Three-way diffs would be ideal, but two-way is acceptable. If it has an understanding of XML, that would also be a plus (since XML nodes can be reordered without changing the document, etc). Any library suggestions? Should I even consider writing my own diff tools in the hopes of giving it semantic knowledge of our formats?
Thanks to this similar question, I've already discovered this Google library, which seems really great, but I'm still looking for other options. It also seems to be able to output the diffs in HTML format (using the <ins> and <del> tags that I didn't know existed before I discovered it), which could be really handy, but it seems to be a unified diff only. I'm going to need to display the results in a web browser, and will probably have to build an interface for doing the merges in the browser as well. I don't expect a library to be able to help with these tasks, but it must produce output in a format that is amenable to me building this on top of it. I'm currently envisioning something along the lines of TortoiseMerge (side-by-side diffs, not unified), except browser-based. Any tips/tricks/design ideas on how to present this would be appreciated too.

Subversion comes with libsvn_diff and libsvn_delta, which are licensed under the Apache Software License.

Here is a C++ library that can diff what the author calls semistructured data. It deals nicely with HTML and XML. Since your data is XML it would make a lot of sense to use this instead of plain text diff. This is especially the case when the files are machine generated.
I am currently trying to use this library to build a tool that diffs Visual Studio project files. These are basically XML files, and using a plain diff tool like WinMerge is too painful because Visual Studio pretty much mucks up the whole file by reordering it wildly. The idea is to do some kind of structured diff to address the problem.

For diffing the XML I would propose that you normalize it first: sort all the elements in alphabetic order, then generate a stream of tokens/xml that represents the original document but is independent of the original formatting. After running the diff, parse the result to get a tree containing what was added / removed.
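A minimal sketch of that normalization step, using only the Python standard library rather than any of the libraries mentioned in this thread: each element becomes one line built from its tag, its sorted attributes and its stripped text, and siblings are sorted by that canonical form before diffing. It assumes that sibling order carries no meaning in your format and it ignores text that follows a child element (mixed content); the function names are illustrative.

```python
import difflib
import xml.etree.ElementTree as ET


def tokens(element, depth=0):
    """Flatten an element into formatting-independent lines, one per node."""
    head = "  " * depth + element.tag
    attrs = " ".join('%s="%s"' % item for item in sorted(element.attrib.items()))
    text = (element.text or "").strip()
    if attrs:
        head += " " + attrs
    if text:
        head += " = " + text
    lines = [head]
    # Sort children by their own canonical lines so sibling order is ignored.
    for child_lines in sorted(tokens(child, depth + 1) for child in element):
        lines.extend(child_lines)
    return lines


def diff_xml(old_xml, new_xml):
    """Return unified-diff lines between two normalized XML documents."""
    old = tokens(ET.fromstring(old_xml))
    new = tokens(ET.fromstring(new_xml))
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))
```

Two documents that differ only in whitespace, attribute order or sibling order produce an empty diff. The lines prefixed with + and - in a non-empty result can then be parsed back, using their indentation, into a tree of what was added and removed.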

C++ Header files - put them in one directory or merged in a tree structure?

I have a substantial body of source code (OOFILE) which I'm finally putting up on Sourceforge. I need to decide if I should go with a monolithic include directory or keep the header files with the source tree.
I want to make this decision before pushing to the svn repo on SourceForge. I expect a lot of people who use it after that move will keep a working copy checked out directly from SF so won't want to change their structure.
The full source tree has about 262 files in 25 folders. There are a lot more classes than that suggests because, to conform to 8.3 character file names (yes, it dates back to Win3.1), many classes share one file. As I used to develop with ObjectMaster, that never bothered me, but I will be splitting it up to conform to more recent trends and minimise the number of classes per file. From a quick skim of the class list, there are about 600 classes.
OOFILE is a cross-platform product expected to be built on Mac, Windows and assorted Unix platforms. As it started life on Mac, with compilers that point to include trees rather than flat include dirs, headers were kept with the source.
Later, mainly to keep some Visual Studio users happy, a build was reorganised with a single include directory. I'm trying to choose between those models.
The entire OOFILE product covers quite a few domains:
database front-end
range of database backends
simple 2D graphing engine for Mac and Windows
simple character-mode report-writer for trivial html and text listing
very rich banding report-writer with Mac and Windows Preview and Printing and cross-platform generation of text, RTF, HTML and XML reports
forms integration engine for easy CRUD forms binding to the database, with implementations on PowerPlant and MFC
cross-platform utility classes
    file and directory manipulation
    strings
    arrays
    XML and tag generation
Many people only want to use it on a single platform and some of those code areas are pure legacy (eg: PowerPlant UI framework on classic Mac). It therefore seems people would appreciate not having headers from those unwanted areas dumped in their monolithic include directory.
I started thinking about having an include directory split up into a few of the domains above and then realised that was sounding more like the original structure.
In summary, the choices seem to be:
1. Keep original model, all headers adjacent to source - max flexibility at cost of some complex includes in projects.
2. One include directory with everything inside.
3. Split includes by domain, so there may be about 6 directories for someone using the lot, but a pure database user would probably have a single directory.
From a Unix build aspect, the recommended structure has been 2. My situation is complicated by needing to keep Visual Studio and XCode users happy (sniff, CodeWarrior, how I doth miss thee!).
Edit - the chosen solution:
I went with four subdirectories in include. I started trying to divide them up further by platform but it just got very noisy very quickly.

Personally I would go with 2, or 3 if really pushed.
But whichever you choose, please make it crystal clear in the build instructions how to set up the include paths. Nothing dooms an open source project more than it being really difficult to build - developers want a quick out-of-the-box experience and if it involves faffing around with many undocumented environment variables (or whatever) most will simply go away.
