Issue
I have recently found myself working with a large, unfamiliar, multi-department C++ codebase in need of better organization. I would like to find a way to map which symbols from any given header are used by which source files, in the hope that if only one department uses a given function, it can be moved out of the shared area and into that department's area.
Attempts
My first thought was to use the symbol table: i.e., compile the project and dump the symbols for each object file. From there I figured I could simply write a script to check whether the symbols from my header file were used. While this approach seems viable, it would require me to create a list of the symbols I am looking for from the headers. With my limited knowledge, I am unsure how to automate such a process, and with hundreds of header files to test, doing it manually is out of the question.
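Something like this is what I have in mind (ctags and nm are real tools, but the paths are made up, and matching demangled names this way is surely too naive for overloads):

    # Collect the names of functions declared (p) or defined (f) in the header.
    ctags -x --c++-kinds=pf shared/widgets.h | awk '{print $1}' | sort -u > /tmp/syms.txt

    # For each department's object files, check whether any of those names
    # show up as undefined (i.e. used-but-not-defined-here) symbols.
    for obj in $(find dept_a -name '*.o'); do
        nm -u -C "$obj" | grep -wFf /tmp/syms.txt > /dev/null && echo "$obj"
    done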
Questions
Is my approach valid? If so...
What can I use to generate the symbol names from my header file?
If not...
What else can I do?
Additionally, while I am using Linux, most of the development teams work in Windows-only environments. What utilities could I use on both platforms?
Any and all help is greatly appreciated.
When I need to clean up APIs I sometimes use information from callcatcher. It basically builds a database of all symbols while compiling and allows you to determine what symbols are used in some build product.
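If it helps, a run looks roughly like this (hedged from memory; check the callcatcher docs for the exact invocation):

    callcatcher make          # wraps each compile and records defined/called symbols
    callanalyse ./myprogram   # reports symbols that are defined but never called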
I sometimes also use DXR (code on GitHub; there is an example installation) to browse what code is defined where and how it is used. In contrast to callcatcher, DXR lets you drill down to much finer detail. Setting up DXR is pretty heavy-duty, but it might be worth it if you have enough code to work with.
On the other side of the spectrum there are tools like cscope. Even though it doesn't work super nicely with C++ code, it is still very useful. If you deal with more than a couple hundred kLOC, though, you will quickly feel limited.
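For reference, a typical line-oriented cscope session on a C++ tree looks something like this:

    find . -name '*.cpp' -o -name '*.h' > cscope.files
    cscope -b -q -k                # build the cross-reference (-k: ignore /usr/include)
    cscope -d -L -3 MyFunction     # query: list functions calling MyFunction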
If I had to pick only one of these tools and would be working on a large code base (>1Mloc) I would definitely pick DXR.
You can get a reasonable start on the information that you've described by using doxygen.
Even for source that doesn't contain doxygen-formatted comments, the generated documentation can contain a list of the places (i.e., source files) where a particular symbol is used.
And, as doxygen can generate HTML documentation, navigating through your source tree becomes trivial. It can be even better if you enable the dot functionality to generate relationship diagrams for the classes in your source tree.
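The Doxyfile options that matter here are roughly these (all real Doxygen settings; the values are just a starting point):

    EXTRACT_ALL            = YES   # document symbols that lack doxygen comments
    REFERENCED_BY_RELATION = YES   # for each symbol, list where it is used
    REFERENCES_RELATION    = YES   # and list what it uses
    SOURCE_BROWSER         = YES   # cross-linked source listings
    HAVE_DOT               = YES   # graphviz relationship diagrams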
Very old-school, simple, and possibly Unix-only, but are you aware of etags? There is also GNU Global, which I think is similar.
The GNU Global page links to a "comparison with similar tools" discussion that might also be useful.
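Basic GNU Global usage, for reference:

    gtags              # build the GTAGS/GRTAGS/GPATH databases at the project root
    global -r MyFunc   # list every location that references MyFunc
    htags              # generate a hyperlinked HTML view under ./HTML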
Related
I know this sounds like "can I have dynamic static linkage?", which makes little sense, but let me explain.
I am looking for options to explore and I am aware that there might be something out there that I'm not aware of.
My goal is to have a modular code base where plugins would be provided as static libs, to avoid exposing DLLs, the result being a single library that clients could swallow into their code.
I imagine having a config file listing all the desired plugins; feed that to a script and boom: a magic .sln file with everything ready to be built.
My initial idea is to have a 'main deck' that would know every possible plugin interface and link with them all. Plugins not yet implemented or required by the client would be dead-end/no-op implementations, while required plugins would be implementations realizing the interfaces called by the 'main deck'.
I think that would work, but I find the idea of linking dead implementations for the sake of modularity conceptually horrible.
The main issue I see is at that 'main deck' level: how could I remove unused headers to prevent useless linking, or add newly developed ones, without editing the code each time? I cannot figure this out without a ton of macros or generating some source files.
Could other patterns solve that issue?
I think there is no possible way that doesn't involve macro magic and a complex build system.
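That said, the generation step itself can stay small. A hedged sketch in shell (plugins.cfg, the directory layout, and the Registry API are all hypothetical):

    # Emit the 'main deck' glue header from a one-plugin-per-line config,
    # so only the plugins a client selected ever get included and linked.
    {
        echo '// generated file - do not edit'
        while read -r p; do
            echo "#include \"plugins/${p}/${p}_plugin.h\""
        done < plugins.cfg
        echo 'inline void register_plugins(Registry& r) {'
        while read -r p; do
            echo "    r.add(make_${p}_plugin());"
        done < plugins.cfg
        echo '}'
    } > generated_plugins.h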
If I understand correctly, what you want to do is similar to a library I have used, rocksdb. At build time you can specify what modules/packages you want and it will build them into the static library for you. Check out what they do and see if it is along the lines of what you want.
This is a potentially dangerous question because interdisciplinary questions and answers will be biased, but I'll have a stab at it anyway. All in good spirit!
So, here we go. I'm writing an Emacs major mode for a language that has almost no Emacs support yet. And I'm at the point where I have to decide on a way to generate project files. Below is an outline of the task ahead:
The templates have to represent project directory tree, not only single files.
The resulting files are of various formats, potentially including SGML-like languages, but not limited to that variety. The templates also have to generate C-like source code, eLisp source code, and plain-text files such as README.
The templates must be processed in a batch upon a user-initiated action (as in: the user wants to create a project, so several files must be created in the user-appointed directory). It may be beneficial to be able to supervise the creation, but this is less important than the ability to run the process entirely automatically.
Bonus features:
The template language already has a user base (with the potential for reuse of existing templates).
The templates can be used for code snippets (contain blanks which are filled interactively once the user invokes code-generating routine while editing the file).
Obvious things like cross-platform-ness and ease of use both through a graphical interface and from the command line.
I did some research, but I won't share my results (yet) so as not to bias the answers. The problem with answering this question is not that the answer is hard to find, but that it is hard to choose one from many.
I'm developing a system based on Mustache for exactly the use case that you've described. The template language itself is a very simple extension of Mustache called Groome.
I also released a command-line tool called Molt that renders Groome templates. I'd be curious to know if it does everything that you need. I'm still adding features to the tool and haven't yet announced it. Thanks.
I set out to solve a similar problem several years back, when I wanted to use Emacs to generate code from a UML diagram (cogre) and also to generate Makefiles from project specifications. I first tried to use Tempo, but when I tried to get the templates to nest, I ran into problems. I also looked into skeleton, but that didn't quite fit the plan either.
I used Google Templates for a little while and liked the syntax, but I ended up developing SRecode instead, borrowing the good bits from Google Templates. SRecode was written specifically for machine-generated code. Interactive template insertion (i.e., what Tempo was written for) isn't first class in SRecode, but for generating code from a data structure it is very robust, with a lot of features and automatically filled variables. It works closely with your major mode and allows many nested templates, with control over the nested dictionary values. There is a subsystem that will take Semantic tags and generate code from them for a couple of languages. That means you can parse code in one language with Semantic and generate code in another language with SRecode using those tags. Nifty! Many parts of the CEDET reference manuals were built that way.
The templates themselves allow looping, if statements, and include statements. There are a couple examples in SRecode for making an 'application', such as the comment writer, and EDE uses it to create Makefiles, which is almost exactly what you are trying to do.
Another option is Generator, which offers “language-agnostic project bootstrapping with an emphasis on simplicity”. Installation requires Node.js and npm.
Generator’s emphasis on simplicity means it is very easy to learn how to make a template. Generator also saves you from having to reference templates by file paths – it looks for templates in ~/.generator.
However, there is no way to write README or LICENSE files for the template itself without those files being copied to the generated project. Also, post-generation commands written in the Makefile will be copied to the generated Makefile, even after they are no longer of use. Finally, the ad-hoc templating language doesn’t provide a way to escape its __lowercasevariables__ – though I can’t think of a language where that limitation would be a problem.
Is there a way to programmatically enumerate a namespace and its members in C++?
I have a large C++ program which utilizes several namespaces. I am unfamiliar with the codebase, and would like to determine which functions/classes/variables are associated with which namespaces.
My current approach involves simply removing the 'using namespace' directives one by one and checking what breaks during compilation, but I assume there is a much better way to achieve the same goal.
This is not possible in C++.
However, you can use external tools, such as Doxygen, that will create documentation (HTML, and other formats) that will list all the members of your namespaces.
Unfortunately, introspection is NOT one of C++'s big features. There's no way (within the language) to do what you want. You'll need an external code analysis tool (something that can parse the code and build a reference) to do the job. I use cscope for a lot of analysis, but to my knowledge it doesn't really know about namespaces, so probably not the right tool for you.
You can use a C++ front-end (e.g. Elsa) to do the job for you.
Also consider using a good IDE that has 'Go To Definition' functionality (e.g., Microsoft Visual Studio).
You can start by running Doxygen to generate an index of all the functions/classes/namespaces defined in your project. Make sure to edit the settings to generate the index for undocumented symbols.
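The relevant settings (these are real Doxygen options):

    EXTRACT_ALL     = YES   # index undocumented symbols too
    SHOW_NAMESPACES = YES   # generate the namespace index pages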
If you know which namespaces you're looking for, you can just generate a map file (g++ -Wl,-Map,MyMapFile.map). Then search for e.g. MyNamespace:: in the map file.
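Concretely, something like this (hedged; GNU ld map files contain mangled names, hence the c++filt pass):

    g++ *.o -Wl,-Map,MyMapFile.map -o myapp
    c++filt < MyMapFile.map | grep 'MyNamespace::'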
When you get a third-party library (C, C++), open source (LGPL, say), that does not have good documentation, what is the best way to go about understanding it so you can integrate it into your application?
The library usually has some example programs, and I end up walking through the code using gdb. Any other suggestions/best practices?
For an example, I just picked one from sourceforge.net, but it's just a broad engineering/programming question:
http://sourceforge.net/projects/aftp/
I frequently use a couple of tools to help me with this:
GNU Global. It generates cross-referencing databases and can produce hyperlinked HTML from source code. Clicking function calls will take you to their definitions, and you can see lists of all references to a function. Only works for C and perhaps C++.
Doxygen. It generates documentation from Javadoc-style comments. If you tell it to generate documentation for undocumented methods, it will give you nice summaries. It can also produce hyperlinked source code listings (and can link into the listings provided by htags).
These two tools, along with just reading code in Emacs and doing some searches with recursive grep, are how I do most of my source reverse-engineering.
One of the better ways to understand it is to attempt to document it yourself. By going and trying to document it yourself, it forces you to really dive in and test and test and test and make sure you know what each statement is doing at what times. Then you can really start to understand what the previous developer may have been thinking (or not thinking for that matter).
Great question. I think that this should be addressed thoroughly, so I'm going to try to make my answer as thorough as possible.
One thing that I do when approaching large projects that I've either inherited or am contributing to is to automatically generate documentation, UML diagrams, and anything else that can ease the various amounts of A.D.D. encountered when learning a new project. :)
I believe someone here already mentioned Doxygen; that's a great tool! You should look into it and write a small bash script that will automatically generate the documentation for the application you're developing, in some tree structure you've set up.
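A hedged sketch of such a script (the libs/apps layout is illustrative, and each subproject is assumed to carry its own Doxyfile):

    for proj in libs/* apps/*; do
        [ -d "$proj" ] || continue           # skip stray files
        ( cd "$proj" && doxygen Doxyfile )   # regenerate that subproject's docs
    done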
One thing I haven't seen people mention is BOUML! It's fantastic and free! It automatically reverse-engineers UML diagrams from existing sources, and it supports a variety of languages. I use it as a way to capture the big picture of what's going on in terms of architecture and design before I start reading code.
If you've got the money to spare, look into Understand for %language-here%. It's absolutely great and has helped me in many ways when inheriting legacy code.
EDIT:
Try out ack (betterthangrep.com); it is a pretty convenient script for searching source trees. :)
Familiarize yourself with the information available in the headers. The functions you call will be declared there. Then try to identify the valid arguments and pre-/post-conditions of the functions, as those are your primary guidance (even if they are not documented!). The example programs are your next bet.
If you have code completion/IntelliSense, I like opening up the library and typing '.' or 'namespace::' and seeing what comes up. I always find it helpful; you can navigate through the objects/namespaces and see what functionality they have. This of course assumes it's an OOP library with relatively good naming of functions/objects.
There really isn't a silver bullet other than just rolling up your sleeves and digging into the code.
This is where we earn our money.
Three things:
(1) try to run the test or example apps available, set low debug levels, and walk through logs.
(2) use Source Navigator / cscope (available on both Windows and Linux) and browse the code to understand the flow.
(3) in parallel, use gdb to step through the code while running the test/example apps.
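For (3), a bare-bones gdb session looks like this (the binary path is made up):

    gdb ./examples/demo
    (gdb) break main
    (gdb) run
    (gdb) step        # step into library calls
    (gdb) backtrace   # see how control got here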
I have a 3rd-party library which for various reasons I don't wish to link against yet. I don't want to butcher my code though to remove all reference to its API, so I'd like to generate a dummy implementation of it.
Is there any tool I can use that spits out empty definitions of classes given their header files? It's fine to return nulls, false, and 0 by default. I don't want to do anything on the fly or anything clever; the mock-object libraries I've looked at seem quite heavyweight. Ideally I want something I can use like
$ generate-definition my_header.h > dummy_implementation.cpp
I'm using Linux with GCC 4.1.
This is a harder problem than you might like, as parsing C++ can quickly become a difficult task. Your best bet would be to pick an existing parser with a nice interface.
A quick search found this thread which has many recommendations for parsers to do something similar.
At the very worst you might be able to use SWIG to generate Python bindings, and then use reflection on those to print a dummy implementation.
Sorry this is only a half-answer, but I don't think there is an existing tool to do this (other than a mocking framework, which is probably the same amount of work as using a parser).
Write a small test application that reads the header file and generates the source file. The application would have to parse the header to learn the function names.
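A very crude sketch of that idea using ctags (it only survives free functions with simple signatures; void returns, classes, and templates all break it, so treat it as a starting point, not a solution):

    # List prototypes from the header, strip the ctags bookkeeping columns,
    # and turn each trailing ';' into a do-nothing body.
    ctags -x --c++-kinds=p my_header.h \
      | sed -e 's/^[^ ]* *prototype *[0-9]* *[^ ]* *//' \
            -e 's/;$/ { return 0; }/' > dummy_implementation.cpp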