Using clang to analyze C++ code - c++

We want to do some fairly simple analysis of user's C++ code and then use that information to instrument their code (basically regen their code with a bit of instrumentation code) so that the user can run a dynamic analysis of their code and get stats on things like ranges of values of certain numeric types.
clang should be able to handle enough C++ now to handle the kind of code our users would be throwing at it - and since clang's C++ coverage is continuously improving by the time we're done it'll be even better.
So how does one go about using clang like this as a standalone parser? We're thinking we could just generate an AST and then walk it looking for objects of the classes we're interested in tracking. Would be interested in hearing from others who are using clang without LLVM.

clang is designed to be modular. Quoting from its page:
A major design concept for clang is
its use of a library-based
architecture. In this design, various
parts of the front-end can be cleanly
divided into separate libraries which
can then be mixed up for different
needs and uses.
Look at clang libraries like libast for your needs. Read more here.

What you didn't indicate is what kind of "analyses" you wanted to do. Most C++ analyses require that you have accurate symbol table data so that when you encounter a symbol foo you have some idea what it is. (You technically don't even know what + is without such a symbol table!) You also need generic type information; if you have an expression "a*b", what is the type of the result? Having "name and type" information is key to almost anything you want to do for analysis.
If you insist on clang, then there are other answers here. I don't know it it provides for name and type resolution.
If you need name and type resolution, then another solution would the DMS Software Reengineering Toolkit. DMS provides generic compiler like infrastructure for parsing, analyzing, transforming, and un-parsing (regenerating source code from the compiler data structures). DMS's industrial-strength C++ front end (it has many other language front ends, too) provides full name and type resolution according to the ANSI standard as well a GCC and MS VC++ dialects.
Code transformations can be implemented via an abstract-syntax tree interface provided by DMS, or by pattern-directed program transformation rules written in the surface syntax of your target language (in this case, C++). Here's a simple transformation using the rule language:
domain Cpp~GCC3; -- says we want patterns for C++ in the GCC3 dialect
rule optimize_to_increment(lhs:left_hand_side):expression -> expression
" \lhs = \lhs + 1 " -> " \lhs++" if no_side_effects(lhs).
This implicitly operates on the ASTs built by DMS, to modify them. The conditional
allows you to inquire about arbitrary properties of pattern variables (in this case, lhs), including name and type constraints if you wish.
DMS has been used many times for very sophisticated program analysis and transformation of C++ code. We build C++ test coverage tools by instrumenting C++ code in a rather obvious way using DMS. At the website, there's a bibligraphy with papers describing how DMS was used to restructure the architecture of a large product line of military aircraft mission software. This kind of activity literally pours C++ in one architectural shape into another by applying large numbers of pattern directed transforms such as the above.
It is likely to be very easy to implement your instrumentation. And you don't have to wait for it to mature.

Related

Are there convenient tools to automatically check C++ coding conventions beyond style checks?

Are there good tools to automatically check C++ projects for coding conventions like e.g.:
all thrown objects have to be classes derived from std::exception (i.e. throw 42; or throw "runtime error"; would be flagged as errors, just like throw std::string("another runtime error"); or throwing any other type not derived from std::exception)
In the end I'm looking for something like Cppcheck but with a simpler way to add new checks than hacking the source code of the check tool... May be even something with a nice little GUI which allows you to set up the rules, write them to disk and use the rule set in an IDE like Eclipse or an continuous integration server like Jenkins.
I ran a number of static analysis tools on my current project and here are some of the key takeaways:
I used Visual Lint as a single entry point for running all these tools. VL is a plug-in for VS to run third-party static analysis tools and allows a single click route from the report to the source code. Apart from supporting a GUI for selecting between the different levels of errors reported it also provides automated background analysis (that tells you how many errors have been fixed as you go), manual analysis for a single file, color coded error displays and charting facility. The VL installer is pretty spiffy and extremely helpful when you're trying to add new static analysis tools (it even helps you download Python from ActiveState should you want to use Google cpplint and don't have Python pre-installed!). You can learn more about VL here: http://www.riverblade.co.uk/products/visual_lint/features.html
Of the numerous tools that can be run with VL, I chose three that work with native C++ code: cppcheck, Google cpplint and Inspirel Vera++. These tools have different capabilities.
Cppcheck: This is probably the most common one and we have all used it. So, I'll gloss over the details. Suffice to say that it catches errors such as using postfix increment for non-primitive types, warns about using size() when empty() should be used, scope reduction of variables, incorrect name qualification of members in class definition, incorrect initialization order of class members, missing initializations, unused variables, etc. For our codebase cppcheck reported about 6K errors. There were a few false positives (such as unused function) but these were suppresed. You can learn more about cppcheck here: http://cppcheck.sourceforge.net/manual.pdf
Google cpplint: This is a python based tool that checks your source for style violations. The style guide against which this validation is done can be found here: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml (which is basically Google's C++ style guide). Cpplint produced ~ 104K errors with our codebase of which most errors are related to whitespaces (missing or extra), tabs, brace position etc. A few that are probably worth fixing are: C-style casts, missing headers.
Inspirel Vera++: This is a programmable tool for verification, analysis and transformation of C++ source code. This is similar to cpplint in functionality. A list of the available rules can be found here: http://www.inspirel.com/vera/ce/doc/rules/index.html and a similar list of available transformations can be found here: http://www.inspirel.com/vera/ce/doc/transformations/index.html. Details on how to add your own rule can be found here: http://www.inspirel.com/vera/ce/doc/tclapi.html. For our project, Vera++ found about 90K issues (for the 20 odd rules).
In the upcoming state: Manuel Klimek, from Google, is integrating in the Clang mainline a tool that has been developed at Google for querying and transforming C++ code.
The tooling infrastructure has been layed out, it may fill up but it is already functional. The main idea is that it allows you to define actions and will run those actions on the selected files.
Google has created a simple set of C++ classes and methods to allow querying the AST in a friendly way: the AST Matcher framework, it is being developped and will allow very precise matching in the end.
It requires creating an executable at the moment, but the code is provided as libraries so it's not necessary to edit it, and one-off transformation tools can be dealt with in a single source file.
Example of the Matcher (found in this thread): the goal is to find calls to the constructor overload of std::string formed from the result of std::string::c_str() (with the default allocator), because it can be replaced by a simple copy instead.
ConstructorCall(
HasDeclaration(Method(HasName(StringConstructor))),
ArgumentCountIs(2),
// The first argument must have the form x.c_str() or p->c_str()
// where the method is string::c_str(). We can use the copy
// constructor of string instead (or the compiler might share
// the string object).
HasArgument(
0,
Id("call", Call(
Callee(Id("member", MemberExpression())),
Callee(Method(HasName(StringCStrMethod))),
On(Id("arg", Expression()))
))
),
// The second argument is the alloc object which must not be
// present explicitly.
HasArgument(1, DefaultArgument())
)
It is very promising compared to ad-hoc tool because it uses the Clang compiler AST library, so not only it is guaranteed that no matter how complicated the macros and template stuff that are used, as long as your code compiles it can be analyzed; but it also means that intricates queries that depend on the result of overload resolution can be expressed.
This code returns actual AST nodes from within the Clang library, so the programmer can locate the bits and nits precisely in the source file and edit to tweak it according to her needs.
There has been talk about using a textual matching specification, however it was deemed better to start with the C++ API as it would have added much complexity (and bike-shedding). I hope a Python API will emerge.
The key problem with "style checkers" is that style is like art: everybody has a different opinion about what is good style and what is not. The implication is that style checkers will always need to be customized to the local art tastes.
To do this right, one needs a full C++ parser with access to symbol definitions, scoping rules and ideally various kinds of flow analyses. AFAIK, CppCheck does not provide accurate parsing or symbol table definitions, so its error checking can't be both deep and correct. I think Coverity and Fortify offer something along these lines using the EDG front end; I don't know if their tools offer access to symbol tables or data flow analyses. Clang is coming along.
You also need a way to write the style checks. I think all the tools offer access to an AST and perhaps symbol tables, and you can hand code your own checks, at the cost of knowing the AST intimately, which is hard for a big language like C++. I think Coverity and Fortify have some DSL-like scheme for specifying some of the checks.
If you want to fix code that is style incorrect, you need something that can modify the code representation. Coverity and Fortify do not offer this AFAIK. I believe Clang does offer the ability to modify the AST and regenerate code; you still have to have pretty intimate knowledge of the AST structure to code the tree hacking logic and get it right.
Our DMS Software Reengineering Toolkit and its C++ front end provide most of these capabilities. Using its C++ front end, DMS can parse ANSI C++11, GCC4 (with C++11 extensions) and MSVS 2010 (with its C++11 extensions) [update May 2021: now full C++17 and most of C++20] build ASTs and symbol tables with full type information. One can also ask for the type of an arbitrary expression AST node. At present, DMS computes control flow but not data flow for C++.
An AST API lets you procedurally code arbitrary checks; or make changes to the AST to fix problems, and then DMS's prettyprinter can regenerate complete, compilable source text with comments and preserved literal format information (eg., radix of numbers, etc.). You have to know the AST structure to do this, just like other tools, but it is a lot easier to know, because it is isomorphic to the DMS C++ grammar rules. The C++ front end comes with the our C++ grammar. [DMS uses GLR parsers to make this possible].
In addition, one can write patterns and transformations using DMS's Rule Specification Language, using the surface syntax of C++ itself. One might code OPs "dont throw nonSTL exceptions" as
pattern nonSTLexception(i: IDENTIFIER):statement
= " throw \i; " if ~derived_from_STD_exception(i);
The stuff inside the (meta)quotes is C++ source code with some pattern-matching escapes, e.g, "\i" refers to the placeholder varible "i" which must be a C++ IDENTIFIER according the rule; the entire "throw \i;" clause must be a C++ "statement" (a nonterminal in the C++ grammar). The rule itself mainly expresses syntax to be matched, but can invoke semantic checks (such as "~is_derived_from_STD_exception") applied to matched subtrees (in this case, whatever "\i" matched).
In writing such patterns, you don't have to know the shape of the AST; the pattern knows it, and it is automatically matched. If you've ever coded AST walkers, you will appreciate how convenient this is.
A match knows the AST node and therefore the precision position (file/line/column) which makes it easy to generate reports with precise location information.
You need to add a custom routine to DMS, "inherits_from_STD_exception", to verify the identifier tree node passed to that routine is (as OP desired) a class derived from
std::exception. This requires finding "std::exception" in the symbol table,
and verifying that the symbol table entry for the identifier tree node is a class
declaration and transitively inherits from other class declarations (by following symbol table links) until the std::exception symbol table entry is found.
A DMS transformation rule is a pair of patterns stating in essence, "if you see this, then replace it by that".
We've built several custom style checkers with DMS for both COBOL and C++. Its still a fair amount of work, mostly because C++ is a pretty complex language and you have to think carefully about the precise meaning of your check.
Trickier checks and those tests that start to fall into deep static analysis require access to control and data flow information. DMS computes control flow for C++ now, and we're working on data flow analysis (we've already done this for Java, IBM Enterprise COBOL and a variety of C dialects). Analysis results are tied back to the AST nodes so that one can use patterns to look for elements of the style check, and then follow the data flows to tie the elements together if needed.
When all is said and done with DMS, (or indeed with any of the other tools that deal with C++ in any halfway accurate way), is that coding additional or complex style checks is unlikely to be "convenient". You should hope for "possible with good technical background."

How to write a C++ code generator that takes C++ code as input?

We have a CORBA implementation that autogenerates Java and C++ stubs for us. Because the CORBA-generated code is difficult to work with, we need to write wrappers/helpers around the CORBA code. So we have a 2-step code generation process (yes, I know this is bad):
CORBA IDL -> annoying CORBA-generated code -> useful wrappers/helper functions
Using Java's reflection, I can inspect the CORBA-generated code and use that to generate additional code. However, because C++ doesn't have reflection, I am not sure how to do this on the C++ side. Should I use a C++ parser? C++ templates?
TLDR: How to generate C++ code using generated C++ code as input?
Have you considered to take a step back and use the IDL as source for a custom code generator? Probably you have some wrapper code that hides things like duplicate, var, ptr, etc. We have a Ruby based CORBA IDL compiler that currently generates Ruby and C++ code. That could be extended with a customer generator, see https://www.remedy.nl for RIDL and R2CORBA.
Another option would be to check out the IDL to C++11 language mapping, more details on https://www.taox11.org. This new language mapping is much easier to use and uses standard types and STL containers to work with.
GCC XML could help in recovering the interface.
I'm using it to write a Prolog foreign interface for OpenGL and Horde3D rendering engine.
The interfaces I'm interested to are limited to C, but GCC XML handles C++ as well.
GCC XML parse source code interface and emits and XML AST. Then with an XML library it's fairly easy extract requested info. A nuance it's the lose of macro' symbols: AFAIK just the values survive to the parse. As an example, here (part of ) the Prolog code used to generate the FLI:
make_funcs(NameChange, Xml, FileName, Id) :-
index_id(Xml, Indexed),
findall(Name:Returns:ArgTypes,
(xpath(Xml, //'Function'(#file = Id, #name = Name, #returns = ReturnsId), Function),
typeid_indexed(Indexed, ReturnsId, Returns),
findall(Arg:Type, (xpath(Function, //'Argument'(#name = Arg, #type = TypeId), _),
typeid_indexed(Indexed, TypeId, Type)), ArgTypes)
),
AllFuncs),
length(AllFuncs, LAllFuncs),
writeln(FileName:LAllFuncs),
fat('prolog/h3dplfi/~s.cpp', [FileName], Cpp),
open(Cpp, write, Stream),
maplist(\X^((X = K-A -> true ; K = X, A = []), format(Stream, K, A), nl(Stream)),
['#include "swi-uty.h"',
'#include <~#>'-[call(NameChange, FileName)]
]),
forall(member(F, AllFuncs), make_func(Stream, F)),
close(Stream).
xpath (you guess it) it's the SWI-Prolog library that make analysis simpler...
If you want to reliably process C++ source code, you need a program transformation tool that understands C++ syntax and semantics, can parse C++ code, transform the parsed representation, and regenerate valid C++ code (including the original comments). Such a tool provides in effect arbitrary metaprogramming by operating outside the language, so it is not limited by the "reflection" or "metaprogramming" facilities built into the language.
Our DMS Software Reengineering Toolkit with its C++ Front End can do this.
It has been used on a number of C++ automated transformation tasks, both (accidentally) related to CORBA-based activities. The first included reshaping interfaces for a proprietary distributed system into CORBA-compatible facets. The second reshaped a large CORBA-based application in the face of IDL changes; such changes in effect cause the code to be moved around and causes signature changes. You can find technical papers at the web site that describe the first activity; the second was done for a major defense contractor.
Take a look at Clang compiler, aside from being a standalone compiler it is also intended to be used as an library in situations like the one you describe. It will provide you with parse tree on which you could do your analysis and transformations

Getting AST for C++?

I'm looking to get an AST for C++ that I can then parse with an external program. What programs are out there that are good for generating an AST for C++? I don't care what language it is implemented in or the output format (so long as it is readily parseable).
My overall goal is to transform a C++ unit test bed to its corresponding C# wrapper test bed.
You can use clang and especially libclang to parse C++ code. It's a very high quality, hand written library for lexing, parsing and compiling C++ code but it can also generate an AST.
Clang also supports C, Objective-C and Objective-C++. Clang itself is written in C++.
Actually, GCC will emit the AST at any stage in the pipeline that interests you, including the GENERIC and GIMPLE forms. Check out the (plethora of) command-line switches begining with -fdump- — e.g. -fdump-tree-original-raw
This is one of the easier (…) ways to work, as you can use it on arbitrary code; just pass the appropriate CFLAGS or CXXFLAGS into most Makefiles:
make CXXFLAGS=-fdump-tree-original-raw all
… and you get “the works.”
Updated: Saw this neat little graphing system based on GCC AST's while checking my flag name :-) Google FTW.
http://digitocero.com/en/blog/exporting-and-visualizing-gccs-abstract-syntax-tree-ast
Our C++ Front End, built on top of our DMS Software Reengineering Toolkit can parse a variety of C++ dialects (including C++11 and ObjectiveC) and export that AST as an XML document with a command line switch. See example ASTs produced by this front end.
As a practical matter, you will need more than the AST; you can't really do much with C++ (or any other modern language) without an understanding of the meaning and scope of each identifier. For C++, meaning/scope are particularly ugly. The DMS C++ front end handles all of that; it can build full symbol tables associating identifers to explicit C++ types. That information isn't dumpable in XML with a command line switch, but it is "technically easy" to code logic in DMS to walk the symbol table and spit out XML. (there is an option to dump this information, just not in XML format).
I caution you against the idea of manipulating (or even just analyzing) the XML. First, XSLT isn't a particularly good way to understand the meaning of the ASTs, let alone transform the AST, because the ASTs represent context sensitive language structures (that's why you want [nee MUST HAVE] the symbol table). You can read the XML into a dom-like tree if you like and write your own procedural code to manipulate it. But source-to-source transformations are an easier way; you can write your transformations using C++ notation rather than buckets of code goo climbing over a tree data structure.
You'll have another problem: how to generate valid C++ code from the transformed XML. If you don't mind spitting out raw text, you can solve this problem in purely ad hoc ways, at the price of having no gaurantee other than sweat that generated code is syntactically valid. If you want to generate a C++ representation of your final result as an AST, and regenerate valid text from that, you'll need a prettyprinter, which are not technically hard but still a lot of work to build especially for a language as big as C++.
Finally, the reason that tools like DMS exist is to provide the vast amount of infrastructure it takes to process/manipulate complex structure such as C++ ASTs. (parse, analyse, transform, prettyprint). You can try to replicate all this machinery yourself, but this is usually a poor time/cost/productivity tradeoff. The claim is it is best to stay within the tool ecosystem rather than escape it and build bad versions of it yourself. If you haven't done this before, you'll find this out painfully.
FWIW, DMS has been used to carry out massive analysis and transformations on C++ source code. See Publications on DMS and check the papers by Akers on "Re-engineering C++ Component Models".
Clang is based on the same kind of philosophy; there's an ecosystem of tools.
YMMV, but I'd be surprised.

Static analysis for partial C++ programs

I'm thinking about doing some static analysis project over C++ code samples, as opposed to entire programs. In general static analysis requires some simpler intermediate representation, but such a representation cannot be accurately created without the entire program code.
Still, I know there is such a tool for Java - it basically "guesses" missing information and thus allows static analysis to take place even though it's no longer sound or complete.
Is there anything similar that can be used to convert partial C++ code into some intermediate form (e.g. LLVM bytecode)?
As a general rule, if you guess, you guess wrong; any complaints from a static analyzer based on such guesses are false positives and will tend to cause a high rate of rejection.
If you insist on guessing, you'll need a tool that can parse arbitrary C++ fragments. ("Guess a static analysis of this method...."). Most C++ parsers will only parse complete source files, not fragments.
You'll also need a way to build up partial symbol tables. ("I is listed as an argument to FOO, but has no type information, and it is not the same I as as is declared in the statement following the call to FOO").
Our DMS Software Reengineering Toolkit with its C++ Front End can provide parsing of fragments, and might be used as a springboard for partial symbol tables.
DMS provides general parsing/analysis/transformation on code, as determined by an explicit langauge definition provided to DMS. The C++ Front End provides a full, robust C++ front end enabling DMS to parse C++, build ASTs, and build up symbol tables for such ASTs using an Attribute Grammar (AG) in which the C++ lookup rules are encoded. The AG is a functional-style computation encoded over AST nodes; the C++ symbol table builder is essence big functional program whose parts are attached to BNF grammar rules for C++.
As part of the generic parsing machinery, given a langauge definition (such as the C++ front end), DMS can parse arbitrary (non)terminals of that language using its built-in pattern langauge. So DMS can parse expressions, methods, declarations, etc. or any other well-formed code fragment and build ASTs. Where a non-wellformed fragment is provided, one currently gets a syntax error on the fragment parse; it would be possible to extend DMS's error recovery to generate a plausabile AST fix and thus parse arbitrary elements.
The partial symbol table is harder, since much of the symbol table building machinery depends on other parts of the symbol table being built. However, since this is all coded as an AG, one could run the part of the AG relevant to the fragment parsed, e.g., the symobl table building logic for a method. The AG would need to be modified probably extensively to allow it to operate with "assumptions" about missing symbol definitions; these would in effect become constraints. Of course, a missing symbol might be any of several things, and you might end up with configurations of possible symbol tables. Consider:
{ int X;
T*X;
}
Not knowing what T is, the type of the phrase (and even its syntactic category) can't be uniquely determined. (DMS will parse the T*X; and report an ambiguous parse since there are multiple possible matching interpretations, see Why can't C++ be parsed with a LR(1) parser?)
We've already done some work this partial parsing and partial symbol tables, in which we used DMS experimentally to capture code containing preprocessor conditionals, with some conditional status undefined. This causes us to build conditional symbol table entries. Consider:
#if foo
int X;
#else
void X(int a) {...}
#endif
...
#if foo
X++;
#else
X(7);
#endif
With conditional symbols, this code can type check. The symbol table entry for X says something like, "X ==> int if foo else ==> void(int)".
I think the idea of reasoning about large program fragments with constraints is great, but I suspect it is really hard, and you'll forever being trying to resolve enough information about a constraint into order to do static analysis.
Understand 4 C++ by SciTools is a product that parses source code, and provides metrics for various things. As a tool the product is sort of like a source code browser, But I personally don't use it for that since visual studio's Intellisense is just as good.
Its real power is that it comes with a C and Perl API. Thus using that you can write your own static analysis tools. And yes, it will deal quite well with missing code files. Also, understand 4 C++ works on Windows and a bunch of other operating systems.
As to your last question about intermediate code, Understand 4 C++ doesn't provide you with an 'intermediate' form, but with its API, it does provide you with an abstraction layer over the abstract syntax tree that gives you a lot of power to analyze source code. I have written a lot of tools at my work using this API, and a managed C++ API (which I wrote and shared publicly on codeplex) that wraps its native C API.
dont know about LLVM bytecode, but there is an old adage called PcLint
http://www.gimpel.com/html/index.htm
they even have an online testing module, where you can post portions of code

Open-source C++ scanning library

Rationale: In my day-to-day C++ code development, I frequently need to
answer basic questions such as who calls what in a very large C++ code
base that is frequently changing. But, I also need to have some
automated way to exactly identify what the code is doing around a
particular area of code. "grep" tools such as Cscope are useful (and
I use them heavily already), but are not C++-language-aware: They
don't give any way to identify the types and kinds of lexical
environment of a given use of a type or function a such way that is
conducive to automation (even if said automation is limited to
"read-only" operations such as code browsing and navigation, but I'm
asking for much more than that below).
Question: Does there exist already an open-source C/C++-based library
(native, not managed, not Microsoft- or Linux-specific) that can
statically scan or analyze a large tree of C++ code, and can produce
result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able
to feed that handle back to the library to perform the following types
of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that
reference, so that I can inspect surrounding code constructs? And
each AST entity would also have file path, byte-offset, and
type-info data associated with it, so that I could recursively walk
up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following:
C or C++ and probably is "C handle" or C++-class-instance-based
(and if it is, must be generic C o C++ code and not Microsoft- or
Linux-specific code constructs unless it is to meet specifics of
the given platform), or
Command-line standard input and standard output based.
C++ aware: Is not limited to C code, but understands C++ language
constructs in minute detail including awareness of inter-class
inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than
compiling the entire code base from scratch. This probably needs to
be relaxed, but only if Incremental result retrieval and Resilient
to small code changes requirements are fully met below.
Provide Result counts: I should be able to ask "How many results
would you provide to some request (and no don't send me all of the
results)?" that responds on the order of less than 3 seconds versus
having to retrieve all results for any given question. If it takes
too long to get that answer, then wastes development time. This is
coupled with the next requirement.
Incremental result retrieval: I should be able to then ask "Give me
just the next N results of this request", and then a handle to the
result set so that I can ask the question repeatedly, thus
incrementally pulling out the results in stages. This means I
should not have to wait for the entire result set before seeing
some subset of all of the results. And that I can cancel the
operation safely if I have seen enough results. Reason: I need to
answer the question: "What is the build or development impact of
changing some particular function signature?"
Resilient to small code changes: If I change a header or source
file, I should not have to wait for the entire code base to be
rescanned, but only that header or source file
rescanned. Rescanning should be quick. E.g., don't do what cscope
requires you to do, which is to rescan the entire code base for
small changes. It is understood that if you change a header, then
scanning can take longer since other files that include that header
would have to be rescanned.
IDE Agnostic: Is text editor agnostic (don't make me use a specific
text editor; I've made my choice already, thank you!)
Platform Agnostic: Is platform-agnostic (don't make me only use it
on Linux or only on Windows, as I have to use both of those
platforms in my daily grind, but I need the tool to be useful on
both as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than time to download
and compile the library and all of its dependencies.
Not trial-ware.
Actively Supported: It is likely that sending help requests to mailing lists
or associated forums is likely to get a response in less than 2
days.
Network agnostic: Databases the library builds should be able to be used directly on
a network from 32-bit and 64-bit systems, both Linux and Windows
interchangeably, at the same time, and do not embed hardcoded paths
to filesystems that would otherwise "root" the database to a
particular network.
Build environment agnostic: Does not require intimate knowledge of my build environment, with
the notable exception of possibly requiring knowledge of compiler
supplied CPP macro definitions (e.g. -Dmacro=value).
I would say that CLang Index is a close fit. However I don't think that it stores data in a database.
Anyway the CLang framework offer what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing / indexing capabitilies. And since it's provided as a set of reusable libraries... it was crafted for being developed on!
I have to admit that I haven't used either because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that i don't expect them to understand, but the two open source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.
If you are looking for results of code analysis (metrics, graphs, ...) why not use a tool (instead of API) to do that? If you can, I suggest you to take a look at Understand.
It's not free (there's a trial version) but I found it very useful.
Maybe Doxygen with GraphViz could be the answer of some of your constraints but not all,for example the analysis of Doxygen is not incremental.