Related
I am almost certain this question has been asked before, but I can not seem to find the right keywords to search for to get an answer. My apologies if this is a duplicate.
I am better trying to understand the compilation process of say a C++ file as it goes from the C++ syntax to the binary machine code. In addition I am trying to understand what influences the resulting machine code.
First, I am nearly certain that the following are the only factors (for most systems) that dictate the final machine code (please correct me if I am wrong here)
The tools used to compile, assemble, and link.
Things like gnu c compiler, clang, visual studio, nasm, ect.
The kernel of the system being used.
Whether its a specific version of the linux kernel, windows microkernel, or some other kernel like a mac os x one.
The operating system being used.
This one I am less clear about. I am unsure if machines running the same linux kernel, but different os, in this case let's say debian vs centos, will they produce different binaries.
Lastly the hardware architecture.
Different cpu architectures like arm 64, x86, power pc, ect. take different op codes so obviously the machine code should be different.
So with that being said here is my understanding of the compilation process and where each of these dependencies show up.
I write a C++ file and use code that my system can understand. A good example might be using <winsock.h> on windows and <sys/socket.h> on linux.
The preprocessor runs and executes any preprocessor macros.
Here I know that different preprocessors will define different macros but for now I will assume this is not too machine dependent. (This might be wrong to assume).
The compiler tools run to produce assembly file outputs.
Here the assembly produced depends on the compiler and what optimizations or choices it makes.
It also depends on the kernel because different kernels have different system calls and store files in different locations. This means the assembly might make changes such as different branching when calling functions specific to that kernel.
The operating system? Still unsure how the operating system fits in to this. If two machines have the same kernel, what does the operating system do to the binaries?
Finally the assembly code depends on the cpu architecture. I think that is a pretty obvious statement.
Once the compiler produces an assembly. We can then invoke the assembler to turn our assembly code into almost complete machine code. (I think machine code is identical to binary opcodes a cpu manual lists but this might be wrong).
The corresponding machine code files (often called object files I think) contain nearly all the instructions needed to run or reference other machine code files which will be linked in the next step.
This machine code usually has some format (I think ELF is a popular format for linux) and this format is dependent on the linker for sure.
I don't think the kernel, operating system, or hardware affect the layout/format of the object file but this is probably wrong. If they do please correct this.
The hardware will affect the actual machine code produced because again I think it is a 1 to 1 mapping of machine code instructions to opcodes for a cpu.
I am unsure if the kernel or operating system affect the linking process because I thought their changes were already incorporated in the compiling step.
Finally the linking step occurs.
I think this is as simple as the linker looking for all the referenced machine code and injecting it into one complete machine code file which can be executed.
I have no clue what affects this besides the linker tool itself.
So with all that, I need help identifying inaccuracies with the procedure I described above, and any dependencies I might have missed whether it be cpu, os, kernel, or tool ones.
Thank you and sorry for the long winded question. This probably should have been broken up into multiple questions but I am too far in. If this does not go well I may ask each part in individual questions.
EDIT:
Questions with more focus.
What components of a machine affect the machine code produced given a C++ file input?
Actually that is a lot of questions and usually you're question would be much too broad for SO (as you managed to recognize by yourself). But on the other hand you showed a deep interest (just by writing such a long and profound question) and also a lot of correct understanding of the process of compiling a program. The things you are missing or not understanding correctly (and you are probably the most interested in) are those things, that I myself found hard to learn. Thus I will provide you with some important points, that I think you are missing in the big picture.
Note that I am very much used to Linux, so I will mostly describe how things work on Linux. But I believe that most things also happen in a similar way on other operating systems.
Let's begin with the hardware. A modern computer has a CPU of some architecture. There are lots of different of CPU architectures. You mentioned some of them like arm, x86, etc. which are families of similar CPUs and can be divided into smaller groups by bit width and/or supported extensions. Ultimately your processor has a specified instruction set that defines which opcodes it supports and what those opcodes do. If a native (compiled) program runs, there are raw opcodes in the memory and the CPU directly executes them following its architecture specification.
Aside from the CPU there is a lot more hardware connected to your computer. Usually communicating with this hardware is complicated and not standardized. If a user program for example gets input keystrokes from the keyboard, in does not have to directly communicate with the keyboard, but rather does this via the operating system kernel. This works by a mechanism called syscall interrupt. The kernel installs an handler routine, that is called if a user program triggers such an interrupt with a special CPU instruction. You can think of it like a language agnostic function call from the program into the kernel. For example for Linux you can find a list of all syscalls at the syscall(2) man page. The syscalls form the kernel's Application Binary Interface (kernel ABI). Reading and writing from a terminal or using a filesystem are examples for syscall functionality.
As you can see, there are already very high level functions, that are implemented in the kernel. However the functionality is still quite limited for most typical applications. To encapsulate the syscalls and provide functions for memory management, utility functions, mathematical functions and many other things you probably use in your daily programs, there is usually another layer between the program and the kernel. This thing is called the C standard library, and it is a shared library (we will cover what exactly this is in a moment). On GNU/Linux it is the glibc which is the single most important library on a GNU/Linux system (and notably not part of the kernel 1). While it implements all the features that are required by the C standard (for example functions like malloc() or strcpy()), it also ships a lot of additional functions which are a superset of the ISO C standard library, the POSIX standard and some extensions. This interface is usually called the Application Programming Interface (API) of the operating system. While it is in principle possible to bypass the API and directly use the syscalls, almost all programs (even when written in other languages than C or C++) use the C library.
Now get yourself a coffee and a few minutes of rest. We now have enough background information to look at how a C++ program is transformed into a binary, and how exactly this binary is executed.
A C++ program consists of different compilation units (usually each different source file is a compilation unit). Each compilation unit undergoes the following steps
The preprocessor is run on the file. It includes header, expands macros and does some other stuff. As you wrote in your question this is rather platform independent. The preprocessor actions are standardized in the C++ standard.
The resulting code is compiled. That means C++ code is translated into assembly code. Because assembly code directly reflects the CPU instructions, this step is dependent on the target CPU architecture, that the compiler was configured for (usually the host CPU). The compiler is allowed to optimize and translate the program in any way it wants, as long as it follows the as-if rule. Thus this step is also higly dependent on the compiler you are using.
Note: Symbols (especially functions) that are not defined, are left undefined. If you say call the malloc() function, this will not be compiled, but left unevaluated until later. Thus this step is also not much dependent on the operating system.
Assembling takes place. This is very straightforward. The assembly code usually can be converted directly into binary CPU instructions. Local symbols (such as goto labels etc.) are resolved and replaced by their corresponding addresses. Unknown external symbols such as the mentioned malloc() call still are left unevaluated and are stored in the object file's symbol table. Because most of the syscalls are wrapped in library functions, the assembly code will usually not directly contain syscall code. Thus this step is depended on the CPU architecture. It is however dependent on the ABI2, which in term is dependent on the compiler and the OS.
Linking takes place. The different compilation units are combined into a single executable binary in an OS-dependent format (e.g. GNU/Linux uses ELF). Here yet more symbols are resolved. For example if one compilation calls a function in another compilation unit, this call is resolved and the symbol is replaced by the function address. If you link to a library statically, this is just treated like another compilation unit and included into the executable with its symbols resolved.
Shared libraries are checked for the needed symbols, but not linked yet. For example in case of the malloc() call, the linker checks, that there is a malloc symbol in the glibc, but the symbol in the executable still remains unresolved.
At this point you have a executable binary. As you might noticed, there might still be unresolved symbols in that binary. Thus you cannot just load that binary into RAM and let the CPU execute it. A final step called dynamic linking is needed. On Linux the program that performs this step is called the dynamic linker/loader. Its task is to load the executable ELF file into memory, look up all the needed dynamic libraries, load them into memory as well (a list is stored in the ELF file) and resolve the remaining symbols. This last step happens each time the program is executed. Now finally the malloc() symbol is resolved with the address in the glibc shared library.
You have pure CPU instructions in memory, the CPU's program counter register (the one that tracks the next instruction) is set to the entry point, and the program can begin to run. Every now and then it is interrupted either because it makes a syscall, or because it is interrupted by the kernel scheduler to let another program run on that CPU core.
I hope I could answer some of your questions and satisfy your curiosity. I think the most important part you were missing, was how dynamic linking happens. This is a very interesting topic which is related to concepts like position independent code. I wish you could luck learning.
1 this is also one reason why some people insist on calling Linux based systems GNU/Linux. The glibc library (together with many other GNU programs) defines much of the operating system structure, interacts with supplementary programs and configuration files etc. There are however Linux based systems without glibc. One of them is Android, using Googles bionic libc.
2 The ABI is related to the calling convention. This is a mixture of operating system, programming language and compiler specification. It is one of the reasons (besides name mangling, see the comment of PeterCordes below) you need those extern "C" {...} scopes in C++ header files, that declare C functions in shared libraries. It basically is a convention on how to pass parameters and return values between functions.
Neither operating system nor kernel are directly involved in any of this.
Their limited involvement is in that if you want to build Linux 64 bit binaries for x86 using gnu tools then you need to in some way (download and install or build yourself) build the gnu tools themselves for that target processor and that operating system. As system calls are specific to the operating system and target, and also the binaries supported by that operating system. Not strictly just the elf file format, that is just a container, but the linking and possibly bootstrap is also specific to the operating systems loader. (or if building something for the kernel that would have other rules). For example, does the application loader initialize .bss and .data for you from specific information in the .elf file, or like on an mcu does the bootstrap code itself have to do this?
The builder for gnu tools for a target like linux and ideally a pre-built binary for your os and target, would have paths setup in some way. The c library would have a default linker script and its intimate partner the bootstrap.
After that point, it is just a dumb toolchain. Include files be they at the system level, compiler level, or programmer level are just includes in the C language. The default paths and gcc knows where it was executed from so it knows where in a normal build the gcc and other libraries live.
gcc itself is not a compiler actually it calls other programs like the preprocessor, the compiler itself, the assembler and linker.
The preprocessor is going to do the search and replace for includes and defines and end up with one great big cpp file, then pass that to the compiler.
The compiler front end (C++ language for gcc for example) turns that into an internal language, allocate an int with this name, and another add the two and blah. A pseudo code if you will. This gets a lot of the optimization work done on it then eventually the back end (which for gnu could be x86, mips, arm, etc independent to some extent of the front and middle). The LLVM tools, are at least capable of exposing that middle, internal, language to external files (external to the memory used by the compiler to do the compilation) and you can combine and optimize those bytecode files and then convert them to assembly or direct to object in the llvm world. I think this is an exception not a rule, others just use internal tables.
While I think it is wise and sane to use an assembly language step. Not all compilers do and do not assume that all compilers do. Some output objects.
Yes that assembly is naturally partial, external functions (labels) and variables (labels) cannot be resolved at the object level. The linker has to do that.
So the target (x86, arm, etc) does affect the construction of the elf file as
there are certain items, magic numbers specific to the target. As mentioned the operating system and or kernel do affect the elf in that there are rules for construction of the binary for that kernel or operating system. Remember that elf is just a container like tar or zip or mkv etc. Do not assume that the operating system can handle every possible choice you want to make with the contents that the linker will allow (the tools are dumb, do what they are told).
So your source.
All the relevant sources that go with it including system includes, compiler includes and your includes.
gcc/g++ is a wrapper program that manages the steps.
calls the pre-processor expands includes and defines into one file (no magic here)
call the compiler to parse that one file into internal tables, think pseudo code and data
many, many possible optimizers that operate on these structures
backend, including peephole optimizer, turns the tables into assembly language (for gnu at least)
assembler is called to turn the asm into an object
If all the objects are specified and gcc is told to link, then...
Linker combines all the objects for the binary, including the bootstrap, including already built libraries, stubs, etc, and command line or more likely a linker script (linker script and bootstrap have an intimate relationship they are not assumed to be separable and not part of the compiler they are part of a C library, etc).
Kernel module loader or operating system application loader fed the file and per the rules of that loader loads and runs the program.
with reference to : http://www.cplusplus.com/articles/2v07M4Gy/
During the compilation phase,
This phase translates the program into a low level assembly level code. The compiler takes the preprocessed file ( without any directives) and generates an object file containing assembly level code. Now, the object file created is in the binary form. In the object file created, each line describes one low level machine level instruction.
Now, if I am correct then different CPU architectures works on different assembly languages/syntax.
My question is how does the compiler comes to know to which assembly language syntax the source code has to be changed? In other words, how does the C++ compiler know which CPU architecture is there in the machine it is working on ?
Is there any mapping used by assembler w.r.t the CPU architecture for generating assembly code for different CPU architectures?
N.S : I am beginner !!
Each compiler needs to be "ported" to the given system. For each system supported, a "compiler port" needs to be programmed by someone who knows the system in-depth.
WARNING : This is extremely simplified
In short, there are three main parts to a compiler :
"Front-end" : This part reads the language (in this case c++) and converts it to a sort of pseudo-code specific to the compiler. (An Abstract Syntactic Tree, or AST)
"Optimizer/Middle-end" : This part takes the AST and make a non-architecture-dependant optimized one.
"Back-end" : This part takes the AST, and converts it to binary executable code, specific to the architecture you want to compile your language on.
When you download a c++ compiler for your platform, you, in fact, download the c++ frontend with the linux-amd64 backend, for example.
This coding architecture is extremely helpful, because it allows to port the compiler for another architecture without rewriting the whole parsing/optimizing thing. It also allows someone to create another optimizer, or even another frontend supporting a whole different language, and, as long as it outputs a correct AST, it will be compatible with every single backend ever written for this compiler.
Simply put, the knowledge of the target system is coded into the compiler.
So you might have a C compiler that generates SPARC binaries, and a C compiler that generates VAX binaries. They both accept the same input language (as defined in the C standard), but produce different programs from it.
Often we just refer to "the compiler", meaning the one that will generate binaries for our current environment.
In modern times, the distinction has become less obvious with compiler collections such as GCC. Now the "different compilers" are often the same compiler program, just set up with different configurations (these are the "target description files").
Just to complete the answers given here:
The target architecture is indeed coded into the specific compiler instance you're using. This is important also for a process called "cross-compiling" - The process of compiling on a certain system an executable that would operate on another system/architecture.
Consider working on an embedded system-on-chip that uses a completely different instruction set than your own - You're working on an x86/64 Linux system, but need to compile a mobile app running on an ARM micro-processor, or some other type of assembly architecture.
It would be unreasonable to compile your code on the target system, which might be so limited in CPU and memory that it can't feasibly run a compiler - and so you can use a GCC (or any other compiler) port for that target architecture on your favorite system.
It's also quite critical to remember that the entire tool-chain is often compatible to the target system, for instance when shared libraries such as libc are getting in play - as the target OS could be a different release of Linux and would have different versions of common functions - In which case it's common to use tool-chains that contain all the necessary libraries and use something like chroot or mock to compile in the "target environment" from within your system.
OK, so: I have successfully integrated the first working Halide generator into the cmake build system for my little image-processing project.
The generator implements an image-resizing and -resampling algorithm, based on the example code from the Halide codebase – Halide/apps/resize/resize.cpp – I adapted the sample in order to leverage generator parameters, and tied the generators’ compilation and invocation to my cmake script using the functions defined in HalideGenerator.cmake, just as the Halide project does in its own build script.
All this works great, so far – but my domain expertise is lacking in the realm of code-generation nuances. For example, I tweaked the scheduling method to get the best observed empirical speed on my laptop – but despite taking many long tinkering sessions and code-reading sojourns into the depths of Halide’s many generator-related tools and scripts, I have only the most superficial understanding of the code-generation process.
Specifically, I don’t know how to approach this. Is it best to use defaults or try to turn on specific options for my target platform – and if the latter, do I have to have conditional code somewhere, or can the binary include fallbacks?
Here’s what I am talking about: in the source for Halide tutorial lesson #15, there’s a complex script that invokes a generator with various options. Here’s a snippet from code comments in this script:
# If you're compiling and linking multiple Halide pipelines, then the
# multiple copies of the runtime should combine into a single copy
# (via weak linkage). If you're compiling and linking for multiple
# different targets (e.g. avx and non-avx), then the runtimes might be
# different, and you can't control which copy of the runtime the
# linker selects.
# You can control this behavior explicitly by compiling your pipelines
# with the no_runtime target flag. Let's generate and link several
# different versions of the first pipeline for different x86 variants: [snip]
… from this it is hard to separate what must be done, from what should be done, or what may be done, discretionally. Comparatively, one doesn’t have to deal with these issues when setting up C++ or Objective-C projects (even more Byzantine examples) as the compiler and linker make most these decisions for you, and at most need a flag or two.
My question is: how can I integrate the Halide generator’s output library binaries into my existing project – such that the generator output is as fast as possible (e.g. uses GPU, SSE2/3, AVX2 etc) without further constraining portability (e.g. it won’t mysteriously segfault on a slightly different machine)?
Specifically, what should my process be – as in, should I only target the lowest-common-denominator at first, and then leverage more exotic processor features incrementally?
EDIT: As I mentioned in comments below, this is what my GenGen binary outputs to stdout when invoked with no options:
As of recently, AOT-compilation now supports generating customizations for multiple CPU features with runtime detection. Just use GenGen with a comma-separated list of targets and static_library as the output, e.g.
GenGen -e static_library,h target=x86-64-linux-sse41-avx,x86-64-linux-sse41,x86-64-linux
This will produce a .a file that contains:
3 versions of your code, compiled with specializations for AVX+SSE4.1, SSE4.1, and plain-old-x86-64
a thin wrapper that uses the halide_can_use_target_features() runtime call to choose the right one for the actual target machine
See Func::compile_to_multitarget_static_library() and multitarget_generator.cpp/multitarget_aottest.cpp for more information.
For the case of pre-generating your binaries (AOT), it sounds like you want runtime dispatch. Your program will examine the CPU/GPU environment at startup and decide what features (AVX, OpenCL, etc.) should be used. This is not Halide specific.
Select a set of advanced features to target (high powered desktop GPU) as a best case and a set of minimal features that will work on every machine (SSE2 only).
Build a DLL/dylib/so for each of these feature sets that contains every performance hungry function. These can be scheduled differently or even built with completely different Func definitions. You can have both sets in the same source code file and test the Target object at generation time to choose between them.
At program startup, see if your best case features are present and, if so, load that library and use it. If any features are missing, default to the most compatible version.
You are free to choose how many feature sets and libraries you want to support.
The alternative is to compile your Halide code at program startup (JIT). I recommend using the Target object returned by get_jit_target_from_environment(), which uses the contents of the environment variable HL_JIT_TARGET or "host" if that variable is not set. The "host" target string is the same as get_host_target() and means Halide will examine the CPU/GPU environment and set whatever features it finds. You can then dynamically test the Target object and use GPU or CPU scheduling.
I'm developing a new language in LLVM using the C++ API which compiles down to target the C ABI.
I would like to support modular compilation by allowing end users to build what are effectively static libraries. I noticed the LLVM C++ API has a llvm::Linker class that I can use during compilation to combine source files (llvm::Module), however I wanted to guarantee library compatibility via metadata version numbers or at least the publicly exposed interface between separate compilation runs.
Much of the information available on metadata in LLVM suggest that it should only be used for extended information that would not break correctness when silently removed.
llvm
blog
IntrinsicsMetadataAttributes
pdf
I wouldn't think this would be a deal breaker as it could be global metadata, but it would be good to get a second opinion on that point.
I also know there is a method in IRReader to parseIRFile so I can load some previously built bc files. I would be curious if it would be reasonable practice to include size and CRC information for comparison when loading these files.
My language has concepts similar to C# including interfaces. I figure I could allow modular compilation by importing/exporting an interface type along with external functions (Much like C++, I don't restrict the language to only methods of classes).
This approach allows me to include language specific information in the interface without needing to encode it in the IR as both the library and the calling code would be required to build with the interface. This again requires the interfaces to be compatible.
One language feature that would require extended information would be named parameters in functions.
My language is very type-safe and also mandates named parameters so there is no predetermined function parameter order. This allows call sites to be more explicit, the compiler to catch erroneous parameter usage, and authors have more liberty in determining default parameters as they are not restricted to the last parameters to the function.
The compiler will need to know names, modifiers, defaults, etc. of these parameters to correctly map calls at compile time, so I figure the interface approach would work well here.
TL;DR
Does LLVM have any predefined facilities for building static libraries?
Is version number, size, and CRC information reasonable use cases for LLVM's metadata?
This is probably not QUITE an answer... Or at least not a complete answer.
I like this question, as I'm going to need a solution in the future too (some time in the next few months or years) for my Pascal compiler. It supports "units" which is meant to be a separately compiled object, but currently what I do is simply drag in the source file and compile it into the main llvm::Module - that's neither efficient nor flexible (can't use the linker to choose between the "Linux" and "Windows" version of some code, for example - not that I think there is 5% chance that my compiler will work on Windows without modification anyway...)
However, I'm not sure storing the "object" file as LLVM IR would be the right thing to do. I was thinking that a better way would be to store your AST in some serialized form - then
you don't depend on LLVM versions changing the IR format.
You can add whatever metadata you like. There won't be much
difference in generating LLVM-IR from this during your link phase or
building the IR at compile and then reading the IR to figure out if
the metadata is correct. [The slow part, as you may have already found out, is the optimisation and MC generation, and you'd still have to do that either way]
Like I started out, I'm not sure this is an answer, but it's my thoughts so far on the subject. Now I'll go back to adding debug symbol stuff to my Pascal compiler... Before Christmas, I couldn't see the source in GDB. Now I can step, but no viewing of variables yet...
Rationale: In my day-to-day C++ code development, I frequently need to
answer basic questions such as who calls what in a very large C++ code
base that is frequently changing. But, I also need to have some
automated way to exactly identify what the code is doing around a
particular area of code. "grep" tools such as Cscope are useful (and
I use them heavily already), but are not C++-language-aware: They
don't give any way to identify the types and kinds of lexical
environment of a given use of a type or function a such way that is
conducive to automation (even if said automation is limited to
"read-only" operations such as code browsing and navigation, but I'm
asking for much more than that below).
Question: Does there exist already an open-source C/C++-based library
(native, not managed, not Microsoft- or Linux-specific) that can
statically scan or analyze a large tree of C++ code, and can produce
result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able
to feed that handle back to the library to perform the following types
of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that
reference, so that I can inspect surrounding code constructs? And
each AST entity would also have file path, byte-offset, and
type-info data associated with it, so that I could recursively walk
up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following:
C or C++ and probably is "C handle" or C++-class-instance-based
(and if it is, must be generic C o C++ code and not Microsoft- or
Linux-specific code constructs unless it is to meet specifics of
the given platform), or
Command-line standard input and standard output based.
C++ aware: Is not limited to C code, but understands C++ language
constructs in minute detail including awareness of inter-class
inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than
compiling the entire code base from scratch. This probably needs to
be relaxed, but only if Incremental result retrieval and Resilient
to small code changes requirements are fully met below.
Provide Result counts: I should be able to ask "How many results
would you provide to some request (and no don't send me all of the
results)?" that responds on the order of less than 3 seconds versus
having to retrieve all results for any given question. If it takes
too long to get that answer, then wastes development time. This is
coupled with the next requirement.
Incremental result retrieval: I should be able to then ask "Give me
just the next N results of this request", and then a handle to the
result set so that I can ask the question repeatedly, thus
incrementally pulling out the results in stages. This means I
should not have to wait for the entire result set before seeing
some subset of all of the results. And that I can cancel the
operation safely if I have seen enough results. Reason: I need to
answer the question: "What is the build or development impact of
changing some particular function signature?"
Resilient to small code changes: If I change a header or source
file, I should not have to wait for the entire code base to be
rescanned, but only that header or source file
rescanned. Rescanning should be quick. E.g., don't do what cscope
requires you to do, which is to rescan the entire code base for
small changes. It is understood that if you change a header, then
scanning can take longer since other files that include that header
would have to be rescanned.
IDE Agnostic: Is text editor agnostic (don't make me use a specific
text editor; I've made my choice already, thank you!)
Platform Agnostic: Is platform-agnostic (don't make me only use it
on Linux or only on Windows, as I have to use both of those
platforms in my daily grind, but I need the tool to be useful on
both as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than time to download
and compile the library and all of its dependencies.
Not trial-ware.
Actively Supported: It is likely that sending help requests to mailing lists
or associated forums is likely to get a response in less than 2
days.
Network agnostic: Databases the library builds should be able to be used directly on
a network from 32-bit and 64-bit systems, both Linux and Windows
interchangeably, at the same time, and do not embed hardcoded paths
to filesystems that would otherwise "root" the database to a
particular network.
Build environment agnostic: Does not require intimate knowledge of my build environment, with
the notable exception of possibly requiring knowledge of compiler
supplied CPP macro definitions (e.g. -Dmacro=value).
I would say that CLang Index is a close fit. However I don't think that it stores data in a database.
Anyway the CLang framework offer what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing / indexing capabitilies. And since it's provided as a set of reusable libraries... it was crafted for being developed on!
I have to admit that I haven't used either because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that i don't expect them to understand, but the two open source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.
If you are looking for results of code analysis (metrics, graphs, ...) why not use a tool (instead of API) to do that? If you can, I suggest you to take a look at Understand.
It's not free (there's a trial version) but I found it very useful.
Maybe Doxygen with GraphViz could be the answer of some of your constraints but not all,for example the analysis of Doxygen is not incremental.