building c++ from header-include information - c++

With Haskell I can "ghc --make Main.hs" and with Ada I can just "gnatmake Main.adb" and that is it.
Isn't there anything like that for C++? Why not?
I do not want to write buildscripts nor makefiles for C++ projects. I have those damn #include lines there. Why isn't that information enough?
note: I vaguely remember a feature like that mentioned once in the context of Clang.
update:
It seems possible to have a C++ compiler (or write a wrapper script), that recursively looks for included headers and expects either sourcefile or objectfile to be in the same dir; compiles and links everything automatically. Skipping if source and object file have same timestamp. Link-time-decisions are left as a special case necessiating a compiler-flag/switch to select one from multiple source/object-files for the single header, or specify dynamic linking. E.g.: awesomecompiler Main.cpp --link-choice=DrawStuff.h-->DrawStuffGL.o.
Hence there must be another reason for using make or its alternatives. What is it?
To rephrase the question as suggested by martin:
Why can't we just get all the build-information from the header files, and a few commandline flags for special cases?

Some languages have a system whereby the "main" file is specifying everything else that makes up that program as "modules" or some such. Ada certainly does, I don't know enough Haskell to comment there.
C and C++ rely on modules being compiled separately and linked at the end, and the software developer decides exactly what the process is here. This has some advantages, such as that you can build a module for one solution, and a different module for another solution. This is not possible if all modules are specified by the source file (you then have to make the files appear/disappear in the filesystem instead, which of course means some other "work outside the compiler", so you end up with a makefile or some such anyway).
Say for example, we make a game, and we encapsulate all the drawing, then we can choose whether we use DirectX9, DirectX10, OpenGL or OpenGLES by simply linking with the relevant "DrawStuffDX9.o" or "DrawStuffGL.o" etc.
As always, freedom means more choice, but also a bit more work. Just like buying a ready made piece of furniture is simple, but if you want it to fit exactly to your house, floor to ceiling, you have to be lucky. A bespoke piece of furniture will cost more and require some detailed measurements, but will be a perfect fit for your home.
[gcc -MM somefile(s) will give you a rudimentary makefile for the source file(s) you specified].

Related

Config file location and binaries and build systems like autoconf

Most build systems, like autoconf/automake, allow the user to specify a target directory to install the various files needed to run a program. Usually this includes binaries, configuration files, auxilliary scripts, etc.
At the same time, many executables often need to read from a configuration file in order to allow a user to modify runtime settings.
Ultimately, a program (let's say, a compiled C or C++ program) needs to know where to look to read in a configuration file. A lot of times I will just hardcode the path as something like /etc/MYPROGAM/myprog.conf, which of course is not a great idea.
But in the autoconf world, the user might specify an install prefix, meaning that the C/C++ code needs to somehow be aware of this.
One solution would be to specify a C header file with a .in prefix, which simply is used to define the location of the config file, like:
const char* config_file_path = "#CONFIG_FILE_PATH#"; // `CONFIG_FILE_PATH` is defined in `configure.ac`.
This file would be named something like constants.h.in and it would have to be process by the configure.ac file to output an actual header file, which could then be included by whatever .c or .cpp files need it.
Is that the usual way this sort of thing is handled? It seems a bit cumbersome, so I wonder if there is a better solution.
There are basically two choices for how to handle this.
One choice is to do what you've mentioned -- compile the relevant paths into the resulting executable or library. Here it's worth noting that if files are installed in different sub-parts of the prefix, then each such thing needs its own compile-time path. That's because the user might specify --prefix separately from --bindir, separately from --libexecdir, etc. Another wrinkle here is that if there are multiple installed programs that refer to each other, then this process probably should take into account the program name transform (see docs on --program-transform-name and friends).
That's all if you want full generality of course.
The other approach is to have the program be relocatable at runtime. Many GNU projects (at least gdb and gcc) take this approach. The idea here is for the program to attempt to locate its data in the filesystem at runtime. In the projects I'm most familiar with, this is done with the libiberty function make_relative_prefix; but I'm sure there are other ways.
This approach is often touted as being nicer because it allows the program's install tree to be tared up and delivered to users; but in the days of distros it seems to me that it isn't as useful as it once was. I think the primary drawback of this approach is that it makes it very hard, if not impossible, to support both relocation and the full suite of configure install-time options.
Which one you pick depends, I think, on what your users want.
Also, to answer the above comment: I think changing the prefix between configure- and build time is not really supported, though it may work with some packages. Instead the usual way to handle this is either to require the choice at configure time, or to supported the somewhat more limited DESTDIR feature.

clang libTooling: How to find which header an AST item came out of?

Examples found on the web for clang tools are always run on toy examples, which are usually all really trivial C programs.
I am building a tool which performs source-to-source transformations on C++ code, which is obviously a very, very challenging task, but clang is up to this task.
The issue I am facing now is that the AST that clang generates for any C++ code that utilizes the STL is enormous. For example I have some C++ code for which clang++ -ast-dump ... | wc -l is 67,018 lines of horrifying AST gobbledygook!
99% of this is standard library stuff, which I aim to ignore in my source-to-source metaprogramming task. So, to achieve this I want to simply filter out files. Suppose I want to look at only the class definitions in the headers of the project that I'm analyzing (and ignore all standard library headers's stuff), I will need to just figure out which header each of my CXXRecordDecl's came from!
Can this be done?
Edit: Hopefully this is a way to go about it. Trying this out now... The important bit is that it has to tell me the header that the decls came out of, not the cpp file corresponding to the translation unit.
In my experience so far, the "source" of some given AST node is best retrieved by using Locations. For example every node at least has a start location, and when you print this out it will contain the header file path.
Then it's possible to use this path to decide whether it is a system library or part of your application code that you still are interested in examining.
One route I'm looking at is to narrow matches with things like hasName() (as found here. For example:
recordDecl(hasName("MyBaseClass")) // etc.
However your comment above using -ast-dump is something I tried as well to get a lay of the land on my own CLang tool. I found this post to be extremely helpful. Armed with their suggestion, I used clang-check to filter to a specific class name and fed it my top-level CPP file. The output was a much more manageable few hundred lines representing the class declarations and definitions of interest.

C++ internal code reuse: compile everything or share the library / dynamic library?

General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome), code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some might have its own repository, some might be part of a repository. For each deliverable solution (.sln) (e.g. something that we deliver to the customer), it will svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried about eventually the code size for each solution could get quite huge (right now our total code size is about 75K SLOC).
Also one thing to note is that we prevent all transitive dependency. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project even if it depends on it. This is because you could have 2 projects (.vcproj) that might depend on the same library (i.e. Boost) or project (.vcproj), thus when you svn:externals both projects into a single solution, svn:externals will do it twice. So we carefully document all dependencies for each project, and it's up to guy that creates the solution (.sln) to ensure all dependencies (including transitive) are svn:externals as part of the solution.
If we reuse code by using .lib , .dll instead, this would obviously reduce the code size for each solution, as well as eliminiate the transitive dependency mentioned above where applicable (exceptions are, for example, third-party library/framework that use dll like Intel TBB and the default Qt)
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
On top of that, what C++ makes easy is
not creation of reusable binary
components; rather, C++ makes it
relatively easy to reuse source code.
Note that most major C++ libraries are
shipped in source form, not compiled
form. It's all too often necessary to
look at that source in order to
inherit correctly from an object—and
it's all too easy (and often
necessary) to rely on implementation
details of the original library when
you reuse it. As if that isn't bad
enough, it's often tempting (or
necessary) to modify the original
source and do a private build of the
library. (How many private builds of
MFC are there? The world will never
know . . .)
Maybe this is why when you look at libraries like Intel Math Kernel library, in their "lib" folder, they have "vc7", "vc8", "vc9" for each of the Visual Studio version. Scary stuff.
Or how about this assertion:
C++ is notoriously non-accommodating
when it comes to plugins. C++ is
extremely platform-specific and
compiler-specific. The C++ standard
doesn't specify an Application Binary
Interface (ABI), which means that C++
libraries from different compilers or
even different versions of the same
compiler are incompatible. Add to that
the fact that C++ has no concept of
dynamic loading and each platform
provide its own solution (incompatible
with others) and you get the picture.
What's your thoughts on the above assertion? Does something like Java or .NET face these kinds of problems? e.g. if I produce a JAR file from Netbeans, will it work if I import it into IntelliJ as long as I ensure that both have compatible JRE/JDK?
People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.
One good reason to share the source code: Templates are one of C++'s best features because they are an elegant way around the rigidity of static typing, but by their nature are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms, build environments, or even if you need different build options such as static vs. dynamic linking to the C runtime, different structure packing settings, etc..
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.
I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and its usually safer to tweak a project-specific version than to tweak a shared version that can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo:Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project, we needed an optimization: the message needed to consist of several blocks of data in different parts of memory concatenated together, and we couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and the process of copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo:SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
Which would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own, e.g. the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: The tool finds everything in the directory that looks like a source file, generates dependencies if the file changed, and spits out the needed rules. Then it makes a rule to take eveything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and library are put in a subdirectory that is named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object file subdirs). Then the top-level directory gets linked with all the subdirs' libraries into the executable, and a symlink is created, again, naked after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can have hg pull from multiple parents. The last option seems to be working pretty well.

What's the rationale behind headers?

I don't quite understand the point of having a header; it seems to violate the DRY principle! All the information in a header is (can be) contained in the implementation.
It simplifies the compilation process. When you want to compile units independently, you need something to describe the parts that will be linked to without having to import the entirety of all the other files.
It also allows for code hiding. One can distribute a header to allow others to use the functionality without having to distribute the implementation.
Finally, it can encourage the separation of interface from implementation.
They are not the only way to solve these problems, but 30 years ago they were a good one. We probably wouldn't use header files for a language today, but they weren't invented in 2009.
The architects of many modern languages such as Java, Eiffel and C# clearly agree with you -- those languages extract the metadata about a module from the implementation. However, per se, the concept of headers doesn't preclude that -- it would obviously be a simple task for a compiler to extract a .h file while compiling a .c, for example, just like the compilers for those other languages do implicitly. The fact that typical current C compilers do not do it is not a language design issue -- it's an implementation issue; apparently there's no demand by users for such a feature, so no compiler vendor bothers implementing it.
As a language design choice, having separate .h files (in a human-readable and editable text format) gives you the best of both worlds: you can start separately compiling client code based on a module implementation that doesn't yet exist, if you wish, by writing the .h file by hand; or you (assuming by absurd a compiler implementation that supplies it;-) can get the .h file automatically from the implementation as a side effect of compiling it.
If C, C++, &c, keep thriving (apparently they're still doing fine today;-), and demand like yours for not manually writing headers grows, eventually compiler writers will have to supply the "header generation" option, and the "best of both worlds" won't stay theoretical!-)
It helps to think a bit about the capabilities of the computers that were available when, say c, was written. Main memory was measured in kilowords, and not necessarily very many of them. Disks were bigger, but not much. Serrious storage meant reel-to-reel tapes, mounted by hand, by grumpy operators, who really wanted you to go away so they could play hunt the wumpus. A 1 MIPS machine was screaming fast. And with all these limitation you had to share it. Possibly with a score of other users.
Anything that reduced the space or time complexity of compilation was a big win. And headers do both.
Don't forget the documentation a header provides. There is usually anything in it you need to know for using the module. I for my part don't want to scan through a looong sourcecode to learn what there is that I need to use and how to call it... You would extract this information anyway, which effectively results in -- a header file. No longer an issue with modern IDEs, of course, but working with some old C code I really love to have hand-crafted header files that include comments about the usage and about pre- and postconditions.
Keeping source, header and additional documentation in sync still is another can of worms...
The whole idea of inspecting the binary output files of language processors would have been hard to comprehend when C invented .h files. There was a system called JOVIAL that did something like it, but it was exotic and confined more-or-less exclusively to military projects. (I've never seen a JOVIAL program, I've only heard about it.)
So when C came out the usual design pattern for modularity was "no checks whatsoever". There might be a restriction that .text symbols could only link to .text and .data to .data, but that was it. That is, the compilers of the day typically processed one source file at a time and then linkers put them together without the slightest level of error checking other than, if you were lucky, "I'm a function symbol" vs "I'm a data symbol".
So the idea of actually having the compiler understand the thing you were calling was somewhat new.
Even today, if you make a totally bogus header, no one catches you in most AOT compilers. Clever things like CLR languages and Java actually do encode things in the class files.
So yes, in the long run, we probably won't have header files.
No you dont have headers in Java -- but you do have interfaces and I every serious Java guru recommends you define anything used by other projects/systems as an interface and an implementation.
Lets see a java interface definition contains call signatures, type definitions and contants.
MOST C header files contain call signatures, type definitions and constants.
So for all pratical purposes C/C++ header files are just interface definitions and should thus be considered a Good Thing. Now I know its possible to define a myriad other things in header files as well (MARCROs, constants etc. etc. ) but that just part of the whole wonderful world of C:-
int function target () {
// Default for shoot
return FOOT;
}
For Detail Read this
A header file commonly contains forward declarations of classes, subroutines, variables, and other identifiers. Programmers who wish to declare standardized identifiers in more than one source file can place such identifiers in a single header file, which other code can then include whenever the header contents are required.
The C standard library and C++ standard library traditionally declare their standard functions in header files.
And what if you want to give somebody else the declarations to use your library without giving them the implementation?
As another answer points out - the original reason for headers was to make the parse/compile easier on platforms with very simple and limited tools. It was a great step forward to have a machine with 2 floppies so you could have the compiler on one and your code on the other - made things a lot easier.
When you divide code in header and source files you divide declaration and definition. When you look in header files you can see what you have and if you wand to see implementation details you go to source file.

Where do I learn "what I need to know" about C++ compilers?

I'm just starting to explore C++, so forgive the newbiness of this question. I also beg your indulgence on how open ended this question is. I think it could be broken down, but I think that this information belongs in the same place.
(FYI -- I am working predominantly with the QT SDK and mingw32-make right now and I seem to have configured them correctly for my machine.)
I knew that there was a lot in the language which is compiler-driven -- I've heard about pre-compiler directives, but it seems like someone would be able to write books the different C++ compilers and their respective parameters. In addition, there are commands which apparently precede make (like qmake, for example (is this something only in QT)).
I would like to know if there is any place which gives me an overview of what compilers are out there, and what their different options are. I'd also like to know how each of them views Makefiles (it seems that there is a difference in syntax between them?).
If there is no website regarding, "Everything you need to know about C++ compilers but were afraid to ask," what would be the best way to go about learning the answers to these questions?
Concerning the "numerous options of the various compilers"
A piece of good news: you needn't worry about the detail of most of these options. You will, in due time, delve into this, only for the very compiler you use, and maybe only for the options that pertain to a particular set of features. But as a novice, generally trust the default options or the ones supplied with the make files.
The broad categories of these features (and I may be missing a few) are:
pre-processor defines (now, you may need a few of these)
code generation (target CPU, FPU usage...)
optimization (hints for the compiler to favor speed over size and such)
inclusion of debug info (which is extra data left in the object/binary and which enables the debugger to know where each line of code starts, what the variables names are etc.)
directives for the linker
output type (exe, library, memory maps...)
C/C++ language compliance and warnings (compatibility with previous version of the compiler, compliance to current and past C Standards, warning about common possible bug-indicative patterns...)
compile-time verbosity and help
Concerning an inventory of compilers with their options and features
I know of no such list but I'm sure it probably exists on the web. However, suggest that, as a novice you worry little about these "details", and use whatever free compiler you can find (gcc certainly a great choice), and build experience with the language and the build process. C professionals may likely argue, with good reason and at length on the merits of various compilers and associated runtine etc., but for generic purposes -and then some- the free stuff is all that is needed.
Concerning the build process
The most trivial applications, such these made of a single unit of compilation (read a single C/C++ source file), can be built with a simple batch file where the various compiler and linker options are hardcoded, and where the name of file is specified on the command line.
For all other cases, it is very important to codify the build process so that it can be done
a) automatically and
b) reliably, i.e. with repeatability.
The "recipe" associated with this build process is often encapsulated in a make file or as the complexity grows, possibly several make files, possibly "bundled together in a script/bat file.
This (make file syntax) you need to get familiar with, even if you use alternatives to make/nmake, such as Apache Ant; the reason is that many (most?) source code packages include a make file.
In a nutshell, make files are text files and they allow defining targets, and the associated command to build a target. Each target is associated with its dependencies, which allows the make logic to decide what targets are out of date and should be rebuilt, and, before rebuilding them, what possibly dependencies should also be rebuilt. That way, when you modify say an include file (and if the make file is properly configured) any c file that used this header will be recompiled and any binary which links with the corresponding obj file will be rebuilt as well. make also include options to force all targets to be rebuilt, and this is sometimes handy to be sure that you truly have a current built (for example in the case some dependencies of a given object are not declared in the make).
On the Pre-processor:
The pre-processor is the first step toward compiling, although it is technically not part of the compilation. The purposes of this step are:
to remove any comment, and extraneous whitespace
to substitute any macro reference with the relevant C/C++ syntax. Some macros for example are used to define constant values such as say some email address used in the program; during per-processing any reference to this constant value (btw by convention such constants are named with ALL_CAPS_AND_UNDERSCORES) is replace by the actual C string literal containing the email address.
to exclude all conditional compiling branches that are not relevant (the #IFDEF and the like)
What's important to know about the pre-processor is that the pre-processor directive are NOT part of the C-Language proper, and they serve several important functions such as the conditional compiling mentionned earlier (used for example to have multiple versions of the program, say for different Operating Systems, or indeed for different compilers)
Taking it from there...
After this manifesto of mine... I encourage to read but little more, and to dive into programming and building binaries. It is a very good idea to try and get a broad picture of the framework etc. but this can be overdone, a bit akin to the exchange student who stays in his/her room reading the Webster dictionary to be "prepared" for meeting native speakers, rather than just "doing it!".
Ideally you shouldn't need to care what C++ compiler you are using. The compatability to the standard has got much better in recent years (even from microsoft)
Compiler flags obviously differ but the same features are generally available, it's just a differently named option to eg. set warning level on GCC and ms-cl
The build system is indepenant of the compiler, you can use any make with any compiler.
That is a lot of questions in one.
C++ compilers are a lot like hammers: They come in all sizes and shapes, with different abilities and features, intended for different types of users, and at different price points; ultimately they all are for doing the same basic task as the others.
Some are intended for highly specialized applications, like high-performance graphics, and have numerous extensions and libraries to assist the engineer with those types of problems. Others are meant for general purpose use, and aren't necessarily always the greatest for extreme work.
The technique for using each type of hammer varies from model to model—and version to version—but they all have a lot in common. The macro preprocessor is a standard part of C and C++ compilers.
A brief comparison of many C++ compilers is here. Also check out the list of C compilers, since many programs don't use any C++ features and can be compiled by ordinary C.
C++ compilers don't "view" makefiles. The rules of a makefile may invoke a C++ compiler, but also may "compile" assembly language modules (assembling), process other languages, build libraries, link modules, and/or post-process object modules. Makefiles often contain rules for cleaning up intermediate files, establishing debug environments, obtaining source code, etc., etc. Compilation is one link in a long chain of steps to develop software.
Also, many development environments abstract the makefile into a "project file" which is used by an integrated development environment (IDE) in an attempt to simplify or automate many programming tasks. See a comparison here.
As for learning: choose a specific problem to solve and dive in. The target platform (Linux/Windows/etc.) and problem space will narrow the choices pretty well. Which you choose is often linked to other considerations, such as working for a particular company, or being part of a team. C++ has something like 95% commonality among all its flavors. Learn any one of them well, and learning the next is a piece of cake.