Related
I'm working on some custom C++ static code analysis for my PHD thesis. As part of an extension to the C++ type system, I want to take a C++ code base and enumerate its available functions, methods, and classes, along with their type signatures, with minimal effort (it's just a prototype). What's the best approach to doing something like this quickly and easily? Should I be hacking on Clang to spit out the information I need? Should I look at parsing header files with something like SWIG? Or is there an even easier thing I could be doing?
GCCXML, based on GCC, might be the ticket.
As I understand it, it collects and dumps all definitions but not the content of functions/methods.
Others will likely mention CLANG, which certainly parses code and must have access to the definitions of the symbols in a compilation unit. (I have no experience here).
For completeness, you should know about our DMS Software Reengineering Toolkit
with its C++ Front End. (The CLANG answers seem to say "walk the AST"). The DMS solution provides an enumerable symbol table containing all the type information. You can walk the AST, too, if you want.
Often a static analysis leads to a diagnosis, and a desire to change the source code.
DMS can apply source-to-source program transformations to carry out such changes conditioned
by the analysis.
I heartily recommend LLVM for statical analysis (see also Clang Static Analyzer)
I think your best bet is hacking on clang and getting the AST. There is a good tutorial for that here. Its very easy to modify its syntax and it also has a static analyzer.
At my work, I use the API from a software package called "Understand 4 C++" by scitools. I use this to write all my static analysis tools. I even wrote a .NET API to wrap their C API. Which I put on codeplex.
Once you have that, dumping all class types is easy:
ClassType[] allclasses = Database.GetAllClassTypes()
foreach (ClassType c in allclasses)
{
Console.WriteLine("Class Name: {0}", c.NameLong);
}
Now for a little backstory about a task I had that is similar to yours.
In some years we have to keep our SDK binary backwards compatible with the previous years SDK. In that case it's useful to compare the SDK code between releases to check for potential breaking changes. However with a couple of hundred files, and tens of thousands of lines of comments that can be a big headache using a text diff tool like Beyond Compare or Araxis. So what I really need to look at is actual code changes, not re-ordering, not moving code up and down in the file, not adding comments etc...
So, a tool I wrote to dump out all the code.
In one text file I dump all all the classes. For each class I print its inheritance tree, its member functions both virtual and non-virtual. For each virtual function I print what parent class virtual methods it overrides (if any). I also print out its member variables.
Same goes with the structs.
In another file I print all the macro's.
In another file I print all the typedefs.
Then using this I can diff these files with files from a previous release. It then becomes apparent instantly what has changed from release to release. For instance it's easy to see where a function parameter was changed from TCHAR* to const TCHAR* for instance.
You might consider developping a GCC Plugin for your purpose.
And GCC MELT is a high level domain specific language (that I designed & implemented) for easily extend GCC.
The paper at GROW09 workshop by Peter Collingbourne and Paul Kelly on a A Compile-Time Infrastructure for GCC Using Haskell might be relevant for your work.
Rationale: In my day-to-day C++ code development, I frequently need to
answer basic questions such as who calls what in a very large C++ code
base that is frequently changing. But, I also need to have some
automated way to exactly identify what the code is doing around a
particular area of code. "grep" tools such as Cscope are useful (and
I use them heavily already), but are not C++-language-aware: They
don't give any way to identify the types and kinds of lexical
environment of a given use of a type or function a such way that is
conducive to automation (even if said automation is limited to
"read-only" operations such as code browsing and navigation, but I'm
asking for much more than that below).
Question: Does there exist already an open-source C/C++-based library
(native, not managed, not Microsoft- or Linux-specific) that can
statically scan or analyze a large tree of C++ code, and can produce
result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able
to feed that handle back to the library to perform the following types
of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that
reference, so that I can inspect surrounding code constructs? And
each AST entity would also have file path, byte-offset, and
type-info data associated with it, so that I could recursively walk
up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following:
C or C++ and probably is "C handle" or C++-class-instance-based
(and if it is, must be generic C o C++ code and not Microsoft- or
Linux-specific code constructs unless it is to meet specifics of
the given platform), or
Command-line standard input and standard output based.
C++ aware: Is not limited to C code, but understands C++ language
constructs in minute detail including awareness of inter-class
inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than
compiling the entire code base from scratch. This probably needs to
be relaxed, but only if Incremental result retrieval and Resilient
to small code changes requirements are fully met below.
Provide Result counts: I should be able to ask "How many results
would you provide to some request (and no don't send me all of the
results)?" that responds on the order of less than 3 seconds versus
having to retrieve all results for any given question. If it takes
too long to get that answer, then wastes development time. This is
coupled with the next requirement.
Incremental result retrieval: I should be able to then ask "Give me
just the next N results of this request", and then a handle to the
result set so that I can ask the question repeatedly, thus
incrementally pulling out the results in stages. This means I
should not have to wait for the entire result set before seeing
some subset of all of the results. And that I can cancel the
operation safely if I have seen enough results. Reason: I need to
answer the question: "What is the build or development impact of
changing some particular function signature?"
Resilient to small code changes: If I change a header or source
file, I should not have to wait for the entire code base to be
rescanned, but only that header or source file
rescanned. Rescanning should be quick. E.g., don't do what cscope
requires you to do, which is to rescan the entire code base for
small changes. It is understood that if you change a header, then
scanning can take longer since other files that include that header
would have to be rescanned.
IDE Agnostic: Is text editor agnostic (don't make me use a specific
text editor; I've made my choice already, thank you!)
Platform Agnostic: Is platform-agnostic (don't make me only use it
on Linux or only on Windows, as I have to use both of those
platforms in my daily grind, but I need the tool to be useful on
both as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than time to download
and compile the library and all of its dependencies.
Not trial-ware.
Actively Supported: It is likely that sending help requests to mailing lists
or associated forums is likely to get a response in less than 2
days.
Network agnostic: Databases the library builds should be able to be used directly on
a network from 32-bit and 64-bit systems, both Linux and Windows
interchangeably, at the same time, and do not embed hardcoded paths
to filesystems that would otherwise "root" the database to a
particular network.
Build environment agnostic: Does not require intimate knowledge of my build environment, with
the notable exception of possibly requiring knowledge of compiler
supplied CPP macro definitions (e.g. -Dmacro=value).
I would say that CLang Index is a close fit. However I don't think that it stores data in a database.
Anyway the CLang framework offer what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing / indexing capabitilies. And since it's provided as a set of reusable libraries... it was crafted for being developed on!
I have to admit that I haven't used either because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that i don't expect them to understand, but the two open source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.
If you are looking for results of code analysis (metrics, graphs, ...) why not use a tool (instead of API) to do that? If you can, I suggest you to take a look at Understand.
It's not free (there's a trial version) but I found it very useful.
Maybe Doxygen with GraphViz could be the answer of some of your constraints but not all,for example the analysis of Doxygen is not incremental.
I wish to send some components to my customers. The reasons I want to deliver source code are:
1) My class is templatized. Customer might use any template argument, so I can't pre-compile and send .o file.
2) The customer might use different compiler versions for gcc than mine. So I want him to do compilation at his end.
Now, I can't reveal my source code for obvious reasons. The max I can do is to reveal the .h file. Any ideas how I may achieve this. I am thinking about some hooks in gcc that supports decryption before compilation, etc. Is this possible?
In short, I want him to be able to compile this code without being able to peek inside.
Contract = good, obfuscation = ungood.
That said, you can always do a kind of PIMPL idiom to serve your customer with binaries and just templated wrappers in the header(s). The idea is then to use an "untyped" separately compiled implementation, where the templated wrapper just provides type safety for client code. That's how one often did things before compilers started to understand how to optimize templates, that is, to avoid machine-code level code bloat, but it only provides some measure of protection about trivial copy-and-paste theft, not any protection against someone willing to delve into the machine code.
But perhaps the effort is then greater than just reinventing your functionality?
Just adding some terminology to Alf's answer: The Thin template idiom is what you might look at. It basically simulates the functionality of a generic. Don't get confused by the wikipedia article which pops up in google, you don't have to use void*...
This, of course, does not guarantee binary compatibility. As usual with 'native' c++, you either compile the component for customers platform yourself and deploy the binary, or give them your code... The difference to the pure generic component code is that you can do the former at all.
use some c++ obfuscators may be help?: http://www.semdesigns.com/products/obfuscators/CppObfuscationExample.html or Magle It
First, if you're going to provide the source code, then you have to provide the source code. Sure, you could encrypt it, but even if GCC had a "decrypt before compile" option, it would need to decrypt the code, and if GCC can decrypt the code, so can your customer.
What you're asking is impossible. (If you find a way to do it, I believe the movie industry might have a multi-million contract for you. They currently have to resort to expensive custom hardware to prevent people from ripping content, and that only works to a limited degree)
As for your "obvious reasons" why you don't want to provide the source code, I don't see why they're obvious. What would happen if you provided the source code?
You have two options:
provide the source code in its entirety, or
compile everything that can be precompiled into a (static or dynamic) library, and provide your customer with that, plus the header files.
what about pimpls?
1) My class is templatized. Customer might use any template argument, so I can't pre-compile and send .o file.
2) The customer might use different compiler versions for gcc than mine. So I want him to do compilation at his end.
Now, I can't reveal my source code for obvious reasons. The max I can do is to reveal the .h file. Any ideas how I may achieve this. I am thinking about some hooks in gcc that supports decryption before compilation, etc. Is this possible?
In short, I want him to be able to compile this code without being able to peek inside.
Consideration 2) above encompasses A) ABI differences such that the same code compiled with different compiler versions/vendors on the same platform is incompatible, as well as B) the differences in system libraries, kernel versions etc. that the code might be dependent on. The only general solution is to compile on the specific platforms. Either you do it for all platforms, or you give them all the source code and they do it. That's not just the headers and template implementation, that's your out-of-line functions too. You might mitigate A) a little by building a wall of more interoperable extern "C" functions, but you're basically stuck when it comes to B).
So, can you decrypt during compilation? Only if you ship your own hacked GCC binaries to them, built for their specific system, which is probably more hassle than providing different builds of your own libraries (though it may address the template/header exposure issue).
Alternatively, you could employ source code obfuscation techniques. This is probably - practically - as good as it gets. I don't know what tools are out there, but it's an approach that people have pursued for decades (though I'm yet to hear anyone recommend it), so there's sure to be some mature tools.
Re templated code - other people have suggested a templated front end to a C-style generic implementation shipped as a precompiled object. That may or may not be practical (clearly risks performance degradation, and you have to capture the set of type-specific operations you want - e.g. by instantiating a type-specific class derived from an abstract operations base class) but anyway the precompiled object still runs afoul of B).
One other thought... clients might take your source code, but are unlikely to understand it as well as you. Even if they build more systems dependent on their version of it, in a way they're getting more locked in, and may have more need for your services in future. And, if you see they've not played fair, you charge them for it appropriately when the time comes.
It seems with gcc 4.5 comes the support for plugins. So you can provide your own .so which would be, for instance, called before compilation stage starts. So you can have all kinds of tricks(decryption of source file) in there, neatly hidden. This would also be portable solution as no change is made to g++ per se.
This is exactly what I was looking for. You can read more here:
http://www.codesynthesis.com/~boris/blog/2010/05/03/parsing-cxx-with-gcc-plugin-part-1/
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
While C++ Standards Committee works hard to define its intricate but powerful features and maintain its backward compatibility with C, in my personal experience I've found many aspects of programming with C++ cumbersome due to lack of tools.
For example, I recently tried to refactor some C++ code, replacing many shared_ptr by T& to remove pointer usages where not needed within a large library. I had to perform almost the whole refactoring manually as none of the refactoring tools out there would help me do this safely.
Dealing with STL data structures using the debugger is like raking out the phone number of a stranger when she disagrees.
In your experience, what essential developer tools are lacking in C++?
My dream tool would be a compile-time template debugger. Something that'd let me interactively step through template instantiations and examine the types as they get instantiated, just like the regular debugger does at runtime.
In your experience, what essential developer tools are lacking in C++?
Code completion. Seriously. Refactoring is a nice-to-have feature but I think code completion is much more fundamental and more important for API discoverabilty and usabilty.
Basically, tools that require any undestanding of C++ code suck.
Code generation of class methods. When I type in the declaration you should be able to figure out the definition. And while I'm on the topic can we fix "goto declaration / goto definition" always going to the declaration?
Refactoring. Yes I know it's formally impossible because of the pre-processor - but the compiler could still do a better job of a search and replace on a variable name than I can maually. You could also syntax highlight local, members and paramaters while your at it.
Lint. So the variable I just defined shadows a higher one? C would have told me that in 1979, but c++ in 2009 apparently prefers me to find out on my own.
Some decent error messages. If I promise never to define a class with the same name inside the method of a class - do you promise to tell me about a missing "}". In fact can the compiler have some knowledge of history - so if I added an unbalanced "{" or "(" to a previously working file could we consider mentioning this in the message?
Can the STL error messages please (sorry to quote another comment) not look like you read "/dev/random", stuck "!/bin/perl" in front and then ran the tax code through the result?
How about some warnings for useful things? "Integer used as bool performance warning" is not useful, it doesn't make any performance difference, I don't have a choice - it's what the library does, and you have already told me 50 times.
But if I miss a ";" from the end of a class declaration or a "}" from the end of a method definition you don't warn me - you go out of your way to find the least likely (but theoretically) correct way to parse the result.
It's like the built in spell checker in this browser which happily accepts me misspelling wether (because that spelling is an archaic term for a castrated male goat! How many times do I write about soprano herbivores?)
How about spell checking? 40 years ago mainframe Fortran compilers had spell checking so if misspelled "WRITE" you didn't come back the next day to a pile of cards and a snotty error message. You got a warning that "WRIET" had been changed to WRITE in line X. Now the compiler happily continues and spends 10mins building some massive browse file and debugger output before telling you that you misspelled prinft 10,000 lines ago.
ps. Yes a lot of these only apply to Visual C++.
pps. Yes they are coming with my medication now.
If talking about MS Visual Studio C++, Visual Assist is a very handy tool for code completition, some refactorings - e.g. rename all/selected references, find/goto declaration, but I still miss the richness of Java IDEs like JBuilder or IntelliJ.
What I still miss, is a semantic diff tool - you know, one which does not compare the two files line-by-line, but statements/expressions. What I've found on the internet are only some abandoned tries - if you know one, please write in comment
The main problem with C++ is that it is hard to parse. That's why there are so very few tools out there that work on source code. (And that's also why we're stuck with some of the most horrific error messages in the history of compilers.) The result is, that, with very few exceptions (I only know doxygen and Visual Assist), it's down to the actual compiler to support everything needed to assist us writing and massaging the code. With compilers traditionally being rather streamlined command line tools, that's a very weak foundation to build rich editor support on.
For about ten years now, I'm working with VS. meanwhile, its code completion is almost usable. (Yes, I'm working on dual core machines. I wouldn't have said this otherwise, wouldn't I?) If you use Visual Assist, code completion is actually quite good. Both VS itself and VA come with some basic refactoring nowadays. That, too, is almost usable for the few things it aims for (even though it's still notably less so than code completion). Of course, >15 years of refactoring with search & replace being the only tool in the box, my demands are probably much too deteriorated compared to other languages, so this might not mean much.
However, what I am really lacking is still: Fully standard conforming compilers and standard library implementations on all platforms my code is ported to. And I'm saying this >10 years after the release of the last standard and about a year before the release of the next one! (Which just adds this: C++1x being widely adopted by 2011.)
Once these are solved, there's a few things that keep being mentioned now and then, but which vendors, still fighting with compliance to a >10 year old standard (or, as is actually the case with some features, having even given up on it), never got around to actually tackle:
usable, sensible, comprehensible compiler messages (como is actually pretty good, but that's only if you compare it to other C++ compilers); a linker that doesn't just throw up its hands and says "something's wrong, I can't continue" (if you have taught C++ as a first language, you'll know what I mean); concepts ('nuff said)
an IO stream implementation that doesn't throw away all the compile-time advantages which overloading operator<<() gives us by resorting to calling the run-time-parsing printf() under the hood (Dietmar Kühl once set out to do this, unfortunately his implementation died without the techniques becoming widespread)
STL implementations on all platforms that give rich debugging support (Dinkumware is already pretty good in that)
standard library implementations on all platforms that use every trick in the book to give us stricter checking at compile-time and run-time and more performance (wnhatever happened to yasli?)
the ability to debug template meta programs (yes, jalf already mentioned this, but it cannot be said too often)
a compiler that renders tools like lint useless (no need to fear, lint vendors, that's just wishful thinking)
If all these and a lot of others that I have forgotten to mention (feel free to add) are solved, it would be nice to get refactoring support that almost plays in the same league as, say, Java or C#. But only then.
A compiler which tries to optimize the compilation model.
Rather than naively include headers as needed, parsing them again in every compilation unit, why not parse the headers once first, build complete syntax trees for them (which would have to include preprocessor directives, since we don't yet know which macros are defined), and then simply run through that syntax tree whenever the header is included, applying the known #defines to prune it.
It could even be be used as a replacement for precompiled headers, so every header could be precompiled individually, just by dumping this syntax tree to the disk. We wouldn't need one single monolithic and error-prone precompiled header, and would get finer granularity on rebuilds, rebuilding as little as possible even if a header is modified.
Like my other suggestions, this would be a lot of work to implement, but I can't see any fundamental problems rendering it impossible.
It seems like it could dramatically speed up compile-times, pretty much rendering it linear in the number of header files, rather than in the number of #includes.
A fast and reliable indexer. Most of the fancy features come after this.
A common tool to enforce coding standards.
Take all the common standards and allow you to turn them on/off as appropriate for your project.
Currently just a bunch of perl scrips usullay has to supstitute.
I'm pretty happy with the state of C++ tools. The only thing I can think of is a default install of Boost in VS/gcc.
Refactoring, Refactoring, Refactoring. And compilation while typing. For refactorings I am missing at least half of what most modern Java IDEs can do. While Visual Assist X goes a long way, a lot of refactoring is missing. The task of writing C++ code is still pretty much that. Writing C++ code. The more the IDE supports high level refactoring the more it becomes construction, the more mallable the structure is the easier it will be to iterate over the structure and improve it. Pick up a demo version of Intellij and see what you are missing. These are just some that I remember from a couple of years ago.
Extract interface: taken a view classes with a common interface, move the common functions into an interface class (for C++ this would be an abstract base class) and derive the designated functions as abstract
Better extract method: mark a section of code and have the ide write a function that executes that code, constructing the correct parameters and return values
Know the type of each of the symbols that you are working with so that not only command completion can be correct for derived values e.g. symbol->... but also only offer functions that return the type that can be used in the current expression e.g. for
UiButton button = window->...
At the ... only insert functions that actually return a UiButton.
A tool all on it's own: Naming Conventions.
Intelligent Intellisense/Code Completion even for template-heavy code.
When you're inside a function template, of course the compiler can't say anything for sure about the template parameter (at least not without Concepts), but it should be able to make a lot of guesses and estimates. Depending on how the type is used in the function, it should be able to narrow the possible types down, in effect a kind of conservative ad-hoc Concepts. If one line in the function calls .Foo() on a template type, obviously a Foo member method must exist, and Intellisense should suggest it in the rest of the function as well.
It could even look at where the function is invoked from, and use that to determine at least one valid template parameter type, and simply offer Intellisense inside the function based on that.
If the function is called with a int as a template parameter, then obviously, use of int must be valid, and so the IDE could use that as a "sample type" inside the function and offer Intellisense suggestions based on that.
JavaScript just got Intellisense support in VS, which had to overcome a lot of similar problems, so it can be done. Of course, with C++'s level of complexity, it'd be a ridiculous amount of work. But it'd be a nice feature.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
Many languages, such as Java, C#, do not separate declaration from implementation. C# has a concept of partial class, but implementation and declaration still remain in the same file.
Why doesn't C++ have the same model? Is it more practical to have header files?
I am referring to current and upcoming versions of C++ standard.
Backwards Compatibility - Header files are not eliminated because it would break Backwards Compatibility.
Header files allow for independent compilation. You don't need to access or even have the implementation files to compile a file. This can make for easier distributed builds.
This also allows SDKs to be done a little easier. You can provide just the headers and some libraries. There are, of course, ways around this which other languages use.
Even Bjarne Stroustrup has called header files a kludge.
But without a standard binary format which includes the necessary metadata (like Java class files, or .Net PE files) I don't see any way to implement the feature. A stripped ELF or a.out binary doesn't have much of the information you would need to extract. And I don't think that the information is ever stored in Windows XCOFF files.
I routinely flip between C# and C++, and the lack of header files in C# is one of my biggest pet peeves. I can look at a header file and learn all I need to know about a class - what it's member functions are called, their calling syntax, etc - without having to wade through pages of the code that implements the class.
And yes, I know about partial classes and #regions, but it's not the same. Partial classes actually make the problem worse, because a class definition is spread across several files. As far as #regions go, they never seem to be expanded in the manner I'd like for what I'm doing at the moment, so I have to spend time expanding those little plus's until I get the view right.
Perhaps if Visual Studio's intellisense worked better for C++, I wouldn't have a compelling reason to have to refer to .h files so often, but even in VS2008, C++'s intellisense can't touch C#'s
C was made to make writing a compiler easily. It does a LOT of stuff based on that one principle. Pointers only exist to make writing a compiler easier, as do header files. Many of the things carried over to C++ are based on compatibility with these features implemented to make compiler writing easier.
It's a good idea actually. When C was created, C and Unix were kind of a pair. C ported Unix, Unix ran C. In this way, C and Unix could quickly spread from platform to platform whereas an OS based on assembly had to be completely re-written to be ported.
The concept of specifying an interface in one file and the implementation in another isn't a bad idea at all, but that's not what C header files are. They are simply a way to limit the number of passes a compiler has to make through your source code and allow some limited abstraction of the contract between files so they can communicate.
These items, pointers, header files, etc... don't really offer any advantage over another system. By putting more effort into the compiler, you can compile a reference object as easily as a pointer to the exact same object code. This is what C++ does now.
C is a great, simple language. It had a very limited feature set, and you could write a compiler without much effort. Porting it is generally trivial! I'm not trying to say it's a bad language or anything, it's just that C's primary goals when it was created may leave remnants in the language that are more or less unnecessary now, but are going to be kept around for compatibility.
It seems like some people don't really believe that C was written to port Unix, so here: (from)
The first version of UNIX was written
in assembler language, but Thompson's
intention was that it would be written
in a high-level language.
Thompson first tried in 1971 to use
Fortran on the PDP-7, but gave up
after the first day. Then he wrote a
very simple language he called B,
which he got going on the PDP-7. It
worked, but there were problems.
First, because the implementation was
interpreted, it was always going to be
slow. Second, the basic notions of B,
which was based on the word-oriented
BCPL, just were not right for a
byte-oriented machine like the new
PDP-11.
Ritchie used the PDP-11 to add types
to B, which for a while was called NB
for "New B," and then he started to
write a compiler for it. "So that the
first phase of C was really these two
phases in short succession of, first,
some language changes from B, really,
adding the type structure without too
much change in the syntax; and doing
the compiler," Ritchie said.
"The second phase was slower," he said
of rewriting UNIX in C. Thompson
started in the summer of 1972 but had
two problems: figuring out how to run
the basic co-routines, that is, how to
switch control from one process to
another; and the difficulty in getting
the proper data structure, since the
original version of C did not have
structures.
"The combination of the things caused
Ken to give up over the summer,"
Ritchie said. "Over the year, I added
structures and probably made the
compiler code somewhat better --
better code -- and so over the next
summer, that was when we made the
concerted effort and actually did redo
the whole operating system in C."
Here is a perfect example of what I mean. From the comments:
Pointers only exist to make writing a compiler easier? No. Pointers exist because they're the simplest possible abstraction over the idea of indirection. – Adam Rosenfield (an hour ago)
You are right. In order to implement indirection, pointers are the simplest possible abstraction to implement. In no way are they the simplest possible to comprehend or use. Arrays are much easier.
The problem? To implement arrays as efficiently as pointers you have to pretty much add a HUGE pile of code to your compiler.
There is no reason they couldn't have designed C without pointers, but with code like this:
int i=0;
while(src[++i])
dest[i]=src[i];
it will take a lot of effort (on the compilers part) to factor out the explicit i+src and i+dest additions and make it create the same code that this would make:
while(*(dest++) = *(src++))
;
Factoring out that variable "i" after the fact is HARD. New compilers can do it, but back then it just wasn't possible, and the OS running on that crappy hardware needed little optimizations like that.
Now few systems need that kind of optimization (I work on one of the slowest platforms around--cable set-top boxes, and most of our stuff is in Java) and in the rare case where you might need it, the new C compilers should be smart enough to make that kind of conversion on its own.
In The Design and Evolution of C++, Stroustrup gives out one more reason...
The same header file can have two or more implementation files which can be simultaneously worked-upon by more than one programmer without the need of a source-control system.
This might seem odd these days, but I guess it was an important issue when C++ was invented.
If you want C++ without header files then I have good news for you.
It already exists and is called D (http://www.digitalmars.com/d/index.html)
Technically D seems to be a lot nicer than C++ but it is just not mainstream enough for use in many applications at the moment.
One of C++'s goals is to be a superset of C, and it's difficult for it to do so if it cannot support header files. And, by extension, if you wish to excise header files you may as well consider excising CPP (the pre-processor, not plus-plus) altogether; both C# and Java do not specify macro pre-processors with their standards (but it should be noted in some cases they can be and even are used even with these languages).
As C++ is designed right now, you need prototypes -- just as in C -- to statically check any compiled code that references external functions and classes. Without header files, you would have to type out these class definitions and function declarations prior to using them. For C++ not to use header files, you'd have to add a feature in the language that would support something like Java's import keyword. That'd be a major addition, and change; to answer your question of if it'd be practical: I don't think so--not at all.
Many people are aware of shortcomings of header files and there are ideas to introduce more powerful module system to C++.
You might want to take a look at Modules in C++ (Revision 5) by Daveed Vandevoorde.
Well, C++ per se shouldn't eliminate header files because of backwards compatibility. However, I do think they're a silly idea in general. If you want to distribute a closed-source lib, this information can be extracted automatically. If you want to understand how to use a class w/o looking at the implementation, that's what documentation generators are for, and they do a heck of a lot better a job.
There is value in defining the class interface in a separate component to the implementation file.
It can be done with interfaces, but if you go down that road, then you are implicitly saying that classes are deficient in terms of separating implementation from contract.
Modula 2 had the right idea, definition modules and implementation modules. http://www.modula2.org/reference/modules.php
Java/C#'s answer is an implicit implementation of the same (albeit object-oriented.)
Header files are a kludge, because header files express implementation detail (such as private variables.)
In moving over to Java and C#, I find that if a language requires IDE support for development (such that public class interfaces are navigable in class browsers), then this is maybe a statement that the code doesn't stand on its own merits as being particularly readable.
I find the mix of interface with implementation detail quite horrendous.
Crucially, the lack of ability to document the public class signature in a concise well-commented file independent of implementation indicates to me that the language design is written for convenience of authorship, rather convenience of maintenance. Well I'm rambling about Java and C# now.
One advantage of this separation is that it is easy to view only the interface, without requiring an advanced editor.
No language exists without header files. It's a myth.
Look at any proprietary library distribution for Java (I have no C# experience to speak of, but I'd expect it's the same). They don't give you the complete source file; they just give you a file with every method's implementation blanked ({} or {return null;} or the like) and everything they can get away with hiding hidden. You can't call that anything but a header.
There is no technical reason, however, why a C or C++ compiler could count everything in an appropriately-marked file as extern unless that file is being compiled directly. However, the costs for compilation would be immense because neither C nor C++ is fast to parse, and that's a very important consideration. Any more complex method of melding headers and source would quickly encounter technical issues like the need for the compiler to know an object's layout.
If you want the reason why this will never happen: it would break pretty much all existing C++ software. If you look at some of the C++ committee design documentation, they looked at various alternatives to see how much code it would break.
It would be far easier to change the switch statement into something halfway intelligent. That would break only a little code. It's still not going to happen.
EDITED FOR NEW IDEA:
The difference between C++ and Java that makes C++ header files necessary is that C++ objects are not necessarily pointers. In Java, all class instances are referred to by pointer, although it doesn't look that way. C++ has objects allocated on the heap and the stack. This means C++ needs a way of knowing how big an object will be, and where the data members are in memory.
Header files are an integral part of the language. Without header files, all static libraries, dynamic libraries, pretty much any pre-compiled library becomes useless. Header files also make it easier to document everything, and make it possible to look over a library/file's API without going over every single bit of code.
They also make it easier to organize your program. Yes, you have to be constantly switching from source to header, but they also allow you define internal and private APIs inside the implementations. For example:
MySource.h:
extern int my_library_entry_point(int api_to_use, ...);
MySource.c:
int private_function_that_CANNOT_be_public();
int my_library_entry_point(int api_to_use, ...){
// [...] Do stuff
}
int private_function_that_CANNOT_be_public() {
}
If you #include <MySource.h>, then you get my_library_entry_point.
If you #include <MySource.c>, then you also get private_function_that_CANNOT_be_public.
You see how that could be a very bad thing if you had a function to get a list of passwords, or a function which implemented your encryption algorithm, or a function that would expose the internals of an OS, or a function that overrode privileges, etc.
Oh Yes!
After coding in Java and C# it's really annoying to have 2 files for every classes. So I was thinking how can I merge them without breaking existing code.
In fact, it's really easy. Just put the definition (implementation) inside an #ifdef section and add a define on the compiler command line to compile that file. That's it.
Here is an example:
/* File ClassA.cpp */
#ifndef _ClassA_
#define _ClassA_
#include "ClassB.cpp"
#include "InterfaceC.cpp"
class ClassA : public InterfaceC
{
public:
ClassA(void);
virtual ~ClassA(void);
virtual void methodC();
private:
ClassB b;
};
#endif
#ifdef compiling_ClassA
ClassA::ClassA(void)
{
}
ClassA::~ClassA(void)
{
}
void ClassA::methodC()
{
}
#endif
On the command line, compile that file with
-D compiling_ClassA
The other files that need to include ClassA can just do
#include "ClassA.cpp"
Of course the addition of the define on the command line can easily be added with a macro expansion (Visual Studio compiler) or with an automatic variables (gnu make) and using the same nomenclature for the define name.
Still I don't get the point of some statements. Separation of API and implementation is a very good thing, but header files are not API. There are private fields there. If you add or remove private field you change implementation and not API.