Near the end of this document there is this sentence:
Historically, compilers for many languages, including C++ and Fortran, have been implemented as “preprocessors” which emit another high level language such as C.
I have no idea what "preprocessors" means here; is there any documentation on this? Does it mean all these languages are translated into C source code?
I think it would have been better to use the term "source-to-source translator" instead of "preprocessors", which is ambiguous here, but using it isn't wrong either.
Basically, a compiler is a computer program that translates source code from a high-level programming language to a lower-level language (e.g., assembly language or machine code). But the document in the question says:
Historically, compilers for many languages, including C++ and Fortran,
have been implemented as “preprocessors” which emit another high level
language such as C.
Going by this description, the earlier compilers were implemented as source-to-source translators. A translator is also a form of preprocessor, but it's different from the preprocessor used in a program.
A translator is a computer program that translates a program written
in a given programming language into a functionally equivalent program
in a different language.
Now, coming to the preprocessor used in a program, let's take an example:
#include <stdio.h> // a preprocessor directive
A preprocessor is a program that processes a source file before the main compilation takes place (similar to a translator), but the difference is that here it handles directives whose names begin with #.
Here #include is a directive. This directive causes the preprocessor to add the contents of the stdio.h file to your program. This is a typical preprocessor action: adding or replacing text in the source code before it's compiled.
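To make the "adding or replacing text" part concrete, here is a minimal sketch. The macro names BUFFER_SIZE and GREETING and both functions are made up for illustration; the point is only that the preprocessor swaps each macro name for its body before the compiler proper sees the code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical macros, purely for illustration.
#define BUFFER_SIZE 16
#define GREETING "hello"

// After preprocessing, the compiler sees `char buffer[16];`.
std::size_t buffer_capacity() {
    char buffer[BUFFER_SIZE];
    return sizeof buffer;
}

// After preprocessing, the compiler sees `std::strlen("hello")`.
std::size_t greeting_length() {
    return std::strlen(GREETING);
}
```

The compiler never learns that BUFFER_SIZE or GREETING existed; it only sees the replaced text.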
Some languages have been implemented by having the compiler generate C code which is then compiled by the C compiler. Notable examples include:
C++ in the earliest days (and C with Classes before that) — cfront generated C code from the C++ code. It ceased to be practical once C++ supported exceptions (read Stroustrup The Design and Evolution of C++ for more information), but not all C++ compilers used the technique (in fact, I don't know of any other compiler than cfront that did it).
Yacc is compiled to C code. Bison can be compiled to C or C++ code.
Lex is compiled to C code. Flex can be compiled to C or C++ code, I believe.
Informix ESQL/C converts Embedded SQL into pure C.
Informix 4GL converts I4GL source into ESQL/C, and then uses the ESQL/C compiler to create C code (and the C compiler to create object code and executables), so it has a multi-stage compiler (and I'm simplifying a bit).
The word "preprocessor" now has a totally different meaning, so using it here is confusing. But yes, here it means that some compilers translate their source into another language.
It should be called a source-to-source compiler. One example is Cfront (designed by Bjarne Stroustrup himself), which converted C++ to C.
For the normal meaning of the phrase "preprocessor" in C++, see here.
No, not necessarily. As the GCC document said, many C++ compilers (but not gcc/g++) produce C code as output. Why do they do this? So they can piggyback on all the back-end code generation that C compilers already provide (x86, AMD, etc.). By having C as their destination code, they save a lot of low-level coding on the back end. Such compilers include the original Cfront and Comeau C/C++.
My understanding is that one step of the compilation of a program (irrespective of the language, I guess) is parsing the source file into some kind of space-separated tokens (this tokenization would be done by what's referred to as the scanner in this answer). For instance, I understand that at some point in the compilation process, a line containing x += fun(nullptr); is separated into something like
x
+=
fun
(
nullptr
)
;
Is this true? If so, is there a way to have access to this tokenization of a C++ source code?
I'm asking this question mostly out of curiosity, and I do not intend to write a lexer myself.
And the reason I'm curious to know whether one can leverage the compiler is that, to give an example, before meeting [[noreturn]] & Co. I wouldn't have ever considered [[ as a valid token, if I was to write a lexer myself.
Do we necessarily need a true, actual use case? I think we don't, if I am curious about whether there's an existing tool or not to do something.
However, if we really need a use case, let's say my target is to write a C++ function which reads in a C++ source file and returns a std::vector of the lexemes it's made up of. Clearly, a requirement is that concatenating the elements of the output should make up the whole text again, including line breaks and every other byte of it.
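Just to pin down the shape of what is being asked for, here is a rough sketch of such a function. It is deliberately naive and is not a conforming C++ lexer: it ignores comments, string and character literals, and all preprocessing rules, and the names split_lexemes and join_lexemes are made up. It only shows the round-trip requirement (concatenation gives back the input).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits text into whitespace runs, identifier/number runs, a few
// two-character punctuators, and single characters (everything else).
std::vector<std::string> split_lexemes(const std::string& text) {
    static const char* const two_char_ops[] = {
        "+=", "-=", "*=", "/=", "==", "!=", "<=", ">=",
        "&&", "||", "++", "--", "->", "::", "<<", ">>"};
    auto is_word = [](unsigned char c) { return std::isalnum(c) != 0 || c == '_'; };
    auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };

    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t start = i;
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (is_space(c)) {
            while (i < text.size() && is_space(static_cast<unsigned char>(text[i]))) ++i;
        } else if (is_word(c)) {
            while (i < text.size() && is_word(static_cast<unsigned char>(text[i]))) ++i;
        } else {
            i = start + 1;  // default: a single punctuation character
            for (const char* op : two_char_ops) {
                if (text.compare(start, 2, op) == 0) { i = start + 2; break; }
            }
        }
        out.push_back(text.substr(start, i - start));
    }
    return out;
}

// Concatenating the lexemes gives back the original text byte for byte.
std::string join_lexemes(const std::vector<std::string>& lexemes) {
    std::string joined;
    for (const std::string& lexeme : lexemes) joined += lexeme;
    return joined;
}
```

Whitespace runs are kept as their own elements, which is what makes the round trip possible; a real compiler's scanner would normally discard them.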
With the restriction mentioned in the comment (tokenization keeping __DATE__) it seems rather manageable. You need the preprocessing tokens. The Boost::Wave preprocessor necessarily creates a token list, because it has to work on those tokens.
Basile correctly points out that it's hard to assign a meaning to those tokens.
C++ is a very complex programming language.
Be sure to read the C++11 draft standard n3337 before even attempting to parse C++ code.
Look inside the source code of existing open source C++ compilers, such as GCC (at least GCC 10 in October 2020) or Clang (at least Clang 10 in October 2020)
If you have to write your C++ parser from scratch, be sure to have the budget for at least a full person year of work.
Look also into existing C++ static source code analyzers, such as Frama-C++ or Clang static analyzer. Consider adapting one of them to your needs, but do document in writing your needs before starting coding. Be aware of Rice's theorem.
If you want to parse a small subset of C++ (you'll need to document and specify that subset), consider using parser generators like ANTLR or GNU bison.
Most compilers are building some internal representations, in particular some abstract syntax tree. Read the Dragon book for more.
I would suggest instead writing your own GCC plugin.
Indeed, it would be tied to some major version of GCC, but you'll win months of work.
Is this true? If so, is there a way to have access to this tokenization of a C++ source code?
Yes, by patching some existing opensource C++ compiler, or extending it with your plugin (there are licensing conditions related to both approaches).
let's say my target is to write a C++ function which reads in a C++ source file and returns a std::vector of the lexemes it's made up of.
The above specification is ambiguous.
Do you want the lexemes before or after the C++ preprocessing phase? In other words, what would be the lexeme for e.g. __DATE__ or __TIME__? Read e.g. the documentation of GNU cpp ... If you happen to use GCC on Linux (see gcc(1)) and have some C++ translation unit foo.cc, try running g++ -C -E -Wall foo.cc > foo.ii and look (using less(1)...) into the generated preprocessed form foo.ii. And what about template expansion, preprocessor conditionals, or preprocessor stringizing?
I would suggest writing your GCC plugin working on GENERIC representations. You could also start a PhD work related to your goals.
Notice that generating C++ code is a lot easier than parsing it.
Look inside Qt for an example of software generating C++ code. You could consider using GNU m4, or GNU gawk, or GNU autoconf, or GPP, or your own C++ source generator (perhaps with the help of GNU bison or of ANTLR) to generate some of your C++ code.
PS. On my home page you'll find a hyperlink to a draft report related to your question, and another hyperlink to an open source program generating C++ code. It sadly seems that I am forbidden here to give these hyperlinks, but you could find them in two mouse clicks. You might also look into the two European H2020 projects that funded that draft report: CHARIOT & DECODER.
I have not found the exact question I am asking on either Google or here. Everything talks about calling C++ from C code, or about compiling one part with a C compiler and another with a C++ compiler and then linking them together, along with the problems that arise from that, which is not what I want.
I want to compile and link C99 files with the C++ compiler of Visual Studio in my all-C++ application and be able to call the C functions without errors and problems. There will be no C linker involved, no compiling parts with different compilers and linking them together later, nor any kind of trick. The headers are from a C library (libcurl) and some others, as I want to use them in my application. I do not want to use the C++ bindings; I want to compile C code as C++. Can I trust C code to compile as C++ code without major refactoring? What should I do differently than when including C++ headers? What incompatibilities should I expect?
In theory, C code should be able to be compiled as C++ code. At some point Dr. Stroustrup made the point that all code from the ANSI C edition of K&R compiles with a C++ compiler and has the same semantics as when compiled with a C compiler (this was construed to mean that all ANSI C code would be valid C++ code, which is obviously not the case, e.g., because many C++ keywords are not reserved identifiers in C).
However, certain idioms in C require substantial changes to the code if you want to compile it with a C++ compiler. A typical example is the need to cast void* to the proper type in C++, which isn't needed in C. In C, casting the result of malloc() to the proper pointer type even seems to be frowned upon, although leaving the cast out prevents the C code from being compiled with a C++ compiler. (In my opinion being able to compile it as C++ is a good thing, e.g., because the tighter rules may uncover problems in the C code even if the production version is compiled with a C compiler.) There are also a few subtle semantic differences as far as I know, although right now I can't easily pinpoint one of them. That is, the same code compiled with a C and a C++ compiler may have defined but different results in the two cases.
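The void* point can be shown in a few lines. This is only a sketch (the function name make_zeroed_ints is made up), written in the common subset so that it compiles as both C99 and C++:

```cpp
#include <stdlib.h>

/* Allocates `count` ints and sets them to zero; returns NULL on failure. */
int *make_zeroed_ints(size_t count) {
    /* int *values = malloc(count * sizeof *values);      valid C, an error in C++ */
    int *values = (int *)malloc(count * sizeof *values); /* accepted by both */
    if (values == NULL) return NULL;
    for (size_t i = 0; i < count; ++i) values[i] = 0;
    return values;
}
```

The commented-out line is the idiom most C style guides prefer, and it is exactly the line a C++ compiler rejects, because C++ has no implicit conversion from void* to int*.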
In practice, I doubt that you can simply compile a non-trivial body of C code with a C++ compiler and get a program which behaves the same as the original C code. If the C program you intend to compile with a C++ compiler comes with a thorough set of test cases, it may be feasible to port the code to C++, but it will involve more work than merely renaming the file from <name>.c to <name>.cpp. I could imagine that a tool could do the required conversions (a compiler compiling C source to C++ source), but I'm not aware of such a tool. I'm only aware of the opposite direction, which yields entirely unreadable code (for example, Comeau C++ uses C as a form of portable assembler).
If you want to do this using visual studio, then it is not possible. MSVC doesn't support C99.
C and C++ are two different, but closely related, languages. C++ is nearly a superset of C, but not quite (in particular, C++ has keywords that C lacks).
If your code depends on C99 features (i.e., features that are in C99 but not in C90), then you may be out of luck. Microsoft's C compiler does not support C99 (except for a few minor features; I think it permits // comments), and Microsoft has stated clearly that such support is not a priority. You may be able to modify the code so it's valid C90, depending on what features it uses.
Microsoft Visual Studio supports compiling both C and C++ (though it tends to emphasize C++). If you can get your C code compiling with the MS C compiler, I suggest doing just that rather than treating it as C++. C++ has features, particularly extern "C", that are specifically designed to let you interface C and C++ code. The C++ FAQ Lite discusses this in section 32.
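For reference, this is roughly what the extern "C" pattern looks like. The function c_add is hypothetical, and in a real project its definition would sit in a .c file built by the C compiler; it is defined here only to keep the sketch self-contained.

```cpp
// Typical layout of a header shared by C and C++ translation units.
#ifdef __cplusplus
extern "C" {
#endif

int c_add(int a, int b);   /* hypothetical function with C linkage */

#ifdef __cplusplus
}
#endif

// Stand-in for the C side. With C linkage the symbol name is not mangled,
// so the linker can match it against an object file produced by a C compiler.
extern "C" int c_add(int a, int b) { return a + b; }

// C++ code calls it like any other function.
int add_from_cpp() { return c_add(2, 3); }
```

Many C library headers already wrap their declarations in this kind of guard, in which case including them from C++ needs nothing extra.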
If you really need to compile your C code as C++ for some reason, you can probably do so with a few minor source changes. Rename the source file from foo.c to foo.cpp, compile it, and fix any errors that are reported. The result probably won't be good C++, but you should be able to get it to work. There are a few constructs that are valid C and valid C++ with different semantics, but there aren't many of them, and you're not likely to run into them (but you should definitely keep that in mind).
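One well-known example of a construct that is valid in both languages but means something different is the type of a character literal: it is int in C and char in C++. A tiny sketch (the function name is made up):

```cpp
#include <cstddef>

// Returns the size of a character literal as seen by the compiler in use.
std::size_t char_literal_size() {
    return sizeof('a');   // 1 in C++; sizeof(int) when the same line is compiled as C
}
```

Code that relies on sizeof('a') being the size of an int silently changes behavior when the file is renamed to .cpp.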
If you want to continue maintaining the code as C++, my advice is to go ahead and make the changes needed to do that, and then stop thinking of it as C code.
The actual need to compile the same code both as C and as C++ is quite rare. (P.J. Plauger, for example, needs to do this, since he provides some libraries intended to be used in either language.) In most cases, C++'s extern "C" and other features are good enough to let you mix the two languages reasonably cleanly.
According to various sources (for example, the SE radio episode with Kevlin Henney, if I remember correctly), "C with classes" was implemented with preprocessor technology (with the output then being fed to a C compiler), whereas C++ has always been implemented with a compiler (that just happened to spit out C in the early days). This seems to cause some confusion, so I was wondering:
Where exactly is the boundary between a preprocessor and a compiler? When do you call a piece of software that implements a language "a preprocessor", and when do you call it "a compiler"?
By the way, is "a compiled language" an established term? If so, what exactly does it mean?
This is an interesting question. I don't know a definitive answer, but would say this, if pressed for one:
A preprocessor doesn't parse the code, but instead scans for embedded patterns and expands them
A compiler actually parses the code by building an AST (abstract syntax tree) and then transforms that into a different language
The language of the output of the preprocessor is a subset of the language of the input.
The language of the output of the compiler is (usually) very different (machine code) from the language of the input.
From a simplified, personal, point of view:
I consider the preprocessor to be any form of textual manipulation that has no concept of the underlying language (i.e., its semantics or constructs), and thus relies only on its own set of rules to perform its duties.
The compiler starts when rules and regulations are applied to what is being processed (yes, this makes 'my' preprocessor a compiler, but why not :P). This includes semantic and lexical checking, and the included transforms from x (textual) to y (binary/intermediate form). As one of my professors would say: "it's a system with inputs, processes and outputs".
The C/C++ compiler cares about type-correctness while the preprocessor simply expands symbols.
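A small sketch of that difference, with a made-up macro and function: the macro is expanded as plain text, so operator precedence in the surrounding expression bites, while the function argument is parsed, type-checked, and evaluated by the compiler first.

```cpp
// Hypothetical macro: pure textual substitution, no knowledge of expressions.
#define SQUARE_MACRO(x) x * x

// The function version is parsed and type-checked by the compiler.
inline int square_function(int x) { return x * x; }

// Expands to `1 + 2 * 1 + 2`, which is 5, not 9.
int macro_result() { return SQUARE_MACRO(1 + 2); }

// The argument is evaluated to 3 before the call, so the result is 9.
int function_result() { return square_function(1 + 2); }
```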
A compiler consists of several processes (components). The preprocessor is only one of these, and a relatively simple one.
From the Wikipedia article, Division of compiler processes:
All but the smallest of compilers have more than two phases. However,
these phases are usually regarded as being part of the front end or
the back end. The point at which these two ends meet is open to
debate.
The front end is generally considered to be where syntactic
and semantic processing takes place, along with translation to a lower
level of representation (than source code).
The middle end is usually
designed to perform optimizations on a form other than the source code
or machine code. This source code/machine code independence is
intended to enable generic optimizations to be shared between versions
of the compiler supporting different languages and target processors.
The back end takes the output from the middle. It may perform more
analysis, transformations and optimizations that are for a particular
computer. Then, it generates code for a particular processor and OS.
Preprocessing is only a small part of the front end's job.
The first C++ compiler was made by attaching an additional process in front of the existing C compiler toolset, not because that is good design but because of limited time and resources.
Nowadays, I don't think such a non-native C++ compiler could survive commercially.
I dare say a cfront for C++11 would be impossible to make.
The answer is pretty simple.
A preprocessor works on text as input and has text as output. Examples are the old Unix commands m4 and cpp (the C preprocessor), and also Unix programs like roff, nroff and troff, which were used (and still are) to format man pages (Unix command "man") or to format text for printing or typesetting.
Preprocessors are very simple; they don't know anything about the "language of the text" they process. In other words, they could just as well process natural-language text. The C preprocessor, despite its name, only recognizes #define, #include, #ifdef, #ifndef, #else etc., and if you use #define MACRO it tries to "expand" that macro everywhere it finds it. But the input does not need to be C or C++ program text; it can just as well be a novel written in Italian or Greek.
Compilers that cross compile into a different language are usually called translators. So the old cfront "compiler" for C++ which emitted C code was a C++ translator.
Preprocessors, and later translators, were historically used because old machines simply lacked the memory to do everything in one program; instead the work was done by specialized programs, passing files from disk to disk.
A typical C program would be compiled from various sources, and the build process would be managed with make. These days the C preprocessor is usually built directly into the C/C++ compiler. A typical make run would call cpp on the *.c files and write the output to a different directory; from there the C compiler cc would either compile it straight to machine code or, more commonly, output assembler code as text. Note: the C compilers of that era mostly checked syntax and did not care much about type safety etc. Then the assembler would take that assembler code and output a *.o file, which could later be linked with other *.o files and *.lib files into an executable program. OTOH you likely had a make rule that would call not the C compiler but the lint command, the C language analyser, which looks for typical mistakes and errors (which were ignored by the C compiler).
It is quite interesting to look up lint, nroff, troff, m4 etc. on Wikipedia (or in your machine's terminal using man) ;D
I've seen a lot of arguments over the general performance of C code compiled with a C++ compiler -- I'm curious as to whether there are any solid experimental studies buried beneath all the anecdotal flame wars you find in web searches. I'm particularly interested in the GCC suite, but any data points would be interesting. (Comparing the assembly of "Hello, World!" is not as robust as I'd like. :-)
I'm generally assuming you use the "embedded style" flags -- no exceptions or RTTI. I also wouldn't mind knowing if there are studies on the compilation time itself. TIA!
Adding a datapoint (or at least an anecdote):
We were recently writing a math library for a small embedded-like target, and started writing it in C. About halfway through the project, we switched some of the files to C++, largely in order to use templates for some of the functions where we'd otherwise be writing many nearly-identical pieces of code (or else embedding 40-line functions in preprocessor macros).
At the point where we started switching over, we had a very careful look at the generated assembly code (using GCC) on a number of the functions, and confirmed that it was in fact essentially identical whether the file was compiled as C or C++ -- where by "essentially identical" I mean the differences were in things like symbol names and the stuff at the beginning and end of the assembly file; the actual instructions in the middle of the functions were exactly identical.
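To give a flavor of the kind of thing the templates bought us, here is a sketch (not our actual library code; dot_product is a made-up example). One template stands in for what would otherwise be separate int, float and double versions, or one large macro:

```cpp
#include <cstddef>

// One definition covers every numeric element type; the compiler generates
// a separate, fully typed instantiation for each T that is actually used.
template <typename T>
T dot_product(const T* a, const T* b, std::size_t n) {
    T sum = T();
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

Each instantiation is an ordinary function as far as code generation goes, which fits with the assembly looking the same as the hand-written C versions.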
Sorry that I don't have a more solid answer.
Edit to add, 2013-03-24: Recently I came across an article where Rusty Russell compared performance on GCC compiled with a C compiler and compiled with a C++ compiler, in response to the recent switch to compiling GCC as C++: http://rusty.ozlabs.org/?p=330. The conclusions are interesting: The version compiled with a C++ compiler was very slightly slower; the difference was about 0.3%. However, that was entirely explained by load time differences caused by larger debug info; when he stripped the binaries and removed the debug info, the differences were less than 0.1% -- i.e., essentially indistinguishable from measurement noise.
I don't know of any studies off-hand, but given the C++ philosophy that you don't pay the price for features you don't use, I doubt there'd be any significant difference between compiling C code with the C compiler and with the C++ compiler.
I don't know of any studies and I doubt that anyone will spend the time to do them. Basically, when compiling with a C++ compiler, the code has the same semantics as when compiling with a C compiler, so it's down to optimization and code generation. But IMO these are far too compiler-specific to allow any general statements about C vs. C++.
What you mainly gain when you compile C code with a C++ compiler is a much stricter checking (function declarations etc.). IMO this would make compiling C code with a C++ compiler quite attractive. But note that, if you have a large C code base that's never run through a C++ compiler, you're likely facing a very steep up-hill battle until the code compiles clean enough to be able to see any meaningful warnings.
The GCC project is currently in transition from C to C++; that is, GCC may be implemented in C++ in the future, but it is currently written in C. The next release of GCC will be written in the subset of C which is also valid C++.
Some performance tests were performed on g++ vs gcc, on GCC's codebase. They compared the "bootstrap" time, which means compiling gcc with the system compiler, then compiling it with the resulting compiler, then repeating and checking the results are the same.
Summary: Using g++ was 20% slower. The compiler versions were slightly different, but it was thought that this wouldn't cause the 20% difference.
Note that this measures different programs, gcc vs g++, which although they mostly use the same code, have different front-ends.
I've not tried it from a performance standpoint, but I think compiling C applications with the C++ compiler is a good idea, as it will prevent you from doing "naughty" things such as using functions not declared.
However, the output won't be the same - at the very least, you'll get different symbols, which will render it (mostly) unlinkable with code from the C compiler.
So I think what you really mean is "Is it ok from a performance standpoint, to write C++ code which is very C-like and compile it with the C++ compiler" ?
You would also have to avoid some C99 things such as _Bool, which C++ doesn't support on the grounds of having its own bool type.
Don't do it if the code has not been designed for it. The same valid language constructs can lead to different behavior depending on whether they are interpreted as C or as C++. You would potentially introduce bugs that are very difficult to understand. Less problematic, but still a maintainability nightmare: some C constructs (especially from C99) are not valid in C++.
In the past I have done things like look at the size of the binary, which for C++ was huge; that doesn't mean they simply linked in a bunch of unused libraries. The easiest approach might be to run gcc -S myprog.cpp and gcc -S myprog.c and diff the assembler output.
I have some C++ code. In the code there are many classes defined, their member functions, constructors, destructors for those classes, few template classes and lots of C++ stuff. Now I need to convert the source to plain C code.
I have the following questions:
Is there any tool to convert C++ code and header files to C code?
Will I have to do a total rewrite of the code (I will have to remove the constructors and destructors and move that code into some init() and deinit() functions; change classes to structures; make existing member functions function pointers in those newly defined structures and then invoke those functions through the function pointers, etc.)?
If I have to convert it manually myself, what C++ specific code-data constructs/semantics do I need to pay attention to while doing the conversion from C++ to C?
There is indeed such a tool: Comeau's C++ compiler. It will generate C code which you can't manually maintain, but that's no problem. You'll maintain the C++ code, and just convert to C on the fly.
http://llvm.org/docs/FAQ.html#translatecxx
It handles some code, but will fail for more complex implementations as it hasn't been fully updated for some of the modern C++ conventions. So try compiling your code frequently until you get a feel for what's allowed.
Usage syntax from the command line is as follows for version 9.0.1:
clang -c CPPtoC.cpp -o CPPtoC.bc -emit-llvm
clang -march=c CPPtoC.bc -o CPPtoC.c
For older versions (unsure of transition version), use the following syntax:
llvm-g++ -c CPPtoC.cpp -o CPPtoC.bc -emit-llvm
llc -march=c CPPtoC.bc -o CPPtoC.c
Note that it creates a GNU flavor of C and not true ANSI C. You will want to test that this is useful for you before you invest too heavily in your code. For example, some embedded systems only accept ANSI C.
Also note that it generates functional but fairly unreadable code. I recommend commenting and maintaining your C++ code and not worrying about the final C code.
EDIT: although official support for this functionality was removed, users can still use the unofficial support from the Julia language developers to achieve the functionality mentioned above.
While you can do OO in C (e.g. by adding a theType *this first parameter to methods, and manually handling something like vtables for polymorphism) this is never particularly satisfactory as a design, and will look ugly (even with some pre-processor hacks).
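For anyone who hasn't seen the technique, a hand-rolled version looks roughly like this. It is only a sketch with made-up names (Shape, Rectangle, shape_area), written in the C subset so it compiles as both C and C++; a real conversion would also have to deal with destructors, inheritance chains, templates and so on.

```cpp
/* Hand-written "vtable": one function pointer per virtual method. */
struct Shape;
struct ShapeVTable {
    double (*area)(const struct Shape *self);
};

/* "Base class": every object carries a pointer to its vtable. */
struct Shape {
    const struct ShapeVTable *vtable;
};

/* "Derived class": the base is the first member, so a pointer to a
   Rectangle can also be treated as a pointer to its Shape. */
struct Rectangle {
    struct Shape base;
    double width, height;
};

/* The explicit `self` parameter plays the role of `this`. */
static double rectangle_area(const struct Shape *self) {
    const struct Rectangle *r = (const struct Rectangle *)self;
    return r->width * r->height;
}

static const struct ShapeVTable rectangle_vtable = { rectangle_area };

/* Plays the role of a constructor. */
void rectangle_init(struct Rectangle *r, double width, double height) {
    r->base.vtable = &rectangle_vtable;
    r->width = width;
    r->height = height;
}

/* Plays the role of a virtual call such as shape->area(). */
double shape_area(const struct Shape *shape) {
    return shape->vtable->area(shape);
}
```

Everything the C++ compiler would have done for you (setting the vtable pointer, passing this, the downcast) is now manual, which is where the ugliness comes from.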
I would suggest at least looking at a re-design to compare how this would work out.
Overall a lot depends on the answer to the key question: if you have working C++ code, why do you want C instead?
Maybe good ol' cfront will do?
A compiler consists of two major blocks: the 'front end' and the 'back end'.
The front end of a compiler analyzes the source code and builds some form of 'intermediary representation' of it, which is much easier for an algorithm to analyze than the source code itself (i.e., whereas the source language, e.g. C++, is designed to help the human programmer write code, the intermediary form is designed to simplify the algorithms that analyze it).
The back end of a compiler takes the intermediary form and then converts it to a 'target language'.
Now, the target language for general-use compilers is usually the assembly language of some processor, but there's nothing to prohibit a compiler back end from producing code in some other language, as long as said target language is (at least) as flexible as a general CPU assembler.
Now, as you can probably imagine, C is definitely as flexible as a CPU's assembler, such that a C++ to C compiler is really no problem to implement from a technical pov.
So you have: C++ ---frontEnd---> someIntermediaryForm ---backEnd---> C
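As a toy illustration of the back-end half of that picture, here is a sketch in which the "intermediary form" is a tiny expression tree and the "back end" emits C source text instead of assembler. All names (Expr, leaf, binary, emit_c_expr, emit_c_function) are made up, the front end is left out entirely, and a real intermediary form is of course far richer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical intermediary form: a tiny expression tree.
struct Expr {
    char op;                         // '+', '*', ... or 0 for a leaf
    std::string leaf;                // variable name or literal when op == 0
    std::unique_ptr<Expr> lhs, rhs;  // operands when op != 0
};

std::unique_ptr<Expr> leaf(const std::string& text) {
    std::unique_ptr<Expr> e(new Expr());
    e->op = 0;
    e->leaf = text;
    return e;
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> lhs,
                             std::unique_ptr<Expr> rhs) {
    std::unique_ptr<Expr> e(new Expr());
    e->op = op;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// "Back end": walks the tree and emits a fully parenthesized C expression.
std::string emit_c_expr(const Expr& e) {
    if (e.op == 0) return e.leaf;
    return "(" + emit_c_expr(*e.lhs) + " " + e.op + " " + emit_c_expr(*e.rhs) + ")";
}

// Wraps the expression in a C function taking one int parameter named x.
std::string emit_c_function(const std::string& name, const Expr& body) {
    return "int " + name + "(int x) { return " + emit_c_expr(body) + "; }";
}
```

Swapping emit_c_expr for a routine that prints instructions for some CPU would turn the same tree into assembler output, which is the sense in which the target language is just a back-end choice.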
You may want to check these guys out: http://www.edg.com/index.php?location=c_frontend
(the above link is just informative for what can be done, they license their front ends for tens of thousands of dollars)
PS
As far as I know, there is no such C++-to-C compiler from GNU, and this totally beats me (if I'm right about this). Because the C language is fairly small and its internal mechanisms are fairly rudimentary, a C compiler requires something like one man-year of work (I can tell you this first hand because I wrote such a compiler myself many years ago, and it produces a [virtual] stack machine intermediary code), and being able to have a maintained, up-to-date C++ compiler while only having to write a C compiler once would be a great thing to have...
This is an old thread but apparently the C++ Faq has a section (Archived 2013 version) on this. This apparently will be updated if the author is contacted so this will probably be more up to date in the long run, but here is the current version:
Depends on what you mean. If you mean, Is it possible to convert C++ to readable and maintainable C-code? then sorry, the answer is No — C++ features don't directly map to C, plus the generated C code is not intended for humans to follow. If instead you mean, Are there compilers which convert C++ to C for the purpose of compiling onto a platform that yet doesn't have a C++ compiler? then you're in luck — keep reading.
A compiler which compiles C++ to C does full syntax and semantic checking on the program, and just happens to use C code as a way of generating object code. Such a compiler is not merely some kind of fancy macro processor. (And please don't email me claiming these are preprocessors — they are not — they are full compilers.) It is possible to implement all of the features of ISO Standard C++ by translation to C, and except for exception handling, it typically results in object code with efficiency comparable to that of the code generated by a conventional C++ compiler.
Here are some products that perform compilation to C:
Comeau Computing offers a compiler based on Edison Design Group's front end that outputs C code.
LLVM is a downloadable compiler that emits C code. See also here and here. Here is an example of C++ to C conversion via LLVM.
Cfront, the original implementation of C++, done by Bjarne Stroustrup and others at AT&T, generates C code. However it has two problems: it's been difficult to obtain a license since the mid 90s when it started going through a maze of ownership changes, and development ceased at that same time and so it doesn't get bug fixes and doesn't support any of the newer language features (e.g., exceptions, namespaces, RTTI, member templates).
Contrary to popular myth, as of this writing there is no version of g++ that translates C++ to C. Such a thing seems to be doable, but I am not aware that anyone has actually done it (yet).
Note that you typically need to specify the target platform's CPU, OS and C compiler so that the generated C code will be specifically targeted for this platform. This means: (a) you probably can't take the C code generated for platform X and compile it on platform Y; and (b) it'll be difficult to do the translation yourself — it'll probably be a lot cheaper/safer with one of these tools.
One more time: do not email me saying these are just preprocessors — they are not — they are compilers.