How does the C++ preprocessor work?

I am reading "C++ Primer", 5th Edition, and it says that the preprocessor is a program that runs before the C++ compiler, replaces directives such as #include, #define and #ifdef with the appropriate content, and then hands the result over to the compiler.
Then I came across an option in cl.exe (the Microsoft compiler) that saves the preprocessor output directly to a file. I tried it, and when I opened the output file I was surprised, because it was not what I expected!
The file was huge and contained what looked like obfuscated code!
Please explain what the C++ preprocessor really does.
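To make the question concrete, here is a minimal sketch of the three kinds of replacement the book describes. The names BUFFER_SIZE, SQUARE, USE_LONG_GREETING and the helper functions are made up for illustration. The likely reason the output file is so large is that each #include line is replaced by the entire text of that header (and everything it includes in turn), and standard library headers are written with reserved, underscore-heavy names that can look obfuscated.

```cpp
#include <cassert>  // assert is itself a macro supplied by this header
#include <string>   // replaced by the header's full text during preprocessing

// Object-like macro: every later BUFFER_SIZE token is replaced by 64.
#define BUFFER_SIZE 64
// Function-like macro: the argument tokens are substituted into the body.
#define SQUARE(x) ((x) * (x))

// Conditional compilation: only one branch reaches the compiler proper.
#ifdef USE_LONG_GREETING
inline std::string greeting() { return "hello from the long branch"; }
#else
inline std::string greeting() { return "hi"; }
#endif

// After preprocessing the body reads: return ((64) * (64));
inline int buffer_area() { return SQUARE(BUFFER_SIZE); }

// The assert(...) call below is also expanded by the preprocessor.
inline int checked_index(int i) {
    assert(i >= 0 && i < BUFFER_SIZE);
    return i;
}
```

Running this through cl /P (or g++ -E) should show a long dump of the two headers followed by a few short lines at the very end that correspond to the code above.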

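It is entirely possible to preprocess Java just as you would C or C++. Use something like this (-x c tells gcc to treat the file as C, since it does not recognize the .java extension, and -P suppresses the line markers that the Java compiler would reject):
gcc -E -P -x c myjava.java > myjava.preprocesses.java
Then you can use macro expansion, #if etc. to your heart's content. The drawback, of course, is that the build needs an extra tool before the compile step.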

You can roll out a JNI lib that ties in with your native C/C++ code that has all your necessary macros.

Related

Accessing tokenization of a C++ source file

My understanding is that one step of compiling a program (irrespective of the language, I guess) is splitting the source file into some kind of space-separated tokens (this tokenization would be done by what's referred to as the scanner in this answer). For instance, I understand that at some point in the compilation process a line containing x += fun(nullptr); is separated into something like
x
+=
fun
(
nullptr
)
;
Is this true? If so, is there a way to have access to this tokenization of a C++ source code?
I'm asking this question mostly out of curiosity, and I do not intend to write a lexer myself.
And the reason I'm curious to know whether one can leverage the compiler is that, to give an example, before meeting [[noreturn]] & Co. I wouldn't have ever considered [[ as a valid token, if I was to write a lexer myself.
Do we necessarily need a true, actual use case? I think we don't, if I am curious about whether there's an existing tool or not to do something.
However, if we really need a use case, let's say my target is to write a C++ function which reads in a C++ source file and returns a std::vector of the lexemes it's made up of. Clearly, a requirement is that concatenating the elements of the output should reproduce the whole text again, including line breaks and every other byte of it.
With the restriction mentioned in the comment (tokenization keeping __DATE__) it seems rather manageable. You need the preprocessing tokens. The Boost.Wave preprocessor necessarily creates a token list, because it has to work on those tokens.
Basile correctly points out that it's hard to assign a meaning to those tokens.
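The round-trip requirement from the question (concatenating the lexemes gives back every byte) can be illustrated with a toy splitter. This is only a sketch and not what Boost.Wave or any compiler does: split_lexemes and join_lexemes are hypothetical names, and the code ignores comments, string and character literals, and most multi-character operators.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Toy splitter, NOT a conforming C++ lexer: whitespace runs, word runs
// (letters, digits, '_') and single punctuation characters become lexemes.
// The only multi-character operators recognised are of the form "<op>=".
std::vector<std::string> split_lexemes(const std::string& src) {
    auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    auto is_word = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    };
    const std::string before_equals = "+-*/%&|^<>=!";

    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const std::size_t start = i;
        if (is_space(src[i])) {
            while (i < src.size() && is_space(src[i])) ++i;
        } else if (is_word(src[i])) {
            while (i < src.size() && is_word(src[i])) ++i;
        } else {
            ++i;  // single punctuation character
            if (i < src.size() && src[i] == '=' &&
                before_equals.find(src[start]) != std::string::npos) {
                ++i;  // glue "+=", "==", "!=", ...
            }
        }
        assert(i > start);  // every iteration consumes at least one byte
        out.push_back(src.substr(start, i - start));
    }
    return out;
}

// Concatenates the lexemes; by construction this reproduces the input.
std::string join_lexemes(const std::vector<std::string>& parts) {
    std::string text;
    for (const std::string& p : parts) text += p;
    return text;
}
```

Because whitespace runs are kept as lexemes of their own, join_lexemes(split_lexemes(s)) equals s for any input, which is the property asked for. Getting the token boundaries right for real C++ is the hard part that this sketch skips.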
C++ is a very complex programming language.
Be sure to read the C++11 draft standard n3337 before even attempting to parse C++ code.
Look inside the source code of existing open source C++ compilers, such as GCC (at least GCC 10 in October 2020) or Clang (at least Clang 10 in October 2020).
If you have to write your C++ parser from scratch, be sure to have the budget for at least a full person year of work.
Look also into existing C++ static source code analyzers, such as Frama-C++ or Clang static analyzer. Consider adapting one of them to your needs, but do document in writing your needs before starting coding. Be aware of Rice's theorem.
If you want to parse a small subset of C++ (you'll need to document and specify that subset), consider using parser generators like ANTLR or GNU bison.
Most compilers are building some internal representations, in particular some abstract syntax tree. Read the Dragon book for more.
I would suggest instead writing your own GCC plugin.
Indeed, it would be tied to some major version of GCC, but you'll win months of work.
Is this true? If so, is there a way to have access to this tokenization of a C++ source code?
Yes, by patching some existing opensource C++ compiler, or extending it with your plugin (there are licensing conditions related to both approaches).
let's say my target is to write a C++ function which reads in a C++ source file and returns a std::vector of the lexemes it's made up of.
The above specification is ambiguous.
Do you want the lexemes before or after the C++ preprocessing phase? In other words, what would be the lexeme for e.g. __DATE__ or __TIME__? Read e.g. the documentation of GNU cpp. If you happen to use GCC on Linux (see gcc(1)) and have some C++ translation unit foo.cc, try running g++ -C -E -Wall foo.cc > foo.ii and look (using less(1)) into the generated preprocessed form foo.ii. And what about template expansion, preprocessor conditionals, or preprocessor stringizing?
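The before/after distinction for __DATE__ can be shown in a few lines. This is a small sketch; SPELLING_OF and the function names are invented for the example. It relies on the fact that the # operator stringizes its argument as spelled, without macro-expanding it first.

```cpp
#include <cassert>
#include <string>

// '#' turns the argument's tokens into a string literal, unexpanded.
#define SPELLING_OF(x) #x

// The lexeme as written in the source file, before macro expansion.
inline std::string date_spelling() { return SPELLING_OF(__DATE__); }

// What the compiler proper sees after preprocessing: a string literal
// of the form "Mmm dd yyyy".
inline std::string date_expansion() { return __DATE__; }

inline void check_date_lexemes() {
    assert(date_spelling() == "__DATE__");
    assert(date_expansion().size() == 11);
}
```

A tokenizer that must reproduce the source text needs the first form; a tool working on g++ -E output only ever sees the second.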
I would suggest writing your GCC plugin working on GENERIC representations. You could also start a PhD work related to your goals.
Notice that generating C++ code is a lot easier than parsing it.
Look inside Qt for an example of software generating C++ code. You could consider using GNU m4, or GNU gawk, or GNU autoconf, or GPP, or your own C++ source generator (perhaps with the help of GNU bison or of ANTLR) to generate some of your C++ code.
PS. On my home page you'll find a hyperlink to a draft report related to your question, and another hyperlink to an open source program generating C++ code. It sadly seems that I am forbidden to give these hyperlinks here, but you could find them in two mouse clicks. You might also look into the two European H2020 projects funding that draft report: CHARIOT & DECODER.

How to apply a C preprocessor only to certain (#if/#endif) directives?

I was wondering whether it is possible, and if so how, to run a C preprocessor such as cpp on a C++ source file and only process the conditional directives (#if, #endif, etc.). I would like other directives to stay intact in the output file.
I'm doing some analysis on C# code and there is no C# preprocessor. My idea is to run a C preprocessor on the C# file and process only the conditionals. This way, for example, the #region directive would stay in the file, but cpp appears to remove #region.
You might be looking for a tool like coan:
Coan is a software engineering tool for analysing preprocessor-based configurations of C or C++ source code. Its principal use is to simplify a body of source code by eliminating any parts that are redundant with respect to a specified configuration.
It's precisely designed to process #if and #ifdef preprocessor lines, and remove code accordingly, but it has a lot of other possible uses.
The Linux unifdef command does what you want:
http://linux.die.net/man/1/unifdef
Even if you're not on Linux, the source is available on the web.
BTW, this is a duplicate of another question: Way to omit undefined preprocessor branches by default with unifdef?
Oh, this is the same task I had in the past. I tried the cpp, unifdef and coan tools, and all of them stumbled on C#-specific preprocessor directives like #region. In the end I decided to write my own:
https://github.com/gaDZella/undefine.
The tool has a pretty simple set of options compared to the mentioned cpp tools but it is fully compatible with C# preprocessor syntax.
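The idea behind such a conditionals-only pass can be sketched in C++ as follows. This is not how undefine, unifdef or coan are implemented; resolve_conditionals is a hypothetical function that only understands "#if NAME", "#else" and "#endif" (no expressions, no #elif, no "!NAME") and copies every other line, including #region, through unchanged.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Sketch: resolve "#if NAME" / "#else" / "#endif" against a set of defined
// symbols and copy all other lines (e.g. "#region") to the output untouched.
std::vector<std::string> resolve_conditionals(
        const std::vector<std::string>& lines,
        const std::set<std::string>& defined) {
    struct Frame {
        bool parent_active;  // was output enabled outside this #if?
        bool cond;           // did the #if condition hold?
    };
    std::vector<Frame> stack;  // one frame per currently open #if
    std::vector<std::string> out;
    bool active = true;        // are we inside a branch that is kept?

    for (const std::string& line : lines) {
        std::istringstream words(line);
        std::string directive, name;
        words >> directive >> name;  // first two whitespace-separated words

        if (directive == "#if") {
            const bool cond = defined.count(name) > 0;
            stack.push_back({active, cond});
            active = active && cond;
        } else if (directive == "#else" && !stack.empty()) {
            active = stack.back().parent_active && !stack.back().cond;
        } else if (directive == "#endif" && !stack.empty()) {
            active = stack.back().parent_active;
            stack.pop_back();
        } else if (active) {
            out.push_back(line);  // ordinary code or an unrelated directive
        }
    }
    assert(out.size() <= lines.size());  // the pass only ever drops lines
    return out;
}
```

Given the lines "#region Setup", "#if DEBUG", "Log();", "#else", "Run();", "#endif", "#endregion" and an empty symbol set, the result keeps "#region Setup", "Run();" and "#endregion".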
You can use the g++ -E option to stop after the preprocessing stage:
-E -> stop after the preprocessing stage. The output is in the form of preprocessed source code, which is sent to the standard output.

Is there a tool to strip off code under switches defined?

I have a chunk of C and C++ code that I have to release as source to two different customers. I don't want either of them to see the features the other has, so I'm planning to use compile-time switches. When I deliver the code, I would like to have a few lines of it stripped out. I know I can write a script to do it, but I would like to know whether any existing tools do this job.
#ifdef CUSTOMER_1
Code for Customer 1
#else //Customer_2
Code for Customer 2
#endif
For customer 2, I would like the code between #ifdef and #else removed, along with the #endif line. Are there any tools readily available for this?
There's a utility called unifdef that's designed exactly for this purpose.
Why don't you put the customer-specific bits in a customers.h file and include that file? You give different customers different versions of customers.h. You haven't fully explained the problem, but at first sniff, static filtering of source code is a kludge.
The C preprocessor can do this for you of course, but it will process all the other directives too. I think you would be best off doing this with a specialised script.

Parsing/debugging/porting C++ program with lots of macros

I'm just curious whether anybody could help me find a good tool for the task.
I have a large program in C/C++ which needs to be ported from Win32 to Linux. As the "wrapping" (i.e. the most OS-sensitive) part was successfully isolated from the insides of the program, this task only involves going through its "internals". Some things work, some cause small compile-time problems, but there is one HUGE inconvenience: macro usage.
Basically most of the internals look like this:
START_MAIN( ... )
SOME_MACRO( ... )
ANOTHER_MACRO( ... )
WRITE_SOMETHING()
END_MAIN()
This makes C/C++ look like Pascal, but it also causes a lot of pain when trying to figure out "what's wrong".
Are there ANY tools to help with parsing this kind of source to get to the root of problems?
I'm slowly (manually) approaching "compilability" of this program, but anything that could help me see through this (artificially structured) mess would be really appreciated.
If you need to customize how the code is parsed and what gets output (i.e. you are looking for a customizable C++ parser), clang is a good tool to start with.
In case you just want to see the preprocessed code (with macros expanded), you can use compiler flags:
MSVC: add /P to C++ compiler flags (Project -> Properties -> C/C++ -> Command Line)
GCC, Clang: Add the -E compiler flag
This question about preprocessing C++ code contains some answers you might find useful.
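To illustrate what the expanded output buys you, here is a guess at how macros of that style might be defined. The definitions below are hypothetical (the question does not show the real ones), and ADD_VALUE and CHECK_STATE are invented stand-ins for SOME_MACRO and friends.

```cpp
#include <cassert>

// Hypothetical definitions in the style described in the question.
#define START_MAIN(name) int name() { int result = 0;
#define ADD_VALUE(v)     result += (v);
#define CHECK_STATE()    assert(result >= 0);
#define END_MAIN()       return result; }

// Pascal-looking source as the programmer sees it:
START_MAIN(run_demo)
    ADD_VALUE(2)
    ADD_VALUE(40)
    CHECK_STATE()
END_MAIN()

// After -E (or /P) the block above reads roughly:
//   int run_demo() { int result = 0; result += (2); result += (40);
//                    <expanded assert> return result; }
```

A compile error reported "inside" ADD_VALUE is much easier to understand once you look at the expanded line, because the braces and statements hidden in START_MAIN and END_MAIN become visible.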
Eclipse CDT will expand macros and even show you how many macro-evaluations were required to reach the final code that would be emitted by the preprocessor.
http://help.eclipse.org/galileo/index.jsp?topic=/org.eclipse.cdt.doc.user/concepts/cdt_c_whatsnew.htm
Look at the "Macro Exploration" section.

Convert Fortran to C or C++

I have some numeric code that I need to convert to C or C++. I tried using f2c, but it won't work on the Fortran code. f2c complains because the code uses C style preprocessor directives (#include).
The code's readme states that it is Fortran77 and that it works with the fort77 linker, which would expand those includes.
Does anyone know how to successfully convert this code?
My last resort is to write a simple preprocessor to expand those includes and then feed the code to f2c.
Note: I'm working in a Windows/Visual C++ environment here, so any gcc shenanigans would probably be more trouble than they are worth...
I worked in an engineering research group for many years. We never had good luck doing automated conversions from Fortran to C. The code was not particularly understandable and it was hard to maintain. Our best luck was using the Fortran code as a template for the algorithm and doing a re-implementation for anything that we expected to continue using. Alternatively, you could update the Fortran to use more modern Fortran constructs and get a lot of the same value that you would get from moving to C/C++. We also had some success calling Fortran routines from C, although the calling convention differences sometimes made things difficult.
This might be going out on a bit of a limb, but since you're using C-style includes, have you considered running the C preprocessor on the file in order to expand them? Then you could take that output and run it through f2c.
(I am not an expert on the matter. Please downvote this if appropriate.)
You might get away with manually converting the
#include "whatsit.f90"
to
INCLUDE 'whatsit.f90'
and then attempting the f2c conversion again.
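If there are too many files to edit by hand, the rewrite above is easy to script. A minimal sketch follows; to_fortran_include is a hypothetical helper, and the six leading spaces are my assumption that the code is fixed-form Fortran, where statements start in column 7.

```cpp
#include <cassert>
#include <string>

// Rewrites a line of the form  #include "name"  as  INCLUDE 'name'.
// Any other line is returned unchanged.
std::string to_fortran_include(const std::string& line) {
    const std::string key = "#include";
    const std::size_t start = line.find_first_not_of(" \t");
    if (start == std::string::npos ||
        line.compare(start, key.size(), key) != 0) {
        return line;  // not an #include directive
    }
    const std::size_t open = line.find('"', start + key.size());
    if (open == std::string::npos) return line;
    const std::size_t close = line.find('"', open + 1);
    if (close == std::string::npos) return line;
    assert(close > open);
    // Assumption: fixed-form source, so the statement starts in column 7.
    return "      INCLUDE '" + line.substr(open + 1, close - open - 1) + "'";
}
```

Applying this to every line of each source file and writing the result back out would leave ordinary Fortran statements untouched. It does not handle #include <...> or any other preprocessor directive.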
Have you no C pre-processor? On Unix, there might be a separate program, cpp, that would take the Fortran with #include directives and convert that into Fortran without #include directives. Alternatively, you could rename the source from xyz.f77 (xyz.f) to xyz.c, then run the C compiler in 'pre-processor only' mode and capture the output as the new input to the f2c program. You might have to worry about the options that eliminate #line directives in the output, etc, or you might be better off running the output through a simple filter (sed or perl spring to mind).