How to generate source file with some preprocessor #define applied - c++

I have some C/C++ source file (.hpp,.cpp) containing something like
...
#define SOME_DEFINE(t) some_ns::some_type<t>
...
// define is somehow used later in the code
I would like to have a modified source (for readability) to have all the SOME_DEFINE(t) to be substituted in this file.
So I'm definitely not willing to apply a preprocessor compiler step - only this #define substituted and only for this source file.

You have several options:
Run the preprocessor and store the output. With gcc, the -E option stops after preprocessing and writes the preprocessed source to standard output. Depending on how much other preprocessing is used in the sources, this might be viable or not.
Use a regex to search and replace.
Use an alias template: template <typename T> using SOME_DEFINE = some_ns::some_type<T>; . Then use a regex search and replace to turn SOME_DEFINE(t) into SOME_DEFINE<t>.
Find a tool that does it out of the box. I am not aware of one, and tool recommendations are off-topic anyhow. In a comment, https://dotat.at/prog/unifdef/ was mentioned.
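The alias-template option can be sketched as follows. This is only a minimal illustration, assuming C++11 or later. The body of some_ns::some_type is a placeholder, and the name SomeAlias is made up here so that the macro and the alias can coexist in one file for comparison; in the real code base the alias would simply take over once the macro is removed.

```cpp
#include <cassert>
#include <type_traits>

namespace some_ns {
// Placeholder definition; the real some_type lives in the asker's code base.
template <typename T>
struct some_type {
    T value;
};
}  // namespace some_ns

// The macro from the question.
#define SOME_DEFINE(t) some_ns::some_type<t>

// Alias template naming the same type without the preprocessor.
// "SomeAlias" is a hypothetical name chosen so it does not clash with the macro.
template <typename T>
using SomeAlias = some_ns::some_type<T>;

// Both spellings denote exactly the same type.
static_assert(std::is_same<SOME_DEFINE(int), SomeAlias<int>>::value,
              "macro and alias must name the same type");

// Code written against the alias reads like ordinary C++.
inline int read_value(const SomeAlias<int>& wrapped) {
    return wrapped.value;
}

// Small self-check: objects declared through either spelling are interchangeable.
inline void demo() {
    SOME_DEFINE(int) via_macro{42};
    SomeAlias<int> via_alias = via_macro;  // same type, so a plain copy works
    assert(read_value(via_alias) == 42);
}
```

Unlike the macro, the alias obeys namespaces and shows up under its own name in compiler diagnostics, which is most of the readability gain the question is after.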

Related

Flex C++ - #ifdef inside flex block

I want to define constant in preprocessor which launches matching some patterns only when it's defined. Is it possible to do this, or there is the other way how to deal with this problem?
i.e. simplified version of removing one-line comments in C:
%{
#define COMMENT
%}
%%
#ifdef COMMENT
[\/][\/].*$ ;
#endif
[1-9][0-9]* printf("It's a number, and it works with and without defining COMMENT");
%%
There is no great solution to this (very reasonable) request, but there are some possibilities.
(F)lex start conditions
Flex start conditions make it reasonably simple to define a few optional patterns, but they don't compose well. This solution will work best if you have only a single controlling variable, since you will have to define a separate start condition for every possible combination of controlling variables.
For example:
%s NO_COMMENTS
%%
<NO_COMMENTS>"//".* ; /* Ignore comments in NO_COMMENTS mode. */
The %s declaration means that all unmarked rules also apply to the NO_COMMENTS state; you will commonly see %x ("exclusive") in examples, but that would force you to explicitly mark almost every rule.
Once you have modified your scanner definition in this way, you can select the appropriate set of rules at run-time by setting the lexer's state with BEGIN(INITIAL) or BEGIN(NO_COMMENTS). (The BEGIN macro is only defined in the flex-generated file, so you will want to export a function which performs one of these two actions.)
Using cpp as a utility.
There is no preprocessor feature in flex. It's possible that you could use a C preprocessor to preprocess your flex file before passing it to flex, but you will have to be very careful with your input file:
The C preprocessor expects its input to be a sequence of valid C preprocessor tokens. Many common flex patterns will not match this assumption, because of the very different quoting rules. (For a simple example, a common pattern to recognise C comments includes the character class [^/*] which will be interpreted by the C preprocessor as containing the start of a C comment.)
The flex input file is likely to have a number of lines which are valid #include directives. There is no way to prevent these directives from being expanded (other than removing them from the file). Once expanded and incorporated into the source, the header files no longer have include guards, so you will have to tell flex not to insert any #include directives from its own templates. I believe that is possible, but it will be a bit fragile.
The C preprocessor may expand what looks to it like a macro invocation.
The C preprocessor might not preserve linear whitespace, altering the meaning of the flex scanner definition.
m4 and other preprocessors
It would be safer to use m4 as a preprocessor, but of course that means learning m4. (You shouldn't need to install it, because flex already depends on it: if you have flex, you also have m4.) And you will still need to be very careful with quoting sequences. M4 lets you customize these sequences, so it is more manageable than cpp. But don't copy the common idiom of defining [[ as a quote delimiter; it is very common inside regular expressions.
Also, m4 does not insert #line directives and any non-trivial use will change the number of input lines, making error messages harder to interpret. (To say nothing of the challenge of debugging.) You can probably avoid this issue in this very simple case but the issue will reappear.
You could also write your own simple preprocessor, but you will still need to address the above issues.

Expanding macros for debugging?

I'm new to using macro functions and I understand there are some pitfalls in their use when it comes to order of operations. Is there a way to expand the macro after the preprocessor goes through it so I can see what it looks like?
In VS2017, I've tried Properties > C/C++ > Preprocessor > Preprocess to a File, which creates an *.i file, but it's around 50k lines long and I can't seem to find where my macro was expanded to.
edit: I know macros are bad news bears, however, the code base I'm stepping into uses them quite a bit so I'm trying to better understand them.
You can help yourself by declaring a dummy variable before the line where a macro is used.
E.g.
extern int dummyIntVariable;
MY_COMPLICATED_MACRO(arg1, arg2);
After that, you look for dummyIntVariable in the .i file. The line below it will contain what MY_COMPLICATED_MACRO expands to.
Or, as @Sneftel pointed out in a comment, you can use any old string that helps you navigate through the .i file.
THIS IS A UNIQUE STRING
MY_COMPLICATED_MACRO(arg1, arg2);
Since the file will be just pre-processed, that should also work.
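A complementary trick, not taken from the answer above, is to let the preprocessor stringize the expansion so the program itself can show it. The sketch below assumes two made-up macros, SQUARE_BAD and SQUARE_GOOD, to illustrate the order-of-operations pitfall the question mentions; STR and XSTR are the usual two-level stringizing helpers (the names are conventional, not required).

```cpp
#include <cassert>
#include <string>

// STR stringizes its argument exactly as written;
// XSTR lets the argument be macro-expanded first, then stringizes the result.
#define STR(x) #x
#define XSTR(x) STR(x)

// Hypothetical macros for illustration only.
#define SQUARE_BAD(x) x * x             // no parentheses: precedence trap
#define SQUARE_GOOD(x) ((x) * (x))      // fully parenthesized

// Text that the preprocessor substitutes for each invocation,
// e.g. something like 1 + 2 * 1 + 2 for the bad variant
// (exact spacing is up to the compiler).
inline std::string expanded_bad() { return XSTR(SQUARE_BAD(1 + 2)); }
inline std::string expanded_good() { return XSTR(SQUARE_GOOD(1 + 2)); }

inline void demo() {
    // 1 + 2 * 1 + 2 evaluates to 5, not 9.
    assert(SQUARE_BAD(1 + 2) == 5);
    assert(SQUARE_GOOD(1 + 2) == 9);
    // STR alone does not expand its argument, so the macro name survives.
    assert(std::string(STR(SQUARE_BAD(1 + 2))).find("SQUARE_BAD") == 0);
    // XSTR expands first, so the macro name is gone from the text.
    assert(expanded_bad().find("SQUARE_BAD") == std::string::npos);
}
```

Printing expanded_bad() next to the suspicious call is often quicker than searching a 50k-line .i file, though it only works for macros whose expansion is a valid macro argument (no unbalanced parentheses, for instance).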

What is this C++ language construct: # (i.e. hash) integer "path_to_header_or_cpp_file" <integer>?

I came across the following code in a .cpp file. I do not understand the construct or syntax which involves the header files. I do recognize that these particular header files relate to Android NDK. But, I think the question is a general question about C++ syntax.
These appear to be preprocessor commands in some way because they begin with "#". But, they are not the typical #include, #pragma, #ifndef, #define, etc. commands. The source file has 1000+ such occurrences referencing hundreds of different .h, .c, .cpp files.
typedef int __time_t;
typedef int __timer_t;
# 116 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/machine/_types.h"
# 41 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/sys/_types.h" 2
# 33 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/stdint.h" 2
# 48 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/stdint.h"
typedef __int8_t int8_t;
typedef __uint8_t uint8_t;
The compiler (GCC) does not appear to be throwing any error related to these lines. But, I would like to understand their purpose and function. Can anybody explain these?
This is output from the GCC preprocessor. Those lines are known as linemarkers. They have the syntax:
# linenum filename flags
They indicate that the following line originated at line linenum of the file filename. They basically just help you and the compiler see where lines were included from. The flags provide some more information:
1 - This indicates the start of a new file.
2 - This indicates returning to a file (after having included another file).
3 - This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
4 - This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
You can see this output from preprocessing your own programs if you give the -E flag to g++.
You'll typically see lines like that in the output of the preprocessor (i.e., you normally shouldn't be seeing them at all).
They're similar to the standard #line directive, which has the form:
#line 42
or
#line 42 "foo.c"
which the compiler uses to control the contents of error messages.
Without the word line, this:
# 42 "foo.c"
is technically a non-directive (which, just to add to the fun, is a kind of directive). It's essentially a comment as far as the C standard is concerned. At a guess, gcc's preprocessor probably emits these rather than #line directives because #line directives are intended as input to the preprocessor.
gcc's preprocessor refers to these as "linemarkers"; they're discussed in the cpp manual. They're treated like #line directives, except that they can take an additional flag argument.
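The effect of the standard #line directive mentioned above can be observed from inside a program through __LINE__ and __FILE__. The following is a small sketch; the Location struct and the function names are invented for the example, and "foo.c" is just the file name used in the answer.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper type that carries what the compiler believes
// the current source position to be.
struct Location {
    int line;
    const char* file;
};

// __LINE__ and __FILE__ report whatever the most recent #line directive set.
inline Location where_am_i() {
#line 42 "foo.c"
    return Location{__LINE__, __FILE__};  // this source line now counts as line 42 of "foo.c"
}

inline void demo() {
    const Location loc = where_am_i();
    assert(loc.line == 42);
    assert(std::strcmp(loc.file, "foo.c") == 0);
}
```

Everything after the directive is numbered relative to it, so diagnostics for later lines of this file would also claim to come from foo.c. That is exactly the mechanism code generators and preprocessors rely on to point error messages back at the original source.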
Preprocessors tend to introduce these directives and use them to indicate the line and filename. The C++ standard doesn't define their meaning, but it reserves the use of
# <non-directive>
where <non-directive> is something which isn't one of the normal directives. It seems compiler writers have agreed to use the line number and filename in these as the result of preprocessing the file. This is similar to how basically all compilers support the -E option to indicate that the file(s) should just be preprocessed.

Preprocessor and whitespaces rules

I am interested in defining my own language inside a C++ block (let's say, for example, main), and for that purpose I need to use the preprocessor and its directives. My problem relates to the rule below:
#define INSERT create() ...
This is called a function-like definition, and the preprocessor does not allow any whitespace in the name being defined.
So when I use a statement of my own language, I have to translate the statement below correctly:
INSERT INTO variable_name VALUES(arg_list)
into two different function calls, let's say
insertINTO(variable_name) and valuePARSE(arg_list)
but since the preprocessor directive rules do not allow whitespace in my definition, how can I reach variable_name and then make the first function call?
Any clues would be helpful.
PS: I tried using g++ -E file.cpp to see how preprocessor works and to adjust the syntax to be valid c++ rules.
The preprocessor included with most C++ compilers is probably way too weak for this kind of task. It was never designed for this kind of abuse. The boost preprocessor library could help you on the way, but I still think you're heading down a one-way street here.
If you really want to define your language this way, I suggest you either write your own preprocessor, or use one that is more powerful than the default one. Here is one chap who tried using Python as a C++ preprocessor.
1) #define INSERT create() is not a function-like macro, it's object-like; something like #define INSERT(a, b, c) create(a, b, c) would be function-like;
2) if you want to expand INSERT INTO variable_name VALUES(arg_list) into insertINTO(variable_name); valuePARSE(arg_list); you can do something like:
#define INSERT insertINTO(
#define INTO
#define VALUES(...) ); valuePARSE(__VA_ARGS__);
3) as you can see, macros get ugly pretty easily, and even the slightest error in your syntax will have you spending a lot of time tracking it down
4) since it's tagged C++ take a look at Boost.Proto or Boost.Preprocessor.
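To see the three macros from point 2 at work, here is a minimal sketch. The bodies of insertINTO and valuePARSE are stand-ins invented for the example (they only record that they were called, and the function names follow the question's spelling), and call_log is a hypothetical helper; the macros themselves are the ones from the answer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: a log of calls, so the expansion can be observed.
inline std::vector<std::string>& call_log() {
    static std::vector<std::string> log;
    return log;
}

// Stand-in for the asker's first function.
inline void insertINTO(const std::string& name) {
    call_log().push_back("insertINTO(" + name + ")");
}

// Stand-in for the asker's second function; it only counts its arguments.
template <typename... Args>
void valuePARSE(Args...) {
    call_log().push_back("valuePARSE with " +
                         std::to_string(sizeof...(Args)) + " argument(s)");
}

// The macros from the answer.
#define INSERT insertINTO(
#define INTO
#define VALUES(...) ); valuePARSE(__VA_ARGS__);

inline void demo() {
    const std::string variable_name = "users";
    // Expands to: insertINTO( variable_name ); valuePARSE(1, 2, 3);
    INSERT INTO variable_name VALUES(1, 2, 3)
    assert(call_log().size() == 2);
}
```

Note that INSERT expands to an unbalanced opening parenthesis which only VALUES closes, so forgetting VALUES produces a confusing error far from the actual mistake. That is the kind of fragility point 3 warns about.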

Preprocessor directives

When we see #include <iostream>, it is said to be a preprocessor directive.
#include ---> directive
And, I think:
<iostream> ---> preprocessor
But, what is meant by "preprocessor" and "directive"?
It may help to think of the relationship between a "directive" and being "given directions" (i.e. orders). "preprocessor directives" are directions to the preprocessor about changes it should make to the code before the later stages of compilation kick in.
But, what's the preprocessor? Well, its name reflects that it processes the source code before the "main" stages of compilation. It's simply there to process the textual source code, modifying it in various ways. The preprocessor doesn't even understand the tokens it operates on - it has no notion of types or variables, classes or functions - it's all just quoted- and/or parentheses- grouped, comma- and/or whitespace separated text to be manhandled. This extra process gives more flexibility in selecting, combining and even generating parts of the program.
EDIT addressing @SWEngineer's comment: Many people find it helpful to think of the preprocessor as a separate program that modifies the C++ program, then gives its output to the "real" C++ compiler (this is pretty much the way it used to be). When the preprocessor sees #include <iostream> it thinks "ahhha - this is something I understand, I'm going to take care of this and not just pass it through blindly to the C++ compiler". So, it searches a number of directories (some standard ones like /usr/include and wherever the compiler installed its own headers, as well as others specified using -I on the command line) looking for a file called "iostream". When it finds it, it then replaces the line in the input program saying "#include <iostream>" with the complete contents of the file called "iostream", adding the result to the output. BUT, it then moves to the first line it read from the "iostream" file, looking for more directives that it understands.
So, the preprocessor is very simple. It can understand #include, #define, #if/#elif/#endif, #ifdef and #ifndef, #warning and #error, but not much else. It doesn't have a clue what an "int" is, a template, a class, or any of that "real" C++ stuff. It's more like some automated editor that cuts and pastes parts of files and code around, preparing the program that the C++ compiler proper will eventually see and process. The preprocessor is still very useful, because it knows how to find parts of the program in all those different directories (the next stage in compilation doesn't need to know anything about that), and it can remove code that might work on some other computer system but wouldn't be valid on the one in use. It can also allow the program to use short, concise macro statements that generate a lot of real C++ code, making the program more manageable.
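Two of the points above can be seen in a few lines: macros are substituted as text with no knowledge of types, and conditional directives remove code before the compiler proper ever sees it. This is just a sketch; MAX_OF and platform_name are names made up for the example, while _WIN32 is the macro that compilers targeting Windows predefine.

```cpp
#include <cassert>
#include <string>

// Pure text substitution: the preprocessor neither knows nor cares
// what types a and b have, as long as the result compiles afterwards.
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

// Conditional compilation: only one of these two definitions
// survives preprocessing and reaches the compiler proper.
#ifdef _WIN32
inline const char* platform_name() { return "Windows"; }
#else
inline const char* platform_name() { return "not Windows"; }
#endif

inline void demo() {
    assert(MAX_OF(1, 2) == 2);                                   // ints
    assert(MAX_OF(2.5, 1.5) == 2.5);                             // doubles
    assert(MAX_OF(std::string("b"), std::string("a")) == "b");   // strings
    assert(!std::string(platform_name()).empty());
}
```

The same type-blindness is why macros like this evaluate an argument twice (MAX_OF(i++, j) increments i more than once when it is the larger), something a function template would not do.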
#include is the preprocessor directive, <iostream> is just an argument supplied in addition to this directive, which in this case happens to be a file name.
Some preprocessor directives take arguments, some don't, e.g.
#define FOO 1
#ifdef _NDEBUG
....
#else
....
#endif
#warning Untested code !
The common feature is that they all start with #.
In Olden Times the preprocessor was a separate tool which pre-processed source code before passing it to the compiler front-end, performing macro substitutions and including header files, etc. These days the pre-processor is usually an integral part of the compiler, but it essentially just does the same job.
Preprocessor directives, such as #define and #ifdef, are typically used to make source programs easy to change and easy to compile in different execution environments. Directives in the source file tell the preprocessor to perform specific actions. For example, the preprocessor can replace tokens in the text, insert the contents of other files into the source file...
#include is a preprocessor directive, meaning that it is used by the preprocessor part of the compiler. This happens before the compilation proper. The #include needs to specify what to include; this is supplied by the argument <iostream>, which tells the preprocessor to include the standard header named iostream (standard C++ headers have no .h suffix).
More information:
Preprocessor Directives on MSDN
Preprocessor directives on cplusplus.com