I am interested in defining my own language inside a C++ block (for example, main), and for that purpose I need to use the preprocessor and its directives. My problem relates to the rule below:
#define INSERT create() ...
This is called a function-like definition, and the preprocessor does not allow any whitespace in the name being defined.
So when I use a statement of my own language, I have to translate the statement below:
INSERT INTO variable_name VALUES(arg_list)
into two different function calls, let's say
insertINTO(variable_name) and valuePARSE(arg_list)
But since the preprocessor directive rules do not allow whitespace in my definition, how can I reach variable_name and then make the first function call?
Any clues would be helpful.
PS: I tried using g++ -E file.cpp to see how the preprocessor works and to adjust the syntax so that the result is valid C++.
The preprocessor included with most C++ compilers is probably way too weak for this kind of task. It was never designed for this kind of abuse. The boost preprocessor library could help you on the way, but I still think you're heading down a one-way street here.
If you really want to define your language this way, I suggest you either write your own preprocessor, or use one that is more powerful than the default one. Here is one chap who tried using Python as a C++ preprocessor.
1) #define INSERT create() is not a function-like macro; it is object-like. Something like #define INSERT(a, b, c) create(a, b, c) would be function-like;
2) if you want to expand INSERT INTO variable_name VALUES(arg_list) into insertINTO(variable_name); valuePARSE(arg_list); you can do something like:
#define INSERT insertINTO(
#define INTO
#define VALUES(...) ); valuePARSE(__VA_ARGS__);
3) as you can see, macros get ugly pretty easily, and even the slightest error in your syntax will have you spend a lot of time tracking it down;
4) since it's tagged C++, take a look at Boost.Proto or Boost.Preprocessor.
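The macros from point 2 can be tried out with stub functions. This is only a sketch: the bodies of insertINTO and valuePARSE, the call_log vector and run_insert_demo are made up for illustration, since the question does not show the real functions or their signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical log so the effect of the expanded statement can be observed.
std::vector<std::string> call_log;

// Stand-in for the asker's insertINTO; the real signature is not shown in the question.
template <typename T>
void insertINTO(T& /*target*/) {
    call_log.push_back("insertINTO");
}

// Stand-in for the asker's valuePARSE; it only records how many values it received.
template <typename... Args>
void valuePARSE(const Args&...) {
    call_log.push_back("valuePARSE/" + std::to_string(sizeof...(Args)));
}

// Object-like macros: INSERT opens the first call, INTO expands to nothing.
#define INSERT insertINTO(
#define INTO
// Function-like macro: closes the first call and emits the second one.
#define VALUES(...) ); valuePARSE(__VA_ARGS__);

// Runs one statement of the mini-language and returns the number of calls made.
std::size_t run_insert_demo() {
    call_log.clear();
    int table = 0;
    // Expands to: insertINTO( table ); valuePARSE(1, 2, 3);
    INSERT INTO table VALUES(1, 2, 3)
    assert(!call_log.empty());
    return call_log.size();
}
```

Note that the statement takes no trailing semicolon of its own, because VALUES already emits one. That is exactly the kind of detail that makes such macros fragile.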
I have a C header file containing various declarations of functions, enums, structs, etc, and I hope to extract all declared function names into a boost.preprocessor data structure for iteration, using only the C preprocessor.
All function declarations have two fixed distinct macros around the return type, something like,
// my_header.h
FOO int * BAR f(long, double);
FOO void BAR g();
My goal is to somehow transform it into one of the above linked boost.preprocessor types, such as (f, g) or (f)(g). I believe it is possible by cleverly defining FOO and BAR, but have not succeeded after trying to play around with boost.preprocessor and P99.
I believe this task can only be done with the preprocessor, because:
As a hard requirement, I need to stringify the function names as string literals later when iterating the list, so runtime string manipulation or existing C++ static reflection frameworks with template magic are out AFAIK.
While it can be done with the help of other tools, they are either fragile (awk or grep as ad-hoc parsers) or overly complex for the task (LLVM/GCC plugin for something robust). It is also a motivation to avoid external dependencies other than those strictly necessary i.e. a conforming C compiler.
I don't think this is going to work, due to limitations on where parentheses and commas need to occur.
What you can do, though, is the opposite. You could make a Boost.PP sequence that contains the signatures in some structured form and use it to generate the declarations as you showed them. In the end, you have the representation you want as well as the compiler's view of the declarations.
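A rough sketch of that "opposite" direction follows. To keep it dependency-free it uses a plain X-macro list instead of a Boost.PP sequence, so it shows the idea and not the Boost.PP API. MY_FUNCTIONS, DECLARE_FUNCTION, FUNCTION_NAME and the helper functions are hypothetical names, and it assumes you are free to write the list yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical master list: one entry per function as (name, return type, parameter list).
#define MY_FUNCTIONS(X)          \
    X(f, int*, (long, double))   \
    X(g, void, ())

// First use: generate ordinary declarations, e.g. "int* f (long, double);".
#define DECLARE_FUNCTION(name, ret, params) ret name params;
MY_FUNCTIONS(DECLARE_FUNCTION)
#undef DECLARE_FUNCTION

// Second use: generate the function names as string literals.
#define FUNCTION_NAME(name, ret, params) #name,
const char* const function_names[] = { MY_FUNCTIONS(FUNCTION_NAME) };
#undef FUNCTION_NAME

const std::size_t function_count = sizeof(function_names) / sizeof(function_names[0]);

// Looks up a generated name by index.
const char* function_name_at(std::size_t index) {
    assert(index < function_count);
    return function_names[index];
}

// Reports whether a name appears in the generated list.
bool is_declared(const char* name) {
    for (std::size_t i = 0; i < function_count; ++i) {
        if (std::strcmp(function_names[i], name) == 0) return true;
    }
    return false;
}
```

The parameter list is passed in its own parentheses so that its commas do not split the macro arguments.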
After a closer look at the internals of preprocessor tricks, I believe this is theoretically impossible. This answer is a more detailed expansion of @sehe's nice answer.
The fundamental working principle of arbitrary preprocessor lists like those in boost.preprocessor is indirect recursion. As such, it requires a way to consume one argument and pass the rest on. The only two ways CPP offers for this are commas (which can separate arguments) and enclosing parentheses (which can invoke macros).
In the case of f(int, long), f is neither followed by a comma nor surrounded by a pair of parentheses, so there is no way for the preprocessor to separate it from the following list without knowing the name in advance.
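To make the two mechanisms concrete, here is a minimal sketch. STR, FIRST, REST and EAT_ONE are hypothetical helper names, not boost.preprocessor macros. Each one can peel off an element only because a comma or a pair of parentheses marks where the element ends.

```cpp
#include <cassert>
#include <cstring>

// Two-level stringification so that the argument is fully expanded first.
#define STR_I(...) #__VA_ARGS__
#define STR(...) STR_I(__VA_ARGS__)

// Comma-separated list: the comma lets a macro split off the first element.
#define FIRST(x, ...) x
#define REST(x, ...) __VA_ARGS__

// Parenthesised sequence: "(f)" right after EAT_ONE is read as its argument list.
#define EAT_ONE(x)

const char* const first_of_list = STR(FIRST(f, g));     // "f"
const char* const rest_of_list  = STR(REST(f, g));      // "g"
const char* const rest_of_seq   = STR(EAT_ONE (f)(g));  // "(g)"

// Checks the three expansions above.
void check_splitting() {
    assert(std::strcmp(first_of_list, "f") == 0);
    assert(std::strcmp(rest_of_list, "g") == 0);
    assert(std::strcmp(rest_of_seq, "(g)") == 0);
}
```

Nothing comparable exists for a bare identifier such as f in f(int, long): no macro is being invoked on it and no comma follows it.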
It could have changed the game if there were a BAZ after f, but sadly there is not and I have no control over the said library header :(
There are other issues, albeit not as fatal, such as the UB of having preprocessor directives within macro definition or arguments.
Perhaps someday it would become possible to leverage the reflection TS to get all declared function names within a namespace as a consteval compile-time list and then iterate it with something along the lines of constexpr for, all in a semantic and type-safe manner... who knows
I want to define a constant in the preprocessor so that certain patterns are matched only when it is defined. Is this possible, or is there another way to deal with this problem?
For example, a simplified version of removing one-line comments in C:
%{
#define COMMENT
%}
%%
#ifdef COMMENT
[\/][\/].*$ ;
#endif
[1-9][0-9]* printf("It's a number, and it works with and without defining COMMENT");
%%
There is no great solution to this (very reasonable) request, but there are some possibilities.
(F)lex start conditions
Flex start conditions make it reasonably simple to define a few optional patterns, but they don't compose well. This solution will work best if you have only a single controlling variable, since you will have to define a separate start condition for every possible combination of controlling variables.
For example:
%s NO_COMMENTS
%%
<NO_COMMENTS>"//".* ; /* Ignore comments in NO_COMMENTS mode. */
The %s declaration means that all unmarked rules also apply to the NO_COMMENTS state; you will commonly see %x ("exclusive") in examples, but that would force you to explicitly mark almost every rule.
Once you have modified your grammar in this way, you can select the appropriate set of rules at run-time by setting the lexer's state with BEGIN(INITIAL) or BEGIN(NO_COMMENTS). (The BEGIN macro is only defined in the flex-generated file, so you will want to export a function which performs one of these two actions.)
Using cpp as a utility
There is no preprocessor feature in flex. It's possible that you could use a C preprocessor to preprocess your flex file before passing it to flex, but you will have to be very careful with your input file:
The C preprocessor expects its input to be a sequence of valid C preprocessor tokens. Many common flex patterns will not match this assumption, because of the very different quoting rules. (For a simple example, a common pattern to recognise C comments includes the character class [^/*] which will be interpreted by the C preprocessor as containing the start of a C comment.)
The flex input file is likely to have a number of lines which are valid #include directives. There is no way to prevent these directives from being expanded (other than removing them from the file). Once expanded and incorporated into the source, the header files no longer have include guards, so you will have to tell flex not to insert any #include files from its own templates. I believe that is possible, but it will be a bit fragile.
The C preprocessor may expand what looks to it like a macro invocation.
The C preprocessor might not preserve linear whitespace, altering the meaning of the flex scanner definition.
m4 and other preprocessors
It would be safer to use m4 as a preprocessor, but of course that means learning m4. (You shouldn't need to install it because flex already depends on it, so if you have flex you also have m4.) And you will still need to be very careful with quoting sequences. m4 lets you customize these sequences, so it is more manageable than cpp. But don't copy the common idiom of defining [[ as a quote delimiter; that sequence is very common inside regular expressions.
Also, m4 does not insert #line directives and any non-trivial use will change the number of input lines, making error messages harder to interpret. (To say nothing of the challenge of debugging.) You can probably avoid this issue in this very simple case but the issue will reappear.
You could also write your own simple preprocessor, but you will still need to address the above issues.
I'm programming a C++ application on an STM32F4 chip which has several IOs to control. One of my colleagues suggested defining preprocessor macros for all of these IOs to make the code more readable.
For example:
#define FAN_ON GPIO_SetBits(GPIOD, GPIO_Pin_0);
#define FAN_OFF GPIO_ResetBits(GPIOD, GPIO_Pin_0);
Is this ok this way, and if not, why?
I don't have much microcontroller experience yet, but I read that semicolons shouldn't be used in preprocessor statements, and I'm also not sure whether it is good style to use function calls in preprocessor statements.
Thank you for your help!
It's fine in theory, but you're right in that the semicolons should be avoided.
It's best to wrap the code in a dummy loop:
#define FAN_ON do { GPIO_SetBits(GPIOD, GPIO_Pin_0); } while(false)
This makes the macro behave like a single statement.
To answer your first question: even though this is common, it is pretty bad style to use preprocessor statements to define functions, except when you really need the preprocessor. You really need it for things like __LINE__, or for substitution tricks such as putting the function name into a char * variable. You could define functions void fan_on(void) and void fan_off(void) instead of those macros; you can even declare them static inline if you want to put them in a header as you would with macros. Functions are also easier to inspect and step through in a debugger than macros.
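A sketch of the function-based variant might look like this. The GPIO_TypeDef struct, the fake_gpiod object and the two GPIO_* bodies are hypothetical host-side stand-ins so the example builds without the vendor headers; on the real target you would include the vendor library instead. fan_is_on and fan_self_test are likewise invented for the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the vendor GPIO API, so the sketch builds off-target.
struct GPIO_TypeDef {
    std::uint16_t ODR;  // simulated output data register
};
static GPIO_TypeDef fake_gpiod = {0};
static GPIO_TypeDef* const GPIOD = &fake_gpiod;
static const std::uint16_t GPIO_Pin_0 = 0x0001;

static inline void GPIO_SetBits(GPIO_TypeDef* port, std::uint16_t pins) {
    port->ODR = static_cast<std::uint16_t>(port->ODR | pins);
}
static inline void GPIO_ResetBits(GPIO_TypeDef* port, std::uint16_t pins) {
    port->ODR = static_cast<std::uint16_t>(port->ODR & ~pins);
}

// The actual suggestion: small inline functions instead of FAN_ON / FAN_OFF macros.
static inline void fan_on() { GPIO_SetBits(GPIOD, GPIO_Pin_0); }
static inline void fan_off() { GPIO_ResetBits(GPIOD, GPIO_Pin_0); }

// Hypothetical helper that reads back the simulated pin state.
static inline bool fan_is_on() { return (fake_gpiod.ODR & GPIO_Pin_0) != 0; }

// Switches the fan on and off again and checks the simulated pin each time.
static inline void fan_self_test() {
    fan_on();
    assert(fan_is_on());
    fan_off();
    assert(!fan_is_on());
}
```

Call sites read just like the macro version (fan_on();), but the compiler type-checks them and the debugger can step into them.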
As you said, if you still want to use macros you should leave out the semicolons and use a do { ... } while(0) construct. If you don't, using a multi-statement macro as the body of an if without braces will apply the condition only to the first statement of the macro.
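The difference can be seen in a small host-side sketch. step_a, step_b, the steps_run counter and both macro names are hypothetical and merely stand in for the GPIO calls.

```cpp
#include <cassert>

int steps_run = 0;  // hypothetical counter standing in for real side effects
void step_a() { ++steps_run; }
void step_b() { ++steps_run; }

// Two statements without a wrapper: only the first is governed by a brace-less if.
#define BOTH_UNSAFE step_a(); step_b()
// Wrapped version: behaves as one statement and still takes the caller's semicolon.
#define BOTH_SAFE do { step_a(); step_b(); } while (0)

int run_unsafe(bool enabled) {
    steps_run = 0;
    if (enabled)
        BOTH_UNSAFE;  // expands to: if (enabled) step_a(); step_b();
    return steps_run;
}

int run_safe(bool enabled) {
    steps_run = 0;
    if (enabled)
        BOTH_SAFE;
    else  // with BOTH_UNSAFE this else would not even compile
        steps_run = -1;
    return steps_run;
}

void check_macro_behaviour() {
    assert(run_unsafe(false) == 1);  // step_b() ran although the condition was false
    assert(run_safe(false) == -1);   // neither step ran; the else branch was taken
    assert(run_safe(true) == 2);     // both steps ran
}
```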
I am trying to build a program that parses and lists the content of header files. So far, so good, I found it easy parsing and listing headers I wrote, but when I started parsing cross platform API headers things got messy.
My current approach is rather simplistic, here is a pseudocode example of parsing the following function:
void foo(int a);
void is a type, so we are dealing with instancing a type
foo is the name of that type
foo is followed by brackets, meaning it is a function of type void named foo
int is a type...
a is the name of that type instance
foo is a function of type void that takes one parameter of type int named a
However, when I got into bigger and more complex headers I stumbled upon somewhat irregular prototypes, involving macros and god knows what. An example:
GLAPI void APIENTRY glEvalCoord1d( GLdouble u );
GLAPI and APIENTRY are platform-dependent macros, which rather spoils my simple parsing scheme, since it expects the name of an object to follow its type. Those two macros happen to translate to either __stdcall, __declspec(dllimport) or extern, but in theory they could mean anything, with their meaning being unclear until compile time.
How can I write my parser so that it can deal with such scenarios and not get confused? The macros themselves are defined at an earlier stage, so the parser can be aware that GLAPI and APIENTRY are macros and simply ignore them; is this the way to go? Naturally this is just one of the many variations of irregularities the parser may stumble upon when parsing different headers, so any general techniques for dealing with any "legal" header content are welcome.
There isn't any real alternative to expanding the macros before you parse, at least if you want to process header files with the same complexity as Microsoft's, or any other header files associated with a compiler system that has been around for 10 years or more.
The unpreprocessed source code is NOT C; it is simply unpreprocessed source code. The macros (and preprocessor conditionals, which you surprisingly didn't mention) can edit the apparent source in not arbitrary but spectacularly complex ways. And you often can't know what the macros are, or how the conditionals evaluate, unless you process the #includes as well.
You can get GCC to do preprocessor expansion for you, and then parse the result. That would be by far the easiest way to approach this.
That still leaves the problem of parsing real C code, with all the complexities of declarators, and ambiguities in fragments such as T X; where the meaning of the statement depends on the declaration of T. To parse the headers accurately, you need a full C parser.
Our C Front End can do full preprocessing, or you can invoke it in a mode in which some macros are expanded and some are not. By tuning this set, you can often parse such headers without expanding every macro. Preprocessor conditionals are much more difficult, because they can occur at inconvenient (unstructured) places.
If all you want is the name and signature of functions, then a simple search and replace for macros should be sufficient.
However, you need to check whether a macro contains keywords (like the return type). This may be possible by stripping macro definitions of everything but keywords as they are defined, but tracking them and using a simple preprocessor will be necessary.
The platform dependent keywords, such as __declspec and __attribute__ have very limited syntax and there are only a few of them, so specifically removing those is possible.
You may want to take a look at how doxygen handles this, because it does almost exactly what you want and does handle macros. It allows a list of macros to be expanded as defined, and ones that should be expanded to a custom value. You could adapt that to expand __declspec(x) to nothing, and expand all others to their defined value by default.
This certainly isn't foolproof, but a search and replace is about the simplest functional solution you'll get. You need to follow the standard C++ preprocessor rules, which aren't terribly complex, with additional macros (const, declspec, etc) to strip extra attributes, and parse the final results.
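As a very rough sketch of the "drop known macros, then read the prototype" idea, assuming one prototype per line and a list of macro names that is already known: Prototype, parse_prototype and check_gl_prototype are hypothetical names. The code deliberately does not handle function pointers, attributes with their own parentheses such as __declspec(dllimport), or a * attached to the name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct Prototype {
    std::string return_type;
    std::string name;
};

// Splits the text before the first '(' into tokens, drops the ignored macro
// names, and treats the last remaining token as the function name.
Prototype parse_prototype(const std::string& line,
                          const std::set<std::string>& ignored_macros) {
    Prototype result;
    const std::size_t paren = line.find('(');
    if (paren == std::string::npos) return result;  // not a function prototype

    std::istringstream head(line.substr(0, paren));
    std::vector<std::string> tokens;
    std::string token;
    while (head >> token) {
        if (ignored_macros.count(token) == 0) tokens.push_back(token);
    }
    if (tokens.size() < 2) return result;  // need at least a type and a name

    result.name = tokens.back();
    tokens.pop_back();
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) result.return_type += ' ';
        result.return_type += tokens[i];
    }
    return result;
}

// Usage check with the prototype from the question.
void check_gl_prototype() {
    const Prototype p = parse_prototype(
        "GLAPI void APIENTRY glEvalCoord1d( GLdouble u );", {"GLAPI", "APIENTRY"});
    assert(p.return_type == "void");
    assert(p.name == "glEvalCoord1d");
}
```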
After browsing some old code, I noticed that some classes are defined in this manner:
MIDL_INTERFACE("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")
Classname: public IUnknown {
/* classmembers ... */
};
However, the macro MIDL_INTERFACE is defined as:
#define MIDL_INTERFACE(x) struct
in C:/MinGW/include/rpcndr.h (somewhere around line 17). The macro itself is rather obviously entirely pointless, so what's the true purpose of this macro?
In the Windows SDK version that macro expands to
struct __declspec(uuid(x)) __declspec(novtable)
The first one allows use of the __uuidof keyword which is a nice way to get the guid of an interface from the typename. The second one suppresses the generation of the v-table, one that is never used for an interface. A space optimization.
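The two definitions can be put side by side in one sketch. MY_INTERFACE is a hypothetical macro that only mirrors the shape of MIDL_INTERFACE, and the GUID, IExample and Example are placeholders; the MSVC branch uses the expansion quoted above and is only active when _MSC_VER is defined.

```cpp
#include <cassert>

// MY_INTERFACE is a hypothetical name mirroring MIDL_INTERFACE, not the SDK macro itself.
#if defined(_MSC_VER)
// MSVC: attach the GUID and suppress v-table generation for the interface.
#define MY_INTERFACE(x) struct __declspec(uuid(x)) __declspec(novtable)
#else
// Other compilers: the GUID argument is dropped and only "struct" remains.
#define MY_INTERFACE(x) struct
#endif

// Placeholder GUID, for illustration only.
MY_INTERFACE("12345678-1234-1234-1234-123456789abc")
IExample {
    virtual int answer() const = 0;
    virtual ~IExample() {}
};

struct Example : public IExample {
    int answer() const override { return 42; }
};

// Calls the implementation through the interface type.
int call_through_interface() {
    Example e;
    const IExample& iface = e;
    const int value = iface.answer();
    assert(value == 42);
    return value;
}
```

Either way the source compiles unchanged, which is presumably why the MinGW header keeps the macro even though its expansion there carries no extra information.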
This is because MinGW does not support COM (or rather, supports it extremely poorly). MIDL_INTERFACE is used when defining a COM component, and it is generated by the IDL compiler, which generates COM type libraries and class definitions for you.
On MSVC, this macro typically expands to more complicated initialization and annotations to expose the given C++ class to COM.
If I had to guess, it's for one of two use cases:
It's possible that there's an external tool that parses the files looking for declarations like these. The idea is that by having the macro evaluate to something harmless, the code itself compiles just fine, but the external tool can still look at the source code and extract information out of it.
Another option might be that the code uses something like the X Macro Trick to selectively redefine what this preprocessor directive means so that some other piece of the code can interpret the data in some other way. Depending on where the #define is this may or may not be possible, but it seems reasonable that this might be the use case. This is essentially a special-case of the first option.