Flex C++ - #ifdef inside flex block - c++

I want to define a preprocessor constant that enables matching certain patterns only when it is defined. Is it possible to do this, or is there another way to deal with this problem?
For example, a simplified version of removing one-line comments in C:
%{
#define COMMENT
%}
%%
#ifdef COMMENT
[\/][\/].*$ ;
#endif
[1-9][0-9]* printf("It's a number, and it works with and without defining COMMENT");
%%

There is no great solution to this (very reasonable) request, but there are some possibilities.
(F)lex start conditions
Flex start conditions make it reasonably simple to define a few optional patterns, but they don't compose well. This solution works best if you have only a single controlling variable, since you will have to define a separate start condition for every possible combination of controlling variables.
For example:
%s NO_COMMENTS
%%
<NO_COMMENTS>"//".* ; /* Ignore comments in NO_COMMENTS mode. */
The %s declaration means that all unmarked rules also apply to the NO_COMMENTS state; you will commonly see %x ("exclusive") in examples, but that would force you to explicitly mark almost every rule.
Once you have modified your grammar in this way, you can select the appropriate set of rules at run-time by setting the lexer's state with BEGIN(INITIAL) or BEGIN(NO_COMMENTS). (The BEGIN macro is only defined in the flex-generated file, so you will want to export a function which performs one of these two actions.)
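Putting the pieces together, a sketch of such a scanner file (the exported helper's name is an assumption, and %option noyywrap etc. are omitted):

```lex
%s NO_COMMENTS
%%
<NO_COMMENTS>"//".*   ;  /* Skip // comments only in NO_COMMENTS state. */
[1-9][0-9]*           printf("It's a number\n");
%%
/* Exported because BEGIN is only visible inside the generated scanner. */
void set_skip_comments(int skip) {
    if (skip)
        BEGIN(NO_COMMENTS);
    else
        BEGIN(INITIAL);
}
```

Code elsewhere in the program can then call set_skip_comments(1) or set_skip_comments(0) to toggle the behaviour at run-time.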
Using cpp as a utility
There is no preprocessor feature in flex. It's possible that you could use a C preprocessor to preprocess your flex file before passing it to flex, but you will have to be very careful with your input file:
The C preprocessor expects its input to be a sequence of valid C preprocessor tokens. Many common flex patterns will not satisfy this assumption, because of the very different quoting rules. (For a simple example, a common pattern to recognise C comments includes the character class [^/*], which the C preprocessor will interpret as containing the start of a C comment.)
The flex input file is likely to have a number of lines which are valid #include directives. There is no way to prevent these directives from being expanded (other than removing them from the file). Once expanded and incorporated into the source, the header files no longer have include guards, so you will have to tell flex not to insert any #include files from its own templates. I believe that is possible, but it will be a bit fragile.
The C preprocessor may expand what looks to it like a macro invocation.
The C preprocessor might not preserve linear whitespace, altering the meaning of the flex scanner definition.
m4 and other preprocessors
It would be safer to use m4 as a preprocessor, but of course that means learning m4. (You shouldn't need to install it, because flex already depends on it: if you have flex, you also have m4.) And you will still need to be very careful with quoting sequences. m4 lets you customize these sequences, so it is more manageable than cpp. But don't copy the common idiom of using [[ as a quote delimiter; that sequence is very common inside regular expressions.
Also, m4 does not insert #line directives, and any non-trivial use will change the number of input lines, making error messages harder to interpret. (To say nothing of the challenge of debugging.) You can probably avoid this issue in this very simple case, but it will reappear.
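For illustration only (the COMMENT symbol and the quote characters are assumptions), the comment rule could be wrapped in an m4 conditional and enabled by running m4 -DCOMMENT over the file before flex sees it:

```m4
dnl Changing the quote characters avoids clashes with regex brackets.
changequote(`<|', `|>')dnl
ifdef(<|COMMENT|>, <|"//".*    ;|>)
[1-9][0-9]*    printf("It's a number\n");
```

With -DCOMMENT on the m4 command line, the comment rule appears in the output; without it, the line is empty.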
You could also write your own simple preprocessor, but you will still need to address the above issues.

How to disable syntax check in c header file

I'm embedding Lua code in C++; it's OK to write something like
char const *lua_scripts = R"rawstring(
-- lua code
)rawstring";
But the Lua code inside the string doesn't have syntax highlight, so I split it into 3 files:
The first file is called head.txt
char const *lua_scripts = R"rawstring(
The second file is called body.lua
-- lua code
The third file is called tail.txt
)rawstring";
Then the original cpp file changed to
#include "head.txt"
#include "body.lua"
#include "tail.txt"
But when I compile, syntax error reported, because the compiler checked the file before inclusion. So how can I disable compiler checking syntax?
In C++, programs are parsed after preprocessing. But dividing the input into lexemes is done before preprocessing. The input to the preprocessor is a stream of tokens, not a stream of characters.
So a token cannot span two input files. And a string literal is a single token.
You also may not split preprocessor directives over two files, so #endif, #else, etc. must all be in the same file as the #if or #ifdef, and the last line in a file cannot end with a backslash line-splice.
You could easily write your own little merging program which builds a C++ file from the C++ and Lua source files. You could even write it in Lua; it's not that complicated. Or you could do it with the m4 macro processor, which is most likely already installed in your compilation environment.
There are nine phases of translation that occur when C++ code is compiled. Phase 3 is when string literals are identified. Phase 4 is the preprocessor. By the time the compiler gets to #include your files, all the string literals in your original source file have been found and marked as such. There will not be another pass of your source file looking for more literals after the preprocessor is done.
When the preprocessor brings in a file, that file goes through the first four phases of translation before being inserted into your original source file. This is slightly different than the common, simplified perception of a header file being directly copied into a source file. Rather than a character-by-character copy, the header is copied token-by-token, where "token" means "preprocessing token", which includes such things as identifiers, operators, and literals.
In practice, the simplified view is adequate until you try to have language elements cross file boundaries. In particular, neither comments nor string literals can start in one file and extend into another. (There are other examples, but it's a bit more contrived to bring them into play.) You tried to have a string literal begin in one file, extend into a second, and end in a third. This does not work.
When the preprocessor brings in head.txt, the first three phases analyze it as five preprocessor tokens followed by a non-terminated raw string literal. This is what gets copied into your source file. Note that the non-terminated literal remains a non-terminated literal; it does not become a literal looking for an end.
When body.lua is brought in, it is treated just like any other header file. The preprocessor is not concerned about extensions. The file is brought in and subject to the phases of translation just like any other #include. Phase 3 will identify, using C++ syntax rules, string literals that begin in body.lua, but no part of body.lua will become part of a string literal that begins outside body.lua. Phase 3, including the identification of string literals, happens on this file in isolation.
Your approach has failed.
So how can I disable compiler checking syntax?
You cannot disable compiler syntax checking. That's like asking how can you have a person read a book without picking out letters and words. You've asked the compiler to process your code, and the first step of that is making sure the code is understandable, i.e. has valid syntax. It's questions like this that remind us that XY problems are as prevalent as ever.
Fortunately, though, you did mention your real problem: "doesn't have syntax highlight". Unfortunately, you did not provide enough information about your real problem, such as what program is providing the syntax highlighting. I subjected the following to two different syntax highlighters; one highlighted the Lua code as Lua code, and the other did not.
R"rawstring(
-- lua code
)rawstring"
If you are willing to ignore the highlighting on the first and last lines, and if your editor successfully applies the desired syntax highlighting, you could make this your body.lua file. Then the following C++ code should work.
char const *lua_scripts =
#include "body.lua"
;
Statements are not identified until phase seven – well after the preprocessor – so you can split statements across files.
You could use the unix xxd utility in a pre-build step to preprocess your body.lua file as follows:
xxd -i body.lua body.xxd
Then in your c++ code:
#include "body.xxd"
const std::string lua_scripts(reinterpret_cast<char *>(body), body_len);

Parsing irregular c++ prototypes

I am trying to build a program that parses and lists the content of header files. So far, so good, I found it easy parsing and listing headers I wrote, but when I started parsing cross platform API headers things got messy.
My current approach is rather simplistic, here is a pseudocode example of parsing the following function:
void foo(int a);
void is a type, so we are dealing with instancing a type
foo is the name of that type
foo is followed by brackets, meaning it is a function of type void named foo
int is a type...
a is the name of that type instance
foo is a function of type void that takes one parameter of type int named a
However, when I got into bigger and more complex headers I stumbled upon somewhat irregular prototypes, involving macros and god knows what. An example:
GLAPI void APIENTRY glEvalCoord1d( GLdouble u );
GLAPI and APIENTRY are platform dependent macros. Which kind of spoils my simple parsing scheme, since it expects the name of an object to follow its type. Those two macros happen to translate to either __stdcall, __declspec(dllimport) or extern but in theory they could mean anything, with their meaning being unclear until compile time.
How to write my parser so it can deal with such scenarios and not get confused? The macros themselves are defined at an earlier stage, so the parser can be aware GLAPI and APIENTRY are macros so they can simply be ignored, is this the way to go? Naturally this is just one of the many variations of irregularities the parser may stumble upon parsing through different headers, so any general techniques of how to deal with the parsing of any "legal" header content are welcome.
There isn't any real alternative to expanding the macros before you parse, at least if you want to process header files with the same complexity as Microsoft's, or any other header files associated with a compiler system that has been around for ten years or more.
The unpreprocessed source code is NOT C; it is simply unpreprocessed source code. The macros (and preprocessor conditionals, which you surprisingly didn't mention) can edit the apparent source in spectacularly complex ways. And you often can't know what macros are used, or how conditionals expand, unless you process the #includes as well.
You can get GCC to do the preprocessor expansion for you, and then parse the result. That would be by far the easiest way to approach this.
That still leaves the problem of parsing real C code, with all the complexities of declarators, and ambiguities in fragments such as T X; where the meaning of the statement depends on the declaration of T. To parse the headers accurately, you need a full C parser.
Our C Front End can do full preprocessing, or you can invoke it in a mode in which some macros are expanded and some are not. By tuning this set, you can often parse such headers without expanding every macro. Preprocessor conditionals are much more difficult, because they can occur at inconvenient (unstructured) places.
If all you want is the name and signature of functions, then a simple search and replace for macros should be sufficient.
However, you need to check whether a macro contains keywords (like the return value). This may be possible by stripping macro definitions of everything but keywords as they are defined, but tracking them and using a simple preprocessor will be necessary.
The platform dependent keywords, such as __declspec and __attribute__ have very limited syntax and there are only a few of them, so specifically removing those is possible.
You may want to take a look at how doxygen handles this, because it does almost exactly what you want and does handle macros. It allows a list of macros to be expanded as defined, and ones that should be expanded to a custom value. You could adapt that to expand __declspec(x) to nothing, and expand all others to their defined value by default.
This certainly isn't foolproof, but a search and replace is about the simplest functional solution you'll get. You need to follow the standard C++ preprocessor rules, which aren't terribly complex, with additional handling to strip extra attributes (__declspec and the like), and then parse the final results.

Preprocessor and whitespaces rules

I am interested in defining my own language inside a C++ block (let's say, for example, main), and for that purpose I need to use the preprocessor and its directives. My problem relates to the rule below:
#define INSERT create() ...
This is called a function-like definition, and the preprocessor does not allow any whitespace in what we define.
So when I use a function of my own language, I have to parse the statement below:
INSERT INTO variable_name VALUES(arg_list)
into two different function calls, let's say
insertINTO(variable_name) and valuePARSE(arg_list)
but since the preprocessor rules do not allow me to have whitespace in my definition, how can I reach variable_name and then make the first function call I want?
Any clues would be helpful.
PS: I tried using g++ -E file.cpp to see how preprocessor works and to adjust the syntax to be valid c++ rules.
The preprocessor included with most C++ compilers is probably way too weak for this kind of task. It was never designed for this kind of abuse. The boost preprocessor library could help you on the way, but I still think you're heading down a one-way street here.
If you really want to define your language this way, I suggest you either write your own preprocessor, or use one that is more powerful than the default one. Here is one chap who tried using Python as a C++ preprocessor.
1) #define INSERT create() is not a function-like macro, it's object-like; something like #define INSERT(a, b, c) create(a, b, c) would be;
2) if you want to expand INSERT INTO variable_name VALUES(arg_list) into insertINTO(variable_name); valuePARSE(arg_list); you can do something like:
#define INSERT insertINTO(
#define INTO
#define VALUES(...) ); valuePARSE(__VA_ARGS__);
3) as you can see, macros get ugly pretty easily, and even the slightest error in your syntax will have you spending a lot of time tracking it down;
4) since it's tagged C++ take a look at Boost.Proto or Boost.Preprocessor.

Preprocessor directives

When we see #include <iostream>, it is said to be a preprocessor directive.
#include ---> directive
And, I think:
<iostream> ---> preprocessor
But, what is meant by "preprocessor" and "directive"?
It may help to think of the relationship between a "directive" and being "given directions" (i.e. orders). "preprocessor directives" are directions to the preprocessor about changes it should make to the code before the later stages of compilation kick in.
But, what's the preprocessor? Well, its name reflects that it processes the source code before the "main" stages of compilation. It's simply there to process the textual source code, modifying it in various ways. The preprocessor doesn't even understand the tokens it operates on - it has no notion of types or variables, classes or functions - it's all just quote- and/or parenthesis-grouped, comma- and/or whitespace-separated text to be manhandled. This extra process gives more flexibility in selecting, combining and even generating parts of the program.
EDIT addressing @SWEngineer's comment: Many people find it helpful to think of the preprocessor as a separate program that modifies the C++ program, then gives its output to the "real" C++ compiler (this is pretty much the way it used to be). When the preprocessor sees #include <iostream> it thinks "ahhha - this is something I understand, I'm going to take care of this and not just pass it through blindly to the C++ compiler". So, it searches a number of directories (some standard ones like /usr/include and wherever the compiler installed its own headers, as well as others specified using -I on the command line) looking for a file called "iostream". When it finds it, it then replaces the line in the input program saying "#include <iostream>" with the complete contents of the file called "iostream", adding the result to the output. BUT, it then moves to the first line it read from the "iostream" file, looking for more directives that it understands.
So, the preprocessor is very simple. It can understand #include, #define, #if/#elif/#endif, #ifdef and #ifndef, #warning and #error, but not much else. It doesn't have a clue what an "int" is, a template, a class, or any of that "real" C++ stuff. It's more like some automated editor that cuts and pastes parts of files and code around, preparing the program that the C++ compiler proper will eventually see and process. The preprocessor is still very useful, because it knows how to find parts of the program in all those different directories (the next stage in compilation doesn't need to know anything about that), and it can remove code that might work on some other computer system but wouldn't be valid on the one in use. It can also allow the program to use short, concise macro statements that generate a lot of real C++ code, making the program more manageable.
#include is the preprocessor directive, <iostream> is just an argument supplied in addition to this directive, which in this case happens to be a file name.
Some preprocessor directives take arguments, some don't, e.g.
#define FOO 1
#ifdef _NDEBUG
....
#else
....
#endif
#warning Untested code !
The common feature is that they all start with #.
In Olden Times the preprocessor was a separate tool which pre-processed source code before passing it to the compiler front-end, performing macro substitutions and including header files, etc. These days the pre-processor is usually an integral part of the compiler, but it essentially just does the same job.
Preprocessor directives, such as #define and #ifdef, are typically used to make source programs easy to change and easy to compile in different execution environments. Directives in the source file tell the preprocessor to perform specific actions. For example, the preprocessor can replace tokens in the text, insert the contents of other files into the source file...
#include is a preprocessor directive, meaning that it is used by the preprocessor part of the compiler. This happens 'before' the compilation process. The #include needs to specify 'what' to include; this is supplied by the argument iostream. This tells the preprocessor to include the standard header iostream.
More information:
Preprocessor Directives on MSDN
Preprocessor directives on cplusplus.com

Multiple preprocessor directives on one line in C++

A hypothetical question: Is it possible to have a C++ program, which includes preprocessor directives, entirely on one line?
Such a line would look like this:
#define foo #ifdef foo #define bar #endif
What are the semantics of such a line?
Further, are there any combinations of directives which are impossible to construct on one line?
If this is compiler-specific then both VC++ and GCC answers are welcome.
A preprocessing directive must be terminated by a newline, so this is actually a single preprocessing directive that defines an object-like macro, named foo, that expands to the following token sequence:
# ifdef foo # define bar # endif
Any later use of the name foo in the source (until it is #undefed) will expand to this, but after the macro is expanded, the resulting tokens are not evaluated as a preprocessing directive.
This is not compiler-specific; this behavior is defined by the C and C++ standards.
Preprocessor directives are somewhat different than language statements, which are terminated by ; and use whitespace to delimit tokens. In the case of the preprocessor, the directive is terminated by a newline so it's impossible to do what you're attempting using the C++ language itself.
One way you could kind of simulate this is to put your desired lines into a separate header file and then #include it where you want. The separate header still has to have each directive on one line, but the point where you include it is just a single line, effectively doing what you asked.
Another way to accomplish something like that is to have a pre-C++ file that you use an external process to process into a C++ source file prior to compiling with your C++ compiler. This is probably rather more trouble than it's worth.