I'm embedding Lua code in C++; it's fine to write something like
char const *lua_scripts = R"rawstring(
-- lua code
)rawstring";
But the Lua code inside the string doesn't have syntax highlighting, so I split it into three files:
The first file is called head.txt
char const *lua_scripts = R"rawstring(
The second file is called body.lua
-- lua code
The third file is called tail.txt
)rawstring";
Then the original .cpp file changes to
#include "head.txt"
#include "body.lua"
#include "tail.txt"
But when I compile, a syntax error is reported, because the compiler checks each file before inclusion. So how can I disable the compiler's syntax checking?
In C++, programs are parsed after preprocessing, but dividing the input into tokens is done before preprocessing: the input to the preprocessor is a stream of preprocessing tokens, not a stream of characters.
So a token cannot span two input files. And a string literal is a single token.
You also may not split preprocessor directives over two files, so #endif, #else, etc. must all be in the same file as the #if or #ifdef, and the last line in a file cannot end with a backslash line-splice.
You could easily write your own little merging program which builds a C++ file from the C++ and Lua source files; you could even write it in Lua, it's not that complicated. Or you could do it with the M4 macro processor, which is most likely already installed in your build environment.
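For instance, here is a minimal C++ sketch of such a merging tool (the file names are illustrative, and it assumes the Lua source never contains the closing sequence )rawstring"):

#include <fstream>
#include <iostream>

int main() {
    std::ifstream in("body.lua");        // the Lua source
    std::ofstream out("lua_scripts.h");  // generated header for the C++ build
    if (!in || !out) {
        std::cerr << "cannot open files\n";
        return 1;
    }
    out << "char const *lua_scripts = R\"rawstring(\n";
    out << in.rdbuf();                   // copy the Lua source verbatim
    out << ")rawstring\";\n";
}

Run it as a pre-build step and #include "lua_scripts.h" from the C++ side.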
There are nine phases of translation that occur when C++ code is compiled. Phase 3 is when string literals are identified. Phase 4 is the preprocessor. By the time the compiler gets to #include your files, all the string literals in your original source file have been found and marked as such. There will not be another pass of your source file looking for more literals after the preprocessor is done.
When the preprocessor brings in a file, that file goes through the first four phases of translation before being inserted into your original source file. This is slightly different than the common, simplified perception of a header file being directly copied into a source file. Rather than a character-by-character copy, the header is copied token-by-token, where "token" means "preprocessing token", which includes such things as identifiers, operators, and literals.
In practice, the simplified view is adequate until you try to have language elements cross file boundaries. In particular, neither comments nor string literals can start in one file and extend into another. (There are other examples, but it's a bit more contrived to bring them into play.) You tried to have a string literal begin in one file, extend into a second, and end in a third. This does not work.
When the preprocessor brings in head.txt, the first three phases analyze it as five preprocessor tokens followed by a non-terminated raw string literal. This is what gets copied into your source file. Note that the non-terminated literal remains a non-terminated literal; it does not become a literal looking for an end.
When body.lua is brought in, it is treated just like any other header file. The preprocessor is not concerned about extensions. The file is brought in and subject to the phases of translation just like any other #include. Phase 3 will identify, using C++ syntax rules, string literals that begin in body.lua, but no part of body.lua will become part of a string literal that begins outside body.lua. Phase 3, including the identification of string literals, happens on this file in isolation.
Your approach has failed.
So how can I disable compiler checking syntax?
You cannot disable compiler syntax checking. That's like asking how can you have a person read a book without picking out letters and words. You've asked the compiler to process your code, and the first step of that is making sure the code is understandable, i.e. has valid syntax. It's questions like this that remind us that XY problems are as prevalent as ever.
Fortunately, though, you did mention your real problem: "doesn't have syntax highlighting". Unfortunately, you did not provide enough information about your real problem, such as what program is providing the syntax highlighting. I subjected the following to two different syntax highlighters; one highlighted the Lua code as Lua code, and the other did not.
R"rawstring(
-- lua code
)rawstring"
If you are willing to ignore the highlighting on the first and last lines, and if your editor successfully applies the desired syntax highlighting, you could make this your body.lua file. Then the following C++ code should work.
char const *lua_scripts =
#include "body.lua"
;
Statements are not identified until phase seven – well after the preprocessor – so you can split statements across files.
You could use the Unix xxd utility in a pre-build step to convert your body.lua file to a C array, as follows:
xxd -i body.lua body.xxd
Then in your C++ code:
#include "body.xxd"
const std::string lua_scripts(reinterpret_cast<char *>(body), body_len);
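For reference, the generated body.xxd is just an ordinary C array definition. For a body.lua containing the single line "-- lua code", it would look roughly like this:

unsigned char body_lua[] = {
  0x2d, 0x2d, 0x20, 0x6c, 0x75, 0x61, 0x20, 0x63, 0x6f, 0x64, 0x65, 0x0a
};
unsigned int body_lua_len = 12;

Note that xxd derives the identifiers from the input file name, which is why the code above uses body_lua rather than body.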
Related
I am trying to figure out how parsers handle the preprocessor and conditional compilation. Using C++ as an example: are preprocessor directives included in the C++ grammar rules, or is the preprocessor a separate language, with preprocessing happening before parsing? In either case, how can a parser detect errors in all possible branches and recover information about the original code layout from before preprocessing (such as the number of the line where the error occurred)?
Taken from the C Preprocessor docs:
The C preprocessor informs the C compiler of the location in your source code where each token came from.
So in the case of GCC, the parser knows where the errors occur, because the preprocessor tells it. I am unsure whether this quotation refers to preprocessing tokens, or all C++ tokens.
This page has a few more details on how the magic happens.
The cpp_token structure contains line and col members. The lexer fills these in with the line and column of the first character of the token. Consequently, but maybe unexpectedly, a token from the replacement list of a macro expansion carries the location of the token within the #define directive, because cpplib expands a macro by returning pointers to the tokens in its replacement list.
[...] This variable therefore uniquely enumerates each line in the translation unit. With some simple infrastructure, it is straight forward to map from this to the original source file and line number pair
Here is a copy of the C++14(?) draft standard. The preprocessing grammar is in Appendix A.14. I'm not sure it matters whether you want to call it a separate language or not. Per [lex.phases] (section 2.2), C++ compilers behave as if preprocessing happens before the main translation/parsing happens.
I want to define a constant in the preprocessor that enables matching certain patterns only when it is defined. Is it possible to do this, or is there another way to deal with this problem?
e.g., a simplified version of removing one-line comments in C:
%{
#define COMMENT
%}
%%
#ifdef COMMENT
[\/][\/].*$ ;
#endif
[1-9][0-9]* printf("It's a number, and it works with and without defining COMMENT");
%%
There is no great solution to this (very reasonable) request, but there are some possibilities.
(F)lex start conditions
Flex start conditions make it reasonably simple to define a few optional patterns, but they don't compose well. This solution will work best if you have only a single controlling variable, since you would have to define a separate start condition for every possible combination of controlling variables.
For example:
%s NO_COMMENTS
%%
<NO_COMMENTS>"//".* ; /* Ignore comments in `NO_COMMENTS mode. */
The %s declaration means that all unmarked rules also apply to the NO_COMMENTS state; you will commonly see %x ("exclusive") in examples, but that would force you to explicitly mark almost every rule.
Once you have modified your grammar in this way, you can select the appropriate set of rules at run time by setting the lexer's state with BEGIN(INITIAL) or BEGIN(NO_COMMENTS). (The BEGIN macro is only defined in the flex-generated file, so you will want to export a function which performs one of these two actions.)
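For example, a small exported wrapper (the function name here is made up), placed in the user-code section of the .l file:

void scanner_set_skip_comments(int enable) {
    if (enable)
        BEGIN(NO_COMMENTS);  /* the <NO_COMMENTS> rule fires: comments are discarded */
    else
        BEGIN(INITIAL);      /* default state: the comment rule does not apply */
}

Other translation units can then call scanner_set_skip_comments() without ever seeing flex's BEGIN macro.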
Using cpp as a utility.
There is no preprocessor feature in flex. It's possible that you could use a C preprocessor to preprocess your flex file before passing it to flex, but you will have to be very careful with your input file:
The C preprocessor expects its input to be a sequence of valid C preprocessor tokens. Many common flex patterns will not match this assumption, because of the very different quoting rules. (For a simple example, a common pattern to recognise C comments includes the character class [^/*] which will be interpreted by the C preprocessor as containing the start of a C comment.)
The flex input file is likely to have a number of lines which are valid #include directives. There is no way to prevent these directives from being expanded (other than removing them from the file). Once expanded and incorporated into the source, the header files no longer have include guards, so you will have to tell flex not to insert any #include files from its own templates. I believe that is possible, but it will be a bit fragile.
The C preprocessor may expand what looks to it like a macro invocation.
The C preprocessor might not preserve linear whitespace, altering the meaning of the flex scanner definition.
m4 and other preprocessors
It would be safer to use m4 as a preprocessor, but of course that means learning m4. (You shouldn't need to install it, because flex already depends on it; if you have flex, you also have m4.) And you will still need to be very careful with quoting sequences. m4 lets you customize these sequences, so it is more manageable than cpp. But don't copy the common idiom of defining [[ as a quote delimiter; that sequence is very common inside regular expressions.
Also, m4 does not insert #line directives and any non-trivial use will change the number of input lines, making error messages harder to interpret. (To say nothing of the challenge of debugging.) You can probably avoid this issue in this very simple case but the issue will reappear.
You could also write your own simple preprocessor, but you will still need to address the above issues.
The book Accelerated C++: Practical Programming by Example says the following:
... system header files need not be implemented as files. Even though the #include directive is used to access both our own header files and system headers, there is no requirement that they be implemented in the same way
What exactly does this mean? If not as a file how else can a system header file be implemented?
Imagine you write your own compiler and C++ standard library. You could make it so that #include <vector> does not open any file, but instead simply loads some state into the compiler which makes it understand std::vector. You could then implement your vector class in some language other than C++, so long as your compiler understands enough to make it work "as if" you had written an actual C++ source file called vector.
The compiler could have hardcoded that when it sees:
#include <iostream>
then it makes available all definitions of things that are specified as being declared by this directive, etc.
Or it could store the definitions in a database, or some other encoded file, or the cloud, or whatever. The point is that the standard does not restrict the compiler in any way, so long as the end goal is achieved that the specified things get declared.
The way in which headers are included into your "source file stream" is left mostly up to the implementation.
C++11 (but this has been the case for a long time, both in C++ and C) 16.2 Source file inclusion states:
A #include directive shall identify a header or source file that can be processed by the implementation.
A preprocessing directive of the form # include <h-char-sequence> new-line searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
(and then further description of the "..." form and the bare pp-token form of #include).
So the header may be in a file.
It may also be injected by the compiler from hard-coded values.
Or read from a server located on one of the planets orbiting Betelgeuse (though, without FTL transmissions, such a compiler wouldn't last long in the marketplace).
The possibilities are many and varied, most of them bordering on lunacy but none of them actually forbidden by the standard itself.
I came across the following code in a .cpp file. I do not understand the construct or syntax which involves the header files. I do recognize that these particular header files relate to Android NDK. But, I think the question is a general question about C++ syntax.
These appear to be preprocessor commands of some kind because they begin with "#". But they are not the typical #include, #pragma, #ifndef, #define, etc. commands. The source file has 1000+ such occurrences referencing hundreds of different .h, .c, and .cpp files.
typedef int __time_t;
typedef int __timer_t;
# 116 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/machine/_types.h"
# 41 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/sys/_types.h" 2
# 33 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/stdint.h" 2
# 48 "/home/usr/download/android-ndk-r8b/platforms/android-3/arch-arm/usr/include/stdint.h"
typedef __int8_t int8_t;
typedef __uint8_t uint8_t;
The compiler (GCC) does not appear to be throwing any error related to these lines. But, I would like to understand their purpose and function. Can anybody explain these?
This is output from the GCC preprocessor. Those lines are known as linemarkers. They have the syntax:
# linenum filename flags
They are interpreted as saying that the following line came from line linenum of filename. They basically just help you and the compiler see where lines were included from. The flags provide some more information:
1 - This indicates the start of a new file.
2 - This indicates returning to a file (after having included another file).
3 - This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
4 - This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
You can see this output from preprocessing your own programs if you give the -E flag to g++.
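For instance, given a main.cpp whose first line includes a one-line header.h (both names made up here), g++ -E prints something along these lines:

# 1 "main.cpp"
# 1 "header.h" 1
int from_header;
# 2 "main.cpp" 2
int from_main;

The flag 1 marks entry into header.h, and the flag 2 marks the return to main.cpp at its line 2.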
You'll typically see lines like that in the output of the preprocessor (i.e., you normally shouldn't be seeing them at all).
They're similar to the standard #line directive, which has the form:
#line 42
or
#line 42 "foo.c"
which the compiler uses to control the contents of error messages.
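For instance, a diagnostic on the line following the directive is reported against foo.c line 42:

#line 42 "foo.c"
int oops = "not an int";  // the compiler reports this error at foo.c:42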
Without the word line, this:
# 42 "foo.c"
is technically a non-directive (which, just to add to the fun, is a kind of directive). It's essentially a comment as far as the C standard is concerned. At a guess, gcc's preprocessor probably emits these rather than #line directives because #line directives are intended as input to the preprocessor.
gcc's preprocessor refers to these as "linemarkers"; they're discussed in the cpp manual. They're treated like #line directives, except that they can take an additional flag argument.
Preprocessors tend to introduce these directives, using them to indicate the line number and file name. The C++ standard doesn't define their meaning, but it reserves the form
# <non-directive>
where <non-directive> is something which isn't one of the normal directives. Compiler writers seem to have agreed to use the line number and file name in these as the result of preprocessing a file. This use is related to basically all compilers supporting the -E option to indicate that the file(s) should just be preprocessed.
When we see #include <iostream>, it is said to be a preprocessor directive.
#include ---> directive
And, I think:
<iostream> ---> preprocessor
But, what is meant by "preprocessor" and "directive"?
It may help to think of the relationship between a "directive" and being "given directions" (i.e. orders). "preprocessor directives" are directions to the preprocessor about changes it should make to the code before the later stages of compilation kick in.
But, what's the preprocessor? Well, its name reflects that it processes the source code before the "main" stages of compilation. It's simply there to process the textual source code, modifying it in various ways. The preprocessor doesn't even understand the tokens it operates on - it has no notion of types or variables, classes or functions - it's all just quote- and/or parenthesis-grouped, comma- and/or whitespace-separated text to be manhandled. This extra process gives more flexibility in selecting, combining and even generating parts of the program.
EDIT addressing #SWEngineer's comment: Many people find it helpful to think of the preprocessor as a separate program that modifies the C++ program, then gives its output to the "real" C++ compiler (this is pretty much the way it used to be). When the preprocessor sees #include <iostream> it thinks "ahhha - this is something I understand, I'm going to take care of this and not just pass it through blindly to the C++ compiler". So, it searches a number of directories (some standard ones like /usr/include and wherever the compiler installed its own headers, as well as others specified using -I on the command line) looking for a file called "iostream". When it finds it, it replaces the line in the input program saying "#include <iostream>" with the complete contents of the file called "iostream", adding the result to the output. BUT, it then moves to the first line it read from the "iostream" file, looking for more directives that it understands.
So, the preprocessor is very simple. It can understand #include, #define, #if/#elif/#endif, #ifdef and #ifndef, #warning and #error, but not much else. It doesn't have a clue what an "int" is, a template, a class, or any of that "real" C++ stuff. It's more like some automated editor that cuts and pastes parts of files and code around, preparing the program that the C++ compiler proper will eventually see and process. The preprocessor is still very useful, because it knows how to find parts of the program in all those different directories (the next stage in compilation doesn't need to know anything about that), and it can remove code that might work on some other computer system but wouldn't be valid on the one in use. It can also allow the program to use short, concise macro statements that generate a lot of real C++ code, making the program more manageable.
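As a small illustration of that textual cut-and-paste (the macro name is made up):

#define SQUARE(x) ((x) * (x))

int n = SQUARE(1 + 2);  // the preprocessor rewrites this line, purely textually,
                        // to: int n = ((1 + 2) * (1 + 2));

The doubled parentheses matter precisely because the substitution is textual, not semantic.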
#include is the preprocessor directive, <iostream> is just an argument supplied in addition to this directive, which in this case happens to be a file name.
Some preprocessor directives take arguments, some don't, e.g.
#define FOO 1
#ifdef _NDEBUG
....
#else
....
#endif
#warning Untested code !
The common feature is that they all start with #.
In Olden Times the preprocessor was a separate tool which pre-processed source code before passing it to the compiler front-end, performing macro substitutions and including header files, etc. These days the pre-processor is usually an integral part of the compiler, but it essentially just does the same job.
Preprocessor directives, such as #define and #ifdef, are typically used to make source programs easy to change and easy to compile in different execution environments. Directives in the source file tell the preprocessor to perform specific actions. For example, the preprocessor can replace tokens in the text, insert the contents of other files into the source file...
#include is a preprocessor directive, meaning that it is used by the preprocessor part of the compiler. This happens 'before' the compilation proper. The #include needs to specify 'what' to include; this is supplied by the argument <iostream>. This tells the preprocessor to include the contents of the standard header iostream.
More information:
Preprocessor Directives on MSDN
Preprocessor directives on cplusplus.com