I have an OCaml abstract syntax tree file, called astDual.ml, and the associated parser.mly and lexer.mll files.
In parser.mly, typically we write:
%start <Ast.expr> prog
after declaring tokens and associativity.
But since my file is not ast.ml but astDual.ml, should I change this to the following?
%start <AstDual.expr> prog
Yes. In OCaml, each file corresponds to a module with the same name but with the first letter capitalized, so astDual.ml defines the module AstDual. See the corresponding section in the manual for more information.
I'm embedding Lua code in C++; it works to write it like this:
char const *lua_scripts = R"rawstring(
-- lua code
)rawstring";
But the Lua code inside the string doesn't have syntax highlight, so I split it into 3 files:
The first file is called head.txt
char const *lua_scripts = R"rawstring(
The second file is called body.lua
-- lua code
The third file is called tail.txt
)rawstring";
Then I changed the original cpp file to
#include "head.txt"
#include "body.lua"
#include "tail.txt"
But when I compile, a syntax error is reported, because the compiler checks each file before it is included. So how can I disable compiler checking syntax?
In C++, programs are parsed after preprocessing. But dividing the input into lexemes is done before preprocessing. The input to the preprocessor is a stream of tokens, not a stream of characters.
So a token cannot span two input files. And a string literal is a single token.
You also may not split preprocessor directives over two files, so #endif, #else, etc. must all be in the same file as the #if or #ifdef, and the last line in a file cannot end with a backslash line-splice.
You could easily write your own little merging program which builds a C++ file from the C++ and Lua source files. You could even write it in Lua; it's not that complicated. Or you could do it with the M4 macro processor, which is most likely already installed in your compilation environment.
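As a sketch of such a merging program (written in C++ rather than Lua or M4, to match the rest of the thread), the function below wraps Lua source text in a raw string literal and returns the text of a complete generated header. The name `make_lua_header` and the default delimiter `lua` are hypothetical choices. A small `main()` would read body.lua (for example with `std::ifstream`) and write the result to a file that the C++ source includes as a whole, so the literal never spans two files.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the text of a generated header that defines
// `var_name` as a raw string literal holding `lua_source`.
std::string make_lua_header(const std::string &var_name,
                            const std::string &lua_source,
                            const std::string &delim = "lua") {
    // A raw-string delimiter may be at most 16 characters long.
    if (delim.size() > 16)
        throw std::invalid_argument("raw-string delimiter is too long");
    // The literal ends at the first )delim" sequence, so the Lua text must not contain it.
    const std::string terminator = ")" + delim + "\"";
    if (lua_source.find(terminator) != std::string::npos)
        throw std::invalid_argument("Lua source contains the raw-string terminator");
    return "char const *" + var_name + " = R\"" + delim + "(" +
           lua_source + terminator + ";\n";
}
```

Refusing a colliding delimiter instead of trying to escape it keeps the generator simple; if the Lua code happens to contain `)lua"`, pass a different `delim`.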
There are nine phases of translation that occur when C++ code is compiled. Phase 3 is when string literals are identified. Phase 4 is the preprocessor. By the time the compiler gets to #include your files, all the string literals in your original source file have been found and marked as such. There will not be another pass of your source file looking for more literals after the preprocessor is done.
When the preprocessor brings in a file, that file goes through the first four phases of translation before being inserted into your original source file. This is slightly different than the common, simplified perception of a header file being directly copied into a source file. Rather than a character-by-character copy, the header is copied token-by-token, where "token" means "preprocessing token", which includes such things as identifiers, operators, and literals.
In practice, the simplified view is adequate until you try to have language elements cross file boundaries. In particular, neither comments nor string literals can start in one file and extend into another. (There are other examples, but it's a bit more contrived to bring them into play.) You tried to have a string literal begin in one file, extend into a second, and end in a third. This does not work.
When the preprocessor brings in head.txt, the first three phases analyze it as five preprocessing tokens followed by a non-terminated raw string literal. This is what gets copied into your source file. Note that the non-terminated literal remains a non-terminated literal; it does not become a literal looking for an end.
When body.lua is brought in, it is treated just like any other header file. The preprocessor is not concerned about extensions. The file is brought in and subject to the phases of translation just like any other #include. Phase 3 will identify, using C++ syntax rules, string literals that begin in body.lua, but no part of body.lua will become part of a string literal that begins outside body.lua. Phase 3, including the identification of string literals, happens on this file in isolation.
Your approach has failed.
So how can I disable compiler checking syntax?
You cannot disable compiler syntax checking. That's like asking how can you have a person read a book without picking out letters and words. You've asked the compiler to process your code, and the first step of that is making sure the code is understandable, i.e. has valid syntax. It's questions like this that remind us that XY problems are as prevalent as ever.
Fortunately, though, you did mention your real problem: "doesn't have syntax highlight". Unfortunately, you did not provide enough information about your real problem, such as what program is providing the syntax highlighting. I subjected the following to two different syntax highlighters; one highlighted the Lua code as Lua code, and the other did not.
R"rawstring(
-- lua code
)rawstring"
If you are willing to ignore the highlighting on the first and last lines, and if your editor successfully applies the desired syntax highlighting, you could make this your body.lua file. Then the following C++ code should work.
char const *lua_scripts =
#include "body.lua"
;
Statements are not identified until phase seven – well after the preprocessor – so you can split statements across files.
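You could use the Unix xxd utility in a pre-build step to preprocess your body.lua file as follows:
xxd -i body.lua body.xxd
Then in your C++ code (xxd names the array and its length after the input file, so body.lua becomes body_lua and body_lua_len):
#include <string>
#include "body.xxd"
const std::string lua_scripts(reinterpret_cast<const char *>(body_lua), body_lua_len);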
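For illustration, this is roughly what `xxd -i body.lua` produces for a body.lua containing the single line `-- lua code`. The bytes here are written by hand and real xxd output is formatted differently. Because the generated array is not NUL-terminated, the string has to be built from the pointer and `body_lua_len` rather than from the pointer alone.

```cpp
#include <string>

// Hand-written equivalent of the body.xxd that `xxd -i body.lua` would generate
// for the text "-- lua code\n" (xxd derives the names from the file name).
unsigned char body_lua[] = {
    0x2d, 0x2d, 0x20, 0x6c, 0x75, 0x61, 0x20, 0x63, 0x6f, 0x64, 0x65, 0x0a
};
unsigned int body_lua_len = 12;

// No terminating '\0' is emitted, so pass the length explicitly.
const std::string lua_scripts(reinterpret_cast<const char *>(body_lua), body_lua_len);
```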
I am making a compiler with flex, bisonc++ and gcc (on Ubuntu) that compiles a simple esoteric programming language to C++ source code (I don't want to generate assembly code).
I want to do optimization as well, so I need an AST.
I also have a symbol table, but I have no idea how to construct an AST properly or, once I have a correct AST for the grammar, how to optimize the code (I don't just want to print the AST).
My grammar (.y file) is complete and correct (it recognizes every syntax error).
It's clear that I have to write the AST-building action code in the .y file, but I don't know what to read up on (as I mentioned, I also have a symbol table), or where I should define my AST struct or class.
My files:
language.l (the flex file that defines the tokens)
language.y (the bisonc++ source file, i.e. the grammar of my language)
lex.yy.cc (generated by flex)
These are generated by bisonc++:
Parser.h (in this file, I added the symbol table, which is a std::map<std::string, var_data> where var_data is a struct defined in semantics.h)
Parser.ih
Parserbase.h
parse.cc
I have a language.cc file which contains the main function; it reads the input file (given as a command-line argument) and starts the analysis.
I also have a semantics.h header which contains structs for the symbol table.
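One common arrangement, sketched here under assumptions: the AST node classes live in their own header (say, a hypothetical ast.h included from Parser.h), the grammar actions build nodes from their children's semantic values, and optimization passes are ordinary functions that rewrite the tree before code generation. The names `Node`, `Num`, `Var`, `BinOp`, `fold` and `emit` are illustrative only; `fold` performs constant folding, one of the simplest optimizations, and `emit` prints C++ source.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST for a tiny expression language; a real language needs more node kinds.
struct Node {
    virtual ~Node() = default;
    virtual std::string emit() const = 0;   // generate C++ source for this subtree
};

struct Num : Node {
    long value;
    explicit Num(long v) : value(v) {}
    std::string emit() const override { return std::to_string(value); }
};

struct Var : Node {
    std::string name;   // would be checked against the symbol table when the node is built
    explicit Var(std::string n) : name(std::move(n)) {}
    std::string emit() const override { return name; }
};

struct BinOp : Node {
    char op;
    std::unique_ptr<Node> lhs, rhs;
    BinOp(char o, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    std::string emit() const override {
        return "(" + lhs->emit() + " " + op + " " + rhs->emit() + ")";
    }
};

// Constant folding: fold the children first, then replace an operator whose
// operands are both literals with a single literal node.
std::unique_ptr<Node> fold(std::unique_ptr<Node> n) {
    auto *b = dynamic_cast<BinOp *>(n.get());
    if (!b) return n;                        // leaves cannot be folded further
    b->lhs = fold(std::move(b->lhs));
    b->rhs = fold(std::move(b->rhs));
    auto *l = dynamic_cast<Num *>(b->lhs.get());
    auto *r = dynamic_cast<Num *>(b->rhs.get());
    if (l && r) {
        switch (b->op) {
        case '+': return std::make_unique<Num>(l->value + r->value);
        case '-': return std::make_unique<Num>(l->value - r->value);
        case '*': return std::make_unique<Num>(l->value * r->value);
        }                                    // other operators are left unfolded
    }
    return n;
}
```

After parsing, you would run `fold` (and any further passes) on the root and then call `emit`; how the node pointers are carried through the parser depends on the semantic-value type you declare for bisonc++.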
The question is obvious, I think, but even though I googled it, I could not find a solution. I want to split my source code to keep it more maintainable. How can I reference a module in another file?
I think that you are looking for the use statement. You might, for example, have one source file containing the definition of a module, outline:
module abstract_types
implicit none
! declarations
contains
! procedure definitions
end module abstract_types
and then, in another source file, a program which uses the module, outline:
program hello_there
use abstract_types
implicit none
! declarations
! executable statements
end program hello_there
Note:
Any use statements must precede the implicit statement.
The use statement refers to the module by name.
When it comes to compilation, make sure that you compile the module source file before the program source file; at compilation time (not at link time) the compiler will look for a module file (often called a mod file) to satisfy the reference to the module in the use statement. The mod file is a bit like a header file, but it's created by the compiler.
Later, when you link your program you'll need the object files for both module and program.
When you run bison, it creates a stack class for you in "stack.hh". The file name is fixed, but the contents are wrapped in a namespace of your choosing.
If you use Bison to generate two separate grammars (i.e. two *.y files) in C++ mode, the "stack.hh" files conflict and one overwrites the other.
A similar thing happens with the autogenerated "location.hh" and "position.hh" classes, but there is a workaround in Bison 2.7:
%define api.location.type "foo::location"
that lets you reuse the foo grammar namespace in your bar grammar namespace.
But I can't find any way of doing the same for the "stack.hh" file.
The easiest way to deal with this is just to put the Bison files in two separate directories. Then when you generate the code the files will not conflict, assuming each set of files gets generated in the same location as the corresponding Bison file.
No wonder I can't link to it from my flex file.
I have checked this and taken out the declaration "YYSTYPE yylval;" from the beginning of yyparse and it works as intended. Surely this is not the correct way to use bison and flex? Can somebody show me another way?
Thank you.
It is normal for yylval to be declared and defined in the y.tab.c file output by bison. It's also declared (as extern) in the y.tab.h file, so if you include that header in your lexer, you can access yylval as a global variable. This is the normal way flex and bison work together, and there should be no need to edit the generated files to take things out; it should 'just work'.
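A single-file sketch of that arrangement, with hand-written stand-ins for what Bison generates: the union `YYSTYPE`, the token code `NUM` and the input variable `lex_input` are assumptions for illustration. The header contributes only the `extern` declaration; exactly one file-scope definition must exist, and in a normal (non-pure) Bison parser that definition is in the generated y.tab.c.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical stand-ins for the declarations in the header that `bison -d` writes.
union YYSTYPE { long ival; };
enum { END = 0, NUM = 258 };
extern YYSTYPE yylval;          // header: declaration only

// The single definition, which Bison emits in the generated parser source.
YYSTYPE yylval;

// Non-reentrant lexer: reads from a global cursor and stores token values in yylval.
const char *lex_input = "";     // hypothetical input source for this sketch
int yylex() {
    while (std::isspace(static_cast<unsigned char>(*lex_input))) ++lex_input;
    if (*lex_input == '\0') return END;
    if (std::isdigit(static_cast<unsigned char>(*lex_input))) {
        char *end;
        yylval.ival = std::strtol(lex_input, &end, 10);
        lex_input = end;
        return NUM;
    }
    return static_cast<unsigned char>(*lex_input++);   // single-character token
}
```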
This use of a global variable causes problems if you want more than one parser in a program, or want to run parsers in different threads (or otherwise simultaneously). Bison provides a way to avoid this with %define api.pure, which gets rid of yylval as a global: instead, the parser calls yylex with a pointer to a YYSTYPE, and the lexer should store the token value there instead of in yylval. If you're using flex, put #define YY_DECL int yylex(YYSTYPE *val) at the top of your flex file to change the declaration it uses for yylex.
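For contrast, a sketch of the pure calling convention using the same kind of hypothetical stand-ins (`YYSTYPE`, `NUM`): the lexer receives a `YYSTYPE *` and writes the token value through it. The second parameter, a cursor into the input, is an assumption standing in for the scanner state that a reentrant lexer also needs (Bison can pass such an extra argument via `%lex-param`). With no globals left, two parses can be interleaved without interfering.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical stand-ins for what Bison would generate in the .tab.h header.
union YYSTYPE { long ival; };
enum { END = 0, NUM = 258 };

// Pure-parser style lexer: the semantic value goes through *val, not a global yylval,
// and the input position lives in the caller-owned cursor.
int yylex(YYSTYPE *val, const char **cursor) {
    const char *p = *cursor;
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
    if (*p == '\0') { *cursor = p; return END; }
    if (std::isdigit(static_cast<unsigned char>(*p))) {
        char *end;
        val->ival = std::strtol(p, &end, 10);
        *cursor = end;
        return NUM;
    }
    *cursor = p + 1;
    return static_cast<unsigned char>(*p);   // single-character token
}
```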
Instead of using
#define YY_DECL int yylex(YYSTYPE *val)
you can also use
%option bison-bridge
But if you want to write a flex+bison parser in C++, then this method does not work.
For C++ parsers, check this example out.
I have checked this and taken out the declaration "YYSTYPE yylval;"
I wonder whether something went wrong when you took it out, but you could try
bison -d your-yacc-file.y
Bison will then generate a header file (your-yacc-file.tab.h) for you with all those declarations.