How to set the API from lex yacc to Program - c++

I create an .exe FILE, which can parser an expression, which is generated by lex and yacc. But I do it just get the input from screen, and just return the parser result from screen. I saw some suggestions about using YY_BUFFER_STATE yy_scan_buffer(char *base, yy_size_t size), but I still could not find a good way to do it.
Is it possible that I put some headers (which is compiled by lex yacc) to my main program c++, and then I can use yylex() to call it, giving a string as input, and get the return value in the main program? Thanks for your help, I am confused about how to realize it. Thanks.

yy_scan_string is how you give flex a string as input. You call that first, and then call yylex and it will use that string as the input to get tokens from rather than stdin. When you get an EOF from yylex, it has scanned the entire string. You can then call yy_delete_buffer on the YY_BUFFER_STATE returned by yy_scan_string (to free up memory) and call yy_scan_string again if you want to scan a new string.
You can use yy_scan_buffer instead to save a bit of copying, but then you have to set up the buffer properly yourself (basically, it needs to end with two NUL bytes instead of just one).
Unfortunately, there's no standard header file from flex declaring these. So you need to either declare them yourself somewhere (copy the declarations from the flex documentation), or call them in the 3rd section of the .l file, which is copied verbatim to the end of the lex.y.c file.

Related

How to write custom input function for Flex in C++ mode?

I have a game engine and a shader parser. The engine has an API for reading from a virtual file system. I would like to be able to load shaders through this API. I was thinking about implementing my own std::ifstream but I don't like it, my api is very simple and I don't want to do a lot of unnecessary work. I just need to be able to read N bytes from the VFS. I used a C++ mod for more convenience, but in the end I can not find a solution to this problem, since there is very little official information about this. Everything is there for the C API, at least I can call the scan_string function, I did not find such a function in the yyFlexParser interface.
To be honest, I wanted to abandon the std::ifstream in the parser, and return only the C api . The only thing I used the Flex C++ mode for is to interact with the Bison C++ API and so that the parser can be used in a multi-threaded environment, but this can also be achieved with the C API.
I just couldn't compile the C parser with the C++ compiler.
I would be happy if there is a way to add such functionality through some kind of macro.
I wouldn't mind if there was a way to return the yy_scan_string function, I could read the whole file myself and provide just a string.
The simple solution, if you just want to provide a string input, is to make the string into a std::istringstream, which is a valid std::istream. The simplicity of this solution reduces the need for an equivalent to yy_scan_string.
On the other hand, if you have a data source you want to read from which is not derived from std::istream, you can easily create a lexical scanner which does whatever is necessary. Just subclass yyFlexLexer, add whatever private data members you will need and a constructor which initialises them, and override int LexerInput(char* buffer, size_t maxsize); to read at least one and no more than maxsize bytes into buffer, returning the number of characters read. (YY_INPUT also works in the C++ interface, but subclassing is more convenient precisely because it lets you maintain your own reader state.)
Notes:
If you decide to subclass and override LexerInput, you need to be aware that "interactive" mode is actually implemented in LexerInput. So if you want your lexer to have an interactive mode, you'll have to implement it in your override, too. In interactive mode, LexerInput always reads exactly one character (unless, of course, it's at the end of the file).
As you can see in the Flex code repository, a future version of Flex will use refactored versions of these functions, so you might need to be prepared to modify your code in the future, although Flex generally maintains backwards compatibility for a long time.

Using Bison-generated-compiler to compile source code

Well, so far i'm using GNU Bison with Lex & Yacc files to build a parser in C++, which is called by my program through the yyparse() function. Therefore the g++ compilation of my program produce an .a file that allow the user to insert some code to be parsed.
However I would like to use the generated-file to compile a whole project directory hierarchy (i.e a bunch of files). So, is Bison able to generate the result-compiler in a independent archive to allow me that? Maybe there is a simple way to parse multiple files? Or Should I manage this behavior through C++ algorithms by myself?
Thanks for the knowledge sharing!
Bisons/yacc generated parsers do not directly read input. The parsers use the tokens extracted from the input stream by yylex(), leaving it entirely up to yylex() to read the data or otherwise access the input.
By default, the yylex() generated by (f)lex reads input from the input stream pointed at by the global variable yyin. yylex() does not fopen a file or otherwise give yyin a value (except for initialising it to stdin).
To pass multiple files through your parser:
Set yyin appropriately:
yyin = fopen(filepath, "r");
Call yyparse().
Close yyin.
Repeat as necessary.

Accessing files made with mktemp for Linux through C++

I am trying to create a temporary file on a Linux system, but interfacing through C++ (so that the Linux commands are run through the C++ program).
To do so, I am using mktemp, which produces a temporary file.
I would need to later refer back to this file.
However, the filename is randomly generated and I am wondering if there is an easy way to access the filename.
The big honking comment in mktemp(3)'s manual page explicitly tells you to use mkstemp(3) instead of mktemp(3), and explains the good reason why it is so.
If you actually read the manual page for mkstemp(3) it clearly explains that the library function modifies the character buffer that's passed to it as a parameter to reflect the actual name of the created temporary file.
So to determine the name of the temporary file, simply refer to the character buffer you passed to this library function.

How can I print whatever I see in Yacc/Bison?

I have a complicated Yacc file with a bunch of rules, some of them complicated, for example:
start: program
program: extern_list class
class: T_CLASS T_ID T_LCB field_dec_list method_dec_list T_RCB
The exact rules and the actions I take on them are not important, because what I want to do seems fairly simple: just print out the program as it appears in the source file, using the rules I define for other purposes. But I'm surprised at how difficult doing so is.
First I tried adding printf("%s%s", $1, $2) to the second rule above. This produced "��#P�#". From what I understand, the parsed text is also available as a variable, yytext. I added printf("%s", yytext) to every rule in the file and added extern char* yytext; to the top of the file. This produced (null){void)1133331122222210101010--552222202020202222;;;;||||&&&&;;;;;;;;;;}}}}}}}} from a valid file according to the language's syntax. Finally, I changed extern char* yytext; to extern char yytext[], thinking it would not make a difference. The difference in output it made is best shown as a screenshot
I am using Bison 3.0.2 on Xubuntu 14.04.
If you just want to echo the source to some output while parsing it, it is easiest to do that in the lexer. You don't say what you ware using for a lexer, but you mention yytext, which is used by lex/flex, so I will assume that.
When you use flex to recognize tokens, the variable yytext refers to the internal buffer flex uses to recognize tokens. Within the action of a token, it can be used to get the text of the token, but only temporarily -- once the action completes and the next token is read, it will no longer be valid.
So if you have a flex rule like:
[a-zA-Z_][a-zA-Z_0-9]* { yylval.str = yytext, return T_ID; }
that likely won't work at all, as you'll have dangling pointers running around in your program; probably the source of the random-looking outputs you're seeing. Instead you need to make a copy. If you also want to output the input unchanged, you can do that here too:
[a-zA-Z_][a-zA-Z_0-9]* { yylval.str = strdup(yytext); ECHO; return T_ID; }
This uses the flex macro ECHO which is roughly equivalent to fputs(yytext, yyout) -- copying the input to a FILE * called yyout (which defaults to stdout)
If the first symbol in the corresponding right-hand side is a terminal, $1 in a bison action means "the value of yylval produced by the scanner when it returned the token corresponding to that terminal. If the symbol is a non-terminal, then it refers to the value assigned to $$ during the evaluation of the action which reduced that non-terminal. If there was no such action, then the default $$ = $1 will have been performed, so it will pass through the semantic value of the first symbol in the reduction of that non-terminal.
I apologize if all that was obvious, but your snippet is not sufficient to show:
what the semantic types are for each non-terminal;
what the semantic types are for each terminal;
what values, if any, are assigned to yylval in the scanner actions;
what values, if any, are assigned to $$ in the bison actions.
If any of those semantic types are not, in fact, character strings, then the printf will obviously produce garbage. (gcc might be able to warn you about this, if you compile the generated code with -Wall. Despite the possibility of spurious warnings if you are using old versions of flex/bison, I think it is always worthwhile compiling with -Wall and carefully reading the resulting warnings.)
Using yytext in a bison action is problematic, since it will refer to the text of the last token scanned, typically the look-ahead token. In particular, at the end of the input, yytext will be NULL, and that is what you will pick up in any reductions which occur at the end of input. glibc's printf implementation is nice enough to print (null) instead of segfaulting when your provide (char*)0 to an argument formated as %s, but I don't think it's a great idea to depend on that.
Finally, if you do have a char* semantic value, and you assign yylval = yytext (or yylval.sval = yytext; if you are using unions), then you will run into another problem, which is that yytext points into a temporary buffer owned by the scanner, and that buffer may have completely different contents by the time you get around to using the address. So you always need to make a copy of yytext if you want to pass it through to the parser.
If what you really want to do is see what the parser is doing, I suggest you enable bison's yydebug parser-trace feature. It will give you a lot of useful information, without requiring you to insert printf's into your bison actions at all.

Getting the string(s) output from an exec command in C++

My problem is pretty simple, but I can't seem to find anything straightforward or specific to what I am trying to do. I'm simply using execl to list the files in the current folder that follow the same pattern (ie, execl("ls nameOfFile*.txt")). What I want to do now is grab those file names so that I can loop through and get the data out of them. Is there a simple way of doing this? Am I using the correct exec?
Thanks for any help or tips.
The signature of execl is
int execl(const char *path, const char *arg, ...);
You're supposed to pass the path to the executable as the first argument, and arguments for the executable as the subsequent arguments, so your calling syntax is wrong. Even if you fix that, it still won't do what you want. The only way execl and friends ever return control to the calling program is if an error occurs. This answer contains an excellent explanation of what execl does.
You were probably thinking of std::system, which you can pass an arbitrary string to, and have the OS execute that command. While that'll print the filenames to stdout, it's still not what you want, because system returns an error code resulting from executing the command line you specified, it has no way of capturing and returning whatever may be written to stdout by the command.
Unfortunately, there is nothing in the C++ standard library (yet) that allows you to list and iterate files from the filesystem. The preferred cross platform approach is to use Boost.Filesystem. Otherwise, there are platform specific APIs available, which are listed in this answer, along with a Boost usage example.