Using a Bison-generated compiler to compile source code - c++

So far I'm using GNU Bison with Lex & Yacc files to build a parser in C++, which my program calls through the yyparse() function. The g++ compilation of my program therefore produces an .a file that allows the user to enter some code to be parsed.
However, I would like to use the generated parser to compile a whole project directory hierarchy (i.e. a bunch of files). Is Bison able to generate the resulting compiler as an independent archive that would let me do that? Maybe there is a simple way to parse multiple files? Or should I manage this behavior myself in C++?
Thanks for the knowledge sharing!

Bison/yacc-generated parsers do not read input directly. The parsers use the tokens extracted from the input stream by yylex(), leaving it entirely up to yylex() to read the data or otherwise access the input.
By default, the yylex() generated by (f)lex reads input from the input stream pointed at by the global variable yyin. yylex() does not fopen a file or otherwise give yyin a value (except for initialising it to stdin).
To pass multiple files through your parser:
1. Set yyin appropriately: yyin = fopen(filepath, "r");
2. Call yyparse().
3. Close yyin.
4. Repeat as necessary.
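A minimal C++ sketch of that loop, assuming a non-reentrant C-style flex scanner and a yacc-style yyparse() (not a Bison %language "c++" parser). It calls yyrestart() rather than only assigning yyin: the flex manual documents yyrestart() for pointing the scanner at a new file, and it also throws away input still buffered from the previous file if the last parse stopped early (e.g. on a syntax error). The toy definitions of yyin, yyrestart() and yyparse() at the top are assumptions that exist only so the sketch compiles by itself; in a real build, delete them and link the generated code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// In a real build these come from the flex/Bison output; declare them instead:
//   extern FILE* yyin;
//   int yyparse();
//   void yyrestart(FILE* input_file);
// Toy stand-ins (assumption) so the sketch runs on its own: the "parser"
// simply reads its input to the end and reports success.
FILE* yyin = nullptr;
void yyrestart(FILE* input_file) { yyin = input_file; }
int yyparse() {
    while (std::fgetc(yyin) != EOF) {
    }
    return 0;
}

// Runs the parser over each file in turn; returns how many files failed.
int parse_files(const std::vector<std::string>& paths) {
    int failures = 0;
    for (const std::string& path : paths) {
        FILE* f = std::fopen(path.c_str(), "r");
        if (!f) {
            std::perror(path.c_str());   // report why this file could not be opened
            ++failures;
            continue;
        }
        yyrestart(f);                    // point yyin at f, dropping stale buffered input
        if (yyparse() != 0) ++failures;
        std::fclose(f);
        yyin = nullptr;                  // don't leave yyin pointing at a closed stream
    }
    return failures;
}
```

A main(int argc, char* argv[]) could pass std::vector<std::string>(argv + 1, argv + argc) to parse_files(), so that ./compiler src/*.x parses every file the shell expands.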

Related

How to open a file ending with a particular extension in C++

I am trying to write a lexer using flex, and want to open and read from a file ending with a particular extension, e.g. filename.k. I am only able to do it if I specify the file name as well as the extension.
FILE *myfile = fopen("a.k", "r");
if (!myfile) {
cout << "I can't open a.k!" << endl;
}
Can someone show me the way to open *.k files in C++.
I am running flex on Ubuntu. What I am trying to do is run a flex program. The above code executes fine. I want a way to open a file with a .k extension regardless of the file name, e.g. ./myprogram a.k or ./myprogram b.k. With the code above, I always have to specify the file name in the code itself.
Comment on Basile's answer:
[...] For example, with ./myprogram a.k, I want to be able to write any file name instead of a, as long as it ends with a .k extension.
While the cited answer is technically correct, I think your real problem is how to get an arbitrary, but specific, file path from the command line:
Example: ./myprogram a.k or ./myprogram b.k
The thing is quite easy: you get the command line parameters passed directly to your main function, provided you use the variant accepting them:
int main(int argc, char* argv[]);
The first parameter (argv[0]) is normally the name of your programme (or an empty string, if not available); the standard technically allows argc to be 0, but in practice it is at least one. The parameters you provide follow, so invoking "./myprogram b.k" will result in argc being two and argv pointing to a char* array equivalent to the following:
char* argv[] =
{
"./myprogram",
"b.k",
nullptr // oh, yes, the array is always null terminated...
};
And then the matter gets easy: check whether the parameter is given at all with if(argc == 2) or, if you are willing to accept but ignore any additional parameters, if(argc >= 2), or simply if(argv[1]) (argv[1] is nullptr if no parameter was given, or the first parameter otherwise). Then use it for fopen or, if you prefer a more C++-like way, to open a std::ifstream. You might want additional checks, e.g. whether the file name really ends with ".k", but that's up to you now...
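A sketch of those checks, using fopen because a C-style flex scanner reads from a FILE* (yyin) rather than from a std::ifstream. has_extension() and open_k_file() are illustrative names, not part of the answer.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// True if `name` ends with `ext` and has at least one character before it.
bool has_extension(const std::string& name, const std::string& ext) {
    return name.size() > ext.size() &&
           name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
}

// Opens the file named by argv[1] if it ends in ".k". Returns nullptr (after
// printing a diagnostic) if the argument is missing, has the wrong extension,
// or cannot be opened.
FILE* open_k_file(int argc, char* argv[]) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s FILE.k\n", argc > 0 ? argv[0] : "myprogram");
        return nullptr;
    }
    const std::string path = argv[1];
    if (!has_extension(path, ".k")) {
        std::fprintf(stderr, "%s: expected a file name ending in .k\n", path.c_str());
        return nullptr;
    }
    FILE* f = std::fopen(path.c_str(), "r");
    if (!f) std::perror(path.c_str());   // errno explains why, e.g. a missing file
    return f;
}
```

In main, you would call open_k_file(argc, argv), return EXIT_FAILURE on nullptr, and otherwise assign the result to yyin before calling yylex() or yyparse().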
Your fopen-ing code is good, but it runs under conditions (e.g. in some weird working directory, or without sufficient permissions) that make the fopen fail.
I recommend using errno (perhaps implicitly through perror) in that failure case to get an idea of the failure reason:
FILE *myfile = fopen("a.k", "r");
if (!myfile) {
perror("fopen of a.k");
exit(EXIT_FAILURE);
}
See e.g. fopen(3), perror(3), errno(3) (or their documentation for your particular implementation and system).
Notice that file extensions don't really exist in standard C++11 (but C++17 has filesystem). On Linux and POSIX systems, file extensions are just a convention.
Can someone show me the way to open *.k files in C++.
If you need to open all files with a .k extension, you may rely on globbing: on POSIX, run something like yourprog *.k in your shell, which expands *.k into a sequence of file names ending with .k before running your program, whose main then gets them as an array of arguments (see glob(7)). Otherwise you have to loop explicitly using operating system primitives or functions (perhaps glob(3), nftw(3), opendir(3), readdir(3), ... on Linux; for Windows, read about FindFirstFile etc.).
Standard C++11 doesn't provide a way to iterate over all files matching a given pattern. Some framework libraries (Boost, Poco, Qt) do provide such a way. Otherwise you need operating-system-specific functions (e.g. to read the current directory; directories are not known to C++11 and are an abstraction provided by your operating system). C++17 does have std::filesystem, but you need a compiler and C++ standard library recent enough to support it.
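With C++17's std::filesystem, listing the .k files in a directory might look like the sketch below. It is non-recursive (fs::recursive_directory_iterator would also walk subdirectories), find_k_files() is a hypothetical helper, and fs::directory_iterator throws fs::filesystem_error if the directory cannot be opened.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Collects the regular files directly inside `dir` whose extension is ".k",
// sorted by path so the processing order is deterministic.
std::vector<fs::path> find_k_files(const fs::path& dir) {
    std::vector<fs::path> result;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".k")
            result.push_back(entry.path());
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

Each returned path can then be opened with fopen(p.c_str(), "r") (on POSIX, where path::c_str() yields a const char*) and handed to the scanner.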
BTW, on Unix or POSIX systems, you could have a single file literally named *.k. Of course that is in very poor taste and should be avoided (but you could run touch '*.k' in your shell to create such a file).
Regarding your edit, for Linux, I recommend running
./myprogram *.k
(your shell will then expand *.k into one or several arguments to myprogram)
and coding the main of myprogram to iterate over its arguments.
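A minimal sketch of such a main loop. process_file() and process_arguments() are hypothetical names; process_file() here only checks that each file opens, which is where a flex/Bison program would set yyin and call yyparse(). With bash's default settings, a pattern that matches nothing is passed through unchanged, so the program may receive the literal string *.k; the fopen failure then reports that case.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical per-file hook: here it only checks that the file can be opened.
// In a flex/Bison program this is where you would set yyin and call yyparse().
bool process_file(const char* path) {
    FILE* f = std::fopen(path, "r");
    if (!f) {
        std::perror(path);   // e.g. the literal "*.k" when the pattern matched nothing
        return false;
    }
    std::fclose(f);
    return true;
}

// Visits argv[1] .. argv[argc-1], i.e. the names the shell expanded *.k into.
// Returns the number of files that could not be processed.
int process_arguments(int argc, char* argv[]) {
    int failures = 0;
    for (int i = 1; i < argc; ++i)
        if (!process_file(argv[i])) ++failures;
    return failures;
}
```

main itself then reduces to return process_arguments(argc, argv) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;.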
If you want to run just myprogram without any additional arguments, you need to code the globbing or the expansion inside it; see glob(3) and wordexp(3). Or scan directories (with opendir(3), readdir(3), closedir(3), stat(2) or nftw(3)).
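For the in-program variant, a sketch using POSIX glob(3) (available on Linux and other POSIX systems, not part of standard C++); expand_pattern() is an illustrative name. glob() returns 0 on success and GLOB_NOMATCH when nothing matches, and it sorts the matches unless GLOB_NOSORT is passed.

```cpp
#include <cassert>
#include <glob.h>
#include <string>
#include <vector>

// Expands a shell-style pattern such as "*.k" inside the program.
// Returns the matching paths (sorted by glob), or an empty vector if
// nothing matches or an error occurs.
std::vector<std::string> expand_pattern(const std::string& pattern) {
    std::vector<std::string> paths;
    glob_t g{};
    int rc = glob(pattern.c_str(), 0, nullptr, &g);
    if (rc == 0) {
        for (std::size_t i = 0; i < g.gl_pathc; ++i)
            paths.emplace_back(g.gl_pathv[i]);
    }
    // rc == GLOB_NOMATCH means no file matched; other non-zero values are errors.
    globfree(&g);
    return paths;
}
```

expand_pattern("*.k") then yields the same list the shell would have passed on the command line, relative to the current working directory.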

Accessing files made with mktemp for Linux through C++

I am trying to create a temporary file on a Linux system, but interfacing through C++ (so that the Linux commands are run through the C++ program).
To do so, I am using mktemp, which produces a temporary file.
I would need to later refer back to this file.
However, the filename is randomly generated and I am wondering if there is an easy way to access the filename.
The big honking comment in mktemp(3)'s manual page explicitly tells you to use mkstemp(3) instead of mktemp(3), and explains why.
If you actually read the manual page for mkstemp(3), it clearly explains that the library function modifies the character buffer passed to it as a parameter to reflect the actual name of the created temporary file.
So to determine the name of the temporary file, simply refer to the character buffer you passed to this library function.
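A minimal sketch, assuming Linux/POSIX; make_temp_file() and the /tmp/myprog- prefix are illustrative. The key point is that the char array passed to mkstemp() holds the real file name afterwards, which is also why it must be a modifiable buffer (not a string literal or a std::string's const data).

```cpp
#include <cassert>
#include <cstdlib>     // mkstemp (POSIX; glibc declares it in <stdlib.h>)
#include <string>
#include <unistd.h>    // write, close, unlink, access

// Creates a uniquely named temporary file holding `contents` and returns its
// path, or "" on failure. The caller should unlink() the file when done.
std::string make_temp_file(const std::string& contents) {
    // mkstemp needs a modifiable buffer ending in "XXXXXX"; on success it
    // replaces those six characters in place with the chosen unique suffix.
    char name[] = "/tmp/myprog-XXXXXX";   // hypothetical prefix, pick your own
    int fd = mkstemp(name);               // creates and opens the file exclusively
    if (fd == -1) return "";
    ssize_t written = write(fd, contents.data(), contents.size());
    close(fd);
    if (written != static_cast<ssize_t>(contents.size())) {
        unlink(name);
        return "";
    }
    return name;   // the buffer now holds the real file name
}
```

The returned path can then be passed to the Linux commands your program runs, and removed with unlink() when it is no longer needed.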

How to set the API from lex yacc to Program

I created an .exe file, generated with lex and yacc, which can parse an expression. But it only gets its input from the screen and only returns the parse result to the screen. I saw some suggestions about using YY_BUFFER_STATE yy_scan_buffer(char *base, yy_size_t size), but I still could not find a good way to do it.
Is it possible to include some headers (generated by lex/yacc) in my main C++ program, so that I can call yylex() with a string as input and get the return value in the main program? Thanks for your help; I am confused about how to do it.
yy_scan_string is how you give flex a string as input. You call it first, and then call yylex, which will use that string instead of stdin as the input to get tokens from. When yylex returns 0 (end of input), it has scanned the entire string. You can then call yy_delete_buffer on the YY_BUFFER_STATE returned by yy_scan_string (to free up memory), and call yy_scan_string again if you want to scan a new string.
You can use yy_scan_buffer instead to save a bit of copying, but then you have to set up the buffer properly yourself (basically, it needs to end with two NUL bytes instead of just one).
Unfortunately, unless you ask flex to generate one with --header-file (or %option header-file), there's no header file declaring these. So you need to either declare them yourself somewhere (copy the declarations from the flex documentation), or call them in the 3rd section of the .l file, which is copied verbatim to the end of the lex.yy.c file.
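A sketch of the "declare them yourself" route, parsing a string through yyparse() (which pulls tokens via yylex()). parse_string() is an illustrative name. The declarations follow the flex documentation; if the generated files are compiled as C rather than C++, wrap them in extern "C" { ... }. The toy stand-ins at the bottom are assumptions that only let the sketch link on its own; delete them when linking the real lex.yy.c and parser.

```cpp
#include <cassert>
#include <string>

// Declarations as documented by flex; the definitions live in lex.yy.c,
// and yyparse() in the Bison/yacc output.
typedef struct yy_buffer_state* YY_BUFFER_STATE;
YY_BUFFER_STATE yy_scan_string(const char* str);
void yy_delete_buffer(YY_BUFFER_STATE buffer);
int yyparse();

// Parses `text` instead of stdin; returns yyparse()'s result (0 = success).
int parse_string(const std::string& text) {
    YY_BUFFER_STATE buffer = yy_scan_string(text.c_str()); // flex copies the string
    int result = yyparse();        // yylex() now reads tokens from `buffer`
    yy_delete_buffer(buffer);      // free the copy before scanning another string
    return result;
}

// ---- Toy stand-ins (assumption): delete when linking the real flex/Bison  ----
// ---- output. This fake "parser" accepts any non-empty input.              ----
struct yy_buffer_state { std::string text; };
static YY_BUFFER_STATE current_buffer = nullptr;
YY_BUFFER_STATE yy_scan_string(const char* str) {
    return current_buffer = new yy_buffer_state{str};
}
void yy_delete_buffer(YY_BUFFER_STATE buffer) {
    if (buffer == current_buffer) current_buffer = nullptr;
    delete buffer;
}
int yyparse() { return (current_buffer && !current_buffer->text.empty()) ? 0 : 1; }
```

Your main program can then call parse_string() as often as it likes, e.g. once per line read from a file or a GUI text box.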

How To Extract Function Name From Main() Function Of C Source

I'd like to ask for your ideas on this matter. For an important reason, I must extract all the names of functions that are called inside the "main()" function of a C source file (e.g. main.c).
Example source code:
int main()
{
int a = functionA(); // functionA must be extracted
int b = functionB(); // functionB must be extracted
}
As you know, the only marker I can use to identify these function calls is their parentheses "()". I've already considered several factors in implementing this function name extraction:
1. functions may have parameters. Ex: functionA(100)
2. Loop operators. Ex: while()
3. Other operators. Ex: if(), else if()
4. Other operators between function calls, with no spaces. Ex: functionA()+functionB()
As of this moment, I know what you're saying: this is a pain in the $$$... So please share your thoughts and ideas, and bear with me on this one...
Note: this is in C++ language...
You can write a small C++ parser by combining FLEX (or LEX) and BISON (or YACC):
1. Take C++'s grammar.
2. Generate a C++ program parser with the mentioned tools.
3. Make that program count the function calls you are mentioning.
Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!
One option is to write your own C tokenizer (simple: just be careful enough to skip over strings, character constants and comments), and to write a simple parser which counts the number of open {s and finds instances of identifier + ( within them, skipping keywords such as if, while and for. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to implement preprocessor directives (e.g. #include and #define): a function can be called from a macro (e.g. getchar) defined in an #include file.
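A rough sketch of that tokenizer idea: find_calls() (a hypothetical name) scans text such as the body of main() for an identifier followed by (. It skips comments, string literals and character constants and ignores a few keywords, but, as noted above, it is not 100% correct: macros are not expanded, prototypes such as int f(void); also match, and locating main()'s body is left to the caller.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Naive scan for "identifier (" in C source text, in order of appearance.
std::vector<std::string> find_calls(const std::string& src) {
    static const std::set<std::string> keywords = {
        "if", "while", "for", "switch", "return", "sizeof", "do", "else"};
    std::vector<std::string> calls;
    std::size_t i = 0;
    const std::size_t n = src.size();
    while (i < n) {
        const char c = src[i];
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {          // line comment
            while (i < n && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {   // block comment
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        } else if (c == '"' || c == '\'') {                        // string / char literal
            ++i;
            while (i < n && src[i] != c) i += (src[i] == '\\') ? 2 : 1;  // skip escapes
            ++i;                                                   // closing quote
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            const std::size_t start = i;
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            const std::string ident = src.substr(start, i - start);
            std::size_t j = i;                                     // allow "f (x)"
            while (j < n && std::isspace(static_cast<unsigned char>(src[j]))) ++j;
            if (j < n && src[j] == '(' && keywords.count(ident) == 0)
                calls.push_back(ident);
        } else {
            ++i;                                                   // operators, digits, ...
        }
    }
    return calls;
}
```

For example, find_calls("int a = functionA(100); if (a) puts(\"x(y)\");") yields functionA and puts: if is filtered out as a keyword, and the parentheses inside the string literal are skipped.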
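A more reliable option is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in file.s. A similar option is compiling your .c file to an object file, e.g. gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions. Even this is not quite 100%: the compiler may inline calls or replace some library calls (e.g. a printf with a single string argument may become puts), so compiling with -O0 -fno-builtin keeps the output closer to the source.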
Another option is to use a parser built on Clang/LLVM.
GNU cflow might be helpful.

Generating code at compile-time using scripts

Ideally, I would like to be able to add (very repetitive) C/C++ code to my actual code at compile time, code that would come from, say, the stdout of a Python script, the same way one does with macros.
For example, say I want functions that depend on the public attributes of a given class; being able to just write the following in my C++ code would be a blessing:
generate_boring_functions(FooBarClass,"FooBarClass.cpp")
Is that feasible using conventional means? Or must I hack with Makefiles and temporary source files?
Thanks.
You most likely do need to tweak the Makefile a bit. It would be easy to write a (Python) script that reads each of your source files as an additional preprocessing step, replacing instances of generate_boring_functions (or any other script-macro) with the correct code, potentially just by invoking generate_boring_functions.py with the right arguments, and avoiding the need for temporary files by sending the source to the compiler over standard input.
Damn, now I want to make something like this.
Edit: A rule like this, stuck in a makefile, could be used to handle the extra build step. This is untested and added only in an attempt at completeness.
%.o : %.cpp
	python macros.py $< | g++ -x c++ -c - -o $@
If a makefile isn't conventional enough for you, you could get by with cleverly-written macros.
class FooBarClass
{
DEFINE_BORING_METHODS( FooBarClass )
/* interesting functions begin here */
};
I very frequently see this done to implement the boilerplate parts of COM classes.
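The answer doesn't show the macro itself. A hypothetical definition might look like the following (C++14, for std::make_unique); class_name() and clone() are invented examples of boilerplate, and the trailing private: restores the class's default access, so the code after the macro behaves as if the macro weren't there.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical body for DEFINE_BORING_METHODS: pastes the same boilerplate
// members into every class that invokes it.
#define DEFINE_BORING_METHODS(ClassName)                              \
public:                                                               \
    static const char* class_name() { return #ClassName; }            \
    std::unique_ptr<ClassName> clone() const {                        \
        return std::make_unique<ClassName>(*this);                    \
    }                                                                 \
private:

class FooBarClass
{
    DEFINE_BORING_METHODS( FooBarClass )
    /* interesting functions begin here */
public:
    int value = 0;
};
```

The limitation compared with a script is that the preprocessor cannot inspect the class, so anything that depends on its attributes (as in the question) still has to be passed to the macro by hand.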
But if you want something that's neither make nor macro, then I don't know what you could possibly mean.
A makefile (or equivalent) is a "conventional" means!
I've never used this particular technology, but it sounds as though you're looking for something like Ned Batchelder's Cog tool.
Python scripts are embedded into a C++ source file such that when run through the cog tool additional C++ code is generated for the C++ compiler to consume. So your build process would consist of an extra step to have cog produce the actual C++ source file before the C++ compiler is invoked.
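A minimal sketch of Cog's markup in a C++ file, based on its documented syntax (the tool is distributed as the cogapp Python package); the generator itself is a made-up example. The two function definitions after the generator are what cog writes there when run.

```cpp
#include <cassert>

// Everything cog needs lives in comments, so this file compiles as-is.
// Running `cog -r file.cpp` executes the embedded Python and rewrites the
// generated lines below it in place.
/*[[[cog
import cog
for name in ["alpha", "beta"]:
    cog.outl("int get_%s() { return %d; }" % (name, len(name)))
]]]*/
int get_alpha() { return 5; }
int get_beta() { return 4; }
//[[[end]]]
```

Because the generated output is checked into the source file, the C++ compiler never needs Python; only the extra cog step in the build does.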
You could try the Boost Preprocessor Library. It's just an extension of the regular preprocessor, but if you're creative, you can achieve nearly anything in it.
Did you have a look at PythoidC? It can be used to generate C code.
I have encountered this exact same problem multiple times.
I use code generation in exactly the way you describe (i.e. to run boringFunction(filename.cpp, "filename.cpp") for a set of files).
It is used to generate code that "registers" the code contained in a specific set of files to a std::map, to handle adding user-written functions to the library without dynamically recompiling the whole library or relying on the (likely novice programmer) user to write syntactically correct C++ code to e.g. implement class functions.
I have solved it in two ways (which are basically equivalent)
1) A purely C++ "bootstrapping" method, in which during compilation, make compiles a simple C++ program that generates the necessary files, and then calls a second makefile that compiles the actual code generated in the temporary files.
2) A shell based method that uses bash to accomplish the same thing (I.e. use simple shell commands to iterate through the files and output new files to a temporary location, then call make on the output).
The functions can either be output to one file each, or can be output to one monolithic file for the second compilation.
Then, the functions can either be loaded dynamically (i.e. they are compiled as a shared library), or I can recompile all the rest of the code with the generated functions included.
The only hard parts were (a) figuring out a way to register the function names uniquely (e.g. using the preprocessor's __COUNTER__ only works if everything is in a single monolithic file), and (b) figuring out how to reliably run the generation step in the makefile before the main makefile runs.
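The answer doesn't show how its registration macro works. One common way to fill a std::map at static-initialization time is sketched below; REGISTER_FUNC, registry() and FilterFn are hypothetical stand-ins, not the answer's REGISTER_SAL_FILT_FUNC. Token pasting (fn##_registered) derives the helper variable's name from the function name, so within one generated file the names stay unique whenever the function names are, without needing __COUNTER__.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical signature for the registered functions.
using FilterFn = std::function<int(int)>;

// Function-local static: constructed on first use, so registrations running
// during static initialization never see an unconstructed map.
std::map<std::string, FilterFn>& registry() {
    static std::map<std::string, FilterFn> functions;
    return functions;
}

// Registers `fn` under its own name, typically before main() runs. The macro
// supplies its own ';', matching generated lines of the form "}MACRO(name)".
#define REGISTER_FUNC(fn) \
    static const bool fn##_registered = (registry()[#fn] = fn, true);

int double_it(int x) { return 2 * x; }
REGISTER_FUNC(double_it)

int negate(int x) { return -x; }
REGISTER_FUNC(negate)
```

The library can then look up user-written functions by name, e.g. registry().at("double_it")(21).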
The advantage of the pure-C++ method (versus e.g. bash) is that it could work on systems that do not have the same bash shell as Linux by default (e.g. Windows or macOS), in which case, of course, a more complex CMake setup is necessary.
I have included the hard parts of the makefile for posterity:
The first makefile called is:
# Dummy to compile filters first
$(MAKECMDGOALS): SCRIPTCOMPILE
	$(MAKE) -f Makefile2 $(MAKECMDGOALS)
SCRIPTCOMPILE:
	sh scripts/filter_compiler_single.sh filter_stubs
.PHONY: SCRIPTCOMPILE
Where scripts/filter_compiler_single.sh is, e.g.:
BUILD_DIR="build/COMPILED_FILTERS";
rm -rf "$BUILD_DIR" # -f: don't fail on the first run, when the directory doesn't exist yet
mkdir -p $BUILD_DIR
ARGSET="( localmapdict& inputmaps, localmapdict& outputmaps, void*& userdata, scratchmats& scratch, const std::map<std::string,std::string>& params, const uint64_t& curr_time , const std::string& nickname, const std::string& desc )"
compfname=$BUILD_DIR"/COMPILED_FILTERS.cpp"
echo "//// START OF GENERATED FILE (this file will be overwritten!) ////" > $compfname #REV: first overwrites
echo "#include <salmap_rv/include/salmap_rv_filter_includes.hpp>" >> $compfname
echo "using namespace salmap_rv;" >> $compfname
flist=$(find $1 -maxdepth 1 -type f) #REV: add constraint to only find .cpp files?
for f in $flist;
do
compfnamebase=$(basename $f) #REV: includes .cpp
alg=${compfnamebase%.cpp}
echo $f " >> " $compfname
echo "void ""$alg""$ARGSET""{" >> $compfname
echo "DEBUGPRINTF(stdout, \"Inside algo funct "$alg"\");" >> $compfname; #REV: debug...
cat $f >> $compfname
echo "}""REGISTER_SAL_FILT_FUNC(""$alg"")" >> $compfname
done
echo "//// END OF GENERATED FILE ////" >> $compfname
The second makefile Makefile2 is the normal compilation instructions.
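The pure-C++ bootstrapping variant from method 1) isn't shown. A hypothetical C++17 equivalent of the core loop of the shell script above might look like this; generate() and wrap_stub() are illustrative names. It only picks up .cpp files (the constraint the script's REV comment wonders about), visits them in directory order like find does, and leaves out the #include/using lines and the DEBUGPRINTF call for brevity.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Wraps one stub body in a function named after its file, followed by the
// registration macro call, mirroring what the shell script emits.
std::string wrap_stub(const std::string& alg, const std::string& body,
                      const std::string& argset) {
    return "void " + alg + argset + "{\n" + body + "\n}REGISTER_SAL_FILT_FUNC(" + alg + ")\n";
}

// Concatenates every *.cpp stub directly inside `stub_dir` into `out_file`,
// overwriting it. Returns the number of stubs written.
int generate(const fs::path& stub_dir, const fs::path& out_file, const std::string& argset) {
    std::ofstream out(out_file);
    out << "//// START OF GENERATED FILE (this file will be overwritten!) ////\n";
    int count = 0;
    for (const fs::directory_entry& entry : fs::directory_iterator(stub_dir)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".cpp") continue;
        std::ifstream in(entry.path());
        std::ostringstream body;
        body << in.rdbuf();                                // whole stub file
        out << wrap_stub(entry.path().stem().string(), body.str(), argset);
        ++count;
    }
    out << "//// END OF GENERATED FILE ////\n";
    return count;
}
```

A tiny main() calling generate("filter_stubs", "build/COMPILED_FILTERS/COMPILED_FILTERS.cpp", ARGSET) would be the program the first makefile builds and runs before invoking Makefile2.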
It is not beautiful, and I would love to find a better way to do it, but as it is, extracting even just the base file name from every file during compilation is difficult, even using templates or constexpr (e.g. some macro function that takes __FILE__). And that would rely on the user remembering to add the specific macro call to their function filter stub, which is just extra unnecessary work and invites spelling errors etc.