Generating code at compile-time using scripts - c++

I would ideally like to be able to add (very repetitive) C/C++ code to my actual code, but at compile time, code which would come from say, the stdout of a python script, the same way one does with macros.
For example, let's say I want to have functions that depend on the public attributes of a given class, being able to just write the following in my C++ code would be a blessing:
generate_boring_functions(FooBarClass,"FooBarClass.cpp")
Is that feasible using conventional means? Or must I hack with Makefiles and temporary source files?
Thanks.

You do most likely need to tweak the Makefile a bit. It would be easy to write a (Python) script that reads each of your source files as an additional preprocessing step, replacing instances of generate_boring_functions (or any other script-macro) with the correct code, potentially just by invoking generate_boring_functions.py with the right arguments, and bypassing the need for temporary files by sending the source to the compiler over standard input.
Damn, now I want to make something like this.
Edit: A rule like this, stuck in a makefile, could be used to handle the extra build step. This is untested and added only for some shot at completeness.
%.o : %.cpp
python macros.py $< | g++ -x cpp -c - -o $#

If a makefile isn't conventional enough for you, you could get by with cleverly-written macros.
class FooBarClass
{
DEFINE_BORING_METHODS( FooBarClass )
/* interesting functions begin here */
}
I very frequently see this done to implement the boilerplate parts of COM classes.
But if you want something that's neither make nor macro, then I don't know what you could possibly mean.

A makefile (or equivalent) is a "conventional" means!

I've never used this particular technology, but it sounds as though you're looking for something like Ned Batchelder's Cog tool.
Python scripts are embedded into a C++ source file such that when run through the cog tool additional C++ code is generated for the C++ compiler to consume. So your build process would consist of an extra step to have cog produce the actual C++ source file before the C++ compiler is invoked.

You could try the Boost Preprocessor Library. It's just an extension of the regular preprocessor, but if you're creative, you can achieve nearly anything in it.

Did you have a look at PythoidC ? It can be used to generate C code.

I have encountered this exact same problem multiple times.
I use it exactly in the way you describe -- (i.e. to run "boringFunction( filename.cpp, "filename.cpp") for a set of files).
It is used to generate code that "registers" the code contained in a specific set of files to a std::map, to handle adding user-written functions to the library without dynamically recompiling the whole library or relying on the (likely novice programmer) user to write syntactically correct C++ code to e.g. implement class functions.
I have solved it in two ways (which are basically equivalent)
1) A purely C++ "bootstrapping" method, in which during compilation, make compiles a simple C++ program that generates the necessary files, and then calls a second makefile that compiles the actual code generated in the temporary files.
2) A shell based method that uses bash to accomplish the same thing (I.e. use simple shell commands to iterate through the files and output new files to a temporary location, then call make on the output).
The functions can either be output to one file each, or can be output to one monolithic file for the second compilation.
Then, the functions can either be loaded dynamically (i.e. they are compiled as a shared library), or I can recompile all the rest of the code with the generated functions included.
The only hard part was (a) figuring out a way to register the function names uniquely (e.g. using preprocessor __COUNTER__ only works if it is a single monolithic file), and (b) figuring out how to reliably call the generation function in the makefile before the main makefile runs.
The advantage of the pure-C++ method (versus e.g. bash) is that it could possibly work on systems that do not have the same bash linux shell by default (e.g. windows or macOS), in which case of course a more complex cmake method is necessary..
I have included the hard parts of the makefile for posterity:
The first makefile called is:
# Dummy to compile filters first
$(MAKECMDGOALS): SCRIPTCOMPILE
make -f Makefile2 $(MAKECMDGOALS)
SCRIPTCOMPILE:
#sh scripts/filter_compiler_single.sh filter_stubs
.PHONY: SCRIPTCOMPILE
Where scripts/filter_compilr_single.sh is e.g.:
BUILD_DIR="build/COMPILED_FILTERS";
rm -r $BUILD_DIR
mkdir -p $BUILD_DIR
ARGSET="( localmapdict& inputmaps, localmapdict& outputmaps, void*& userdata, scratchmats& scratch, const std::map<std::string,std::string>& params, const uint64_t& curr_time , const std::string& nickname, const std::string& desc )"
compfname=$BUILD_DIR"/COMPILED_FILTERS.cpp"
echo "//// START OF GENERATED FILE (this file will be overwritten!) ////" > $compfname #REV: first overwrites
echo "#include <salmap_rv/include/salmap_rv_filter_includes.hpp>" >> $compfname
echo "using namespace salmap_rv;" >> $compfname
flist=$(find $1 -maxdepth 1 -type f) #REV: add constraint to only find .cpp files?
for f in $flist;
do
compfnamebase=$(basename $f) #REV: includes .cpp
alg=${compfnamebase%.cpp}
echo $f " >> " $compfname
echo "void ""$alg""$ARGSET""{" >> $compfname
echo "DEBUGPRINTF(stdout, \"Inside algo funct "$alg"\");" >> $compfname; #REV: debug...
cat $f >> $compfname
echo "}""REGISTER_SAL_FILT_FUNC(""$alg"")" >> $compfname
done
echo "//// END OF GENERATED FILE ////" >> $compfname
The second makefile Makefile2 is the normal compilation instructions.
It is not beautiful, and I would love to find a better way to do it, but as it is, extracting even just the base filename from every file during compilation is difficult even using templates or constexpr (e.g. some macro function that takes __FILE__). And that would rely on the user remembering to add the specific macro call to their function filter stub, which is just adding extra unneccessary work and asking to introduce spelling errors etc.

Related

What's the best way to get a list of all the macros passed as compiler arguments?

I'm working on a code base that uses quite a bit of conditional compilation via macros passed as arguments to the compiler (i.e. gcc -DMACRO_HERE file.cpp). I would like to have a way to get a list of all the macros defined this way within the code so that I can write out all the used macros to the console and save files when the application is run, that way we know exactly what build was used.
I also need to do the same thing with the git hash but I think I can do that easily with a macro.
Edit 1: Note that this is not the same question as GCC dump preprocessor defines since I want the list available within the program and I only want the macros that are declared by being passed to the compiler with the -D argument
Edit 2: I also need it be cross compiler compatible since we use GCC, XL, CCE, CLANG, NVCC, HIP, and the MPI versions of those. Note that we're building with Make
Here's an outline of a possible solution.
The request is not well-specified because there is no guarantee that all object files will be built with the same conditional macros. So let's say that you want to capture the conditional macros specified for some designated source file.
On that basis, we can play a build trick, easy to do with make: the build recipe for that designated source file actually invokes a script, by inserting the path to the script at the beginning of the compile line.
The script runs through its arguments, selects the ones which start -D, and uses them to create a simple C source file which defines an array const char* build_options[], populating it with stringified versions of the command line arguments. (Unless you're a perfectionist, you don't need to do heroics to correctly escape the strings, because no sane build configuration would use -D arguments which require heroic escaping.)
Once the source file is built, the script saves it and either uses the command-line it was passed as its arguments to compile it, or leaves it to be compiled by some later build step.

parsing c++ / tracing void pointers

In my codebase, I ve got a lot function declarations with void-pointers as argument
void my_func(void* my_void_pointer)
I need to find all places in my sources
where my_func is called
(more importantly) with which type as argument.
For example calls like:
int* intpt=new int(10);
my_func(intpt);
or
char* charpt = new char('a');
my_func(charpt);
I need this because usually my_func does a reinterpret_cast to some self defined types and I would like to find out what possibly could go wrong if for example my byteorder changes.
I have already had a look at gcc_xml, but with this tool I can only find out which functions are defined with which arguments/argument types. Of course I could now grep the sources for function calls of such functions, but I still do not know with which types they are called with. Any idea which tool to start with?
Start with your compiler. Go and break the prototype and implementation of my_func by renaming it to Xmy_func (or any other change) and recompile... the compiler will tell you every place it's used.
Rename the argument to a non-pointer and recompile. You should get errors like cannot convert int* to int or cannot convert char* to int wherever your function is called.
You could write a small utility using Clang Tooling.
If you working in *nix terminal, you can try something like this:
// in project root folder
// you can replace *.cpp with *.h or *.hpp etc
for i in $(find . -type f -name "*.cpp"); do \
grep -Hn "my_func" $i; \
done;
Option 1. Use the following command to serach the occureneces of my_func in source directory.
grep "my_func(" *
Option 2. Use of source navigator. Open the source in source navigator and search the function name "my_func".
With a modern IDE, such as Eclipse CDT, you can search for all occurrences of each of your functions and explore call sites. Note however that Eclipse CDT doesn't appear to be able to distinguish overloads when searching, as its Java counterpart does.

Pass directory to C++ application from compiler

Some applications contain scripts that are run by the main application that reside in /usr/libexec. However, the autoconf scripts are able to change that directory by passing --libexecdir to the configure script.
For example, when running ./configure in the git source code, I can set --libexecdir to any directory I want, and the program will still work.
What do I need to add to a C++ to make this functionality work? In other words, how can I have a directory name set by a configure script compiled into the program?
You need the value of the #libexecdir# substitution variable (as used in e.g. Makefile.in) to be exposed to your C++ code. The simplest and most reliable way to do that is with a -D switch on the compiler command line for the object file that needs to know:
foo.o: CPPFLAGS += -DLIBEXECDIR='"$(libexecdir)"'
In foo.cc, LIBEXECDIR will then be a preprocessor macro expanding to a string constant that has the path you need. Two caveats, though: The above Makefile snippet uses a GNU make feature, target-specific variables. It will not work in other Make implementations. Also, I didn't bother quoting any characters in the expansion of $(libexecdir). Fully defensive quoting would look something like this:
foo.o: CPPFLAGS += \
-DLIBEXECDIR='"$(subst ",\",$(subst ','\'',$(subst \,\\,$(libexecdir))))"'
You will definitely need at least the innermost $(subst ...) construct if you want to be able to use Windows pathnames, with the slashes going the wrong way. People don't usually put ' or " in pathnames, so I probably wouldn't bother with the outer two until someone complained.
The same technique will work for any #whatever# substitution variable that isn't also an AC_DEFINE.
You might think you could use AC_DEFINE_UNQUOTED somehow to get the value of $(libexecdir) into config.h and so avoid all this mucking around with the command line. Unfortunately, Autoconf doesn't fully compute the value of its #*dir# substitutions at configure time:
# near the top of the generated 'configure':
exec_prefix=NONE
libexecdir='${exec_prefix}/libexec'
# much, much later -- as part of AC_OUTPUT:
test "x$exec_prefix" = xNONE && exec_prefix='${prefix}'
Therefore, if you do the obvious thing with AC_DEFINE_UNQUOTED, you will get something like
#define LIBEXECDIR "${exec_prefix}/libexec"
in your config.h. So that's not going to work, and I don't see a good way to make it work.

How To Extract Function Name From Main() Function Of C Source

I just want to ask your ideas regarding this matter. For a certain important reason, I must extract/acquire all function names of functions that were called inside a "main()" function of a C source file (ex: main.c).
Example source code:
int main()
{
int a = functionA(); // functionA must be extracted
int b = functionB(); // functionB must be extracted
}
As you know, the only thing that I can use as a marker/sign to identify these function calls are it's parenthesis "()". I've already considered several factors in implementing this function name extraction. These are:
1. functions may have parameters. Ex: functionA(100)
2. Loop operators. Ex: while()
3. Other operators. Ex: if(), else if()
4. Other operator between function calls with no spaces. Ex: functionA()+functionB()
As of this moment I know what you're saying, this is a pain in the $$$... So please share your thoughts and ideas... and bear with me on this one...
Note: this is in C++ language...
You can write a Small C++ parser by combining FLEX (or LEX) and BISON (or YACC).
Take C++'s grammar
Generate a C++ program parser with the mentioned tools
Make that program count the funcion calls you are mentioning
Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!
One option is to write your own C tokenizer (simple: just be careful enough to skip over strings, character constants and comments), and to write a simple parser, which counts the number of {s open, and finds instances of identifier + ( within. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to implement preprocessor directives (e.g. #include and #define): there can be a function called from a macro (e.g. getchar) defined in an #include file.
An option that works for 100% is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in the file.S. A similar option is compiling your .c file to an object file, e.g, gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions.
Another option is finding a parser using Clang / LLVM.
gnu cflow might be helpful

Registering each C/C++ source file to create a runtime list of used sources

For a debugging and logging library, I want to be able to find, at runtime, a list of all of the source files that the project has compiled and linked. I assume I'll be including some kind of header in each source file, and the preprocessor __FILE__ macro can give me a character constant for that file, so I just need to somehow "broadcast" that information from each file to be gathered by a runtime function.
The question is how to elegantly do this, and especially if it can be done from C as opposed to C++. In C++ I'd probably try to make a class with a static storage to hold the list of filenames. Each header file would create a file-local static instance of that class, which on creation would append the FILE pointer or whatever into the class's static data members, perhaps as a linked list.
But I don't think this will work in C, and even in C++ I'm not sure it's guaranteed that each element will be created.
I wouldn't do that sort of thing right in the code. I would write a tool which parsed the project file (vcproj, makefile or even just scan the project directory for *.c* files) and generated an additional C source file which contained the names of all the source files in some kind of pre-initialized data structure.
I would then make that tool part of the build process so that every time you do a build this would all happen automatically. At run time, all you would have to do is read that data structure that was built.
I agree with Ferruccio, the best way to do this is in the build system, not the code itself. As an expansion of his idea, add a target to your build system which dumps a list of the files (which it has to know anyway) to a C file as a string, or array of strings, and compile this file into your source. This avoids a lot of complication in the source, and is expandable, if you want to add additional information, like the version number from your source code control system, who built the executable, etc.
There is a standard way on UNIX and Linux - ident. For every source file you create ID tag - usually it is assigned by you version control system, e.g. SVN keywords.
Then to find out the name and revision of each source file you just use ident command. If you need to do it at runtime check out how ident does it - source for it should be freely available.
Theres no way to do it in C. In C++ you can create a class like this:
struct Reg {
Reg( const char * file ) {
StaticDictionary::Register( file );
};
where StaticDictionary is a singleton container for all your file names. Then in each source file:
static Reg regthisfile( __FILE__ );
You would want to make the dictionary a Meyers singleton to avoid order of creation problems.
I don't think you can do this in the way you outline in a "passive" mode. That is, you are going to somehow run code for each source file to be added to the registry, it's hard to get it to happen automatically.
Of course, it's possible that you can make that code very unobtrusive using macros. It might be problematic for C source files that don't have an "entrypoint", so if your code isn't already organised as "modules", with e.g. an init() function for each module, it might be hard. Static initializing code might be possible, I'm not 100% sure if the order in which things are initialized creates problems here.
Using static storage in the registry module sounds like an excellent idea, a plain linked list or simple hash table should be easy enough to implement, if your project doesn't already include any general-purpose utility library.
In C++ your solution will work. It's guaranteed.
Edit: Just found out a solution in my head: Change a rule in your makefile to add
'-include "cfiles_register.h"' to each 'g++ file.cpp'.
%.o : %.cpp
$(CC) -include 'cfiles_register.h' -o $# $<
put your proposed in the question implemnatation to that 'cfiles_register.h'.
Using static instances in C++ would work fine.
You could do this also in C, but you need to use runtime specific features - for MSVC CRT take a look at http://www.codeguru.com/cpp/misc/misc/threadsprocesses/article.php/c6945/
For C - you could do it with a macro - define a variable named corresponding to your file, and then you could scan the symbols of your executable, just as an idea:
#define TRACK_FILE(name) char _file_tracker_##name;
use it in your my_c_file.c like this:
TRACK_FILE(my_c_file_c)
and than grep all file/variable names from the binary like this
nm my-binary | grep _file_tracker
Not really nice, but...
Horrible idea, I'm sure, but use a singleton. And on each file do something like
Singleton.register(__FILE__);
at global scope. It'll only work on cpp files though.
I did something like this years ago as a novice, and it worked. But I'd cringe to do it now. I'd add a build step now.
I agree with those who say that it is better to avoid doing this at run time, but in C, you can initialize a static variable with a function call, that is, in every file:
static int doesntmatter = register( __FILE__);