parsing c++ / tracing void pointers - c++

In my codebase, I've got a lot of function declarations with void pointers as arguments:
void my_func(void* my_void_pointer)
I need to find all places in my sources
where my_func is called
(more importantly) with which type as argument.
For example calls like:
int* intpt=new int(10);
my_func(intpt);
or
char* charpt = new char('a');
my_func(charpt);
I need this because my_func usually does a reinterpret_cast to some self-defined types, and I would like to find out what could go wrong if, for example, my byte order changes.
I have already had a look at gcc_xml, but with this tool I can only find out which functions are defined with which arguments/argument types. Of course I could now grep the sources for calls to such functions, but I still would not know which types they are called with. Any idea which tool to start with?

Start with your compiler. Go and break the prototype and implementation of my_func by renaming it to Xmy_func (or any other change) and recompile... the compiler will tell you every place it's used.

Change the argument type to a non-pointer (for example int) and recompile. You should get errors like "cannot convert int* to int" or "cannot convert char* to int" wherever your function is called.
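A related trick keeps the code compiling while still exposing the argument types. This is only a sketch, assuming C++11: a function template named my_func takes the place of the original, which is renamed here to my_func_impl. Both my_func_impl and the seen_types helper are hypothetical names, not part of the question's code.

```cpp
#include <string>
#include <typeinfo>
#include <vector>

// The original function, renamed (my_func_impl is a hypothetical name)
// so that the template below intercepts every existing call.
void my_func_impl(void* p) { (void)p; }

// Hypothetical log of the static pointee types seen at call sites.
std::vector<std::string>& seen_types() {
    static std::vector<std::string> names;
    return names;
}

// Same name as the old function, so call sites stay untouched.
// T is deduced from the pointer type the caller actually passes.
template <typename T>
void my_func(T* p) {
    seen_types().push_back(typeid(T).name());  // spelling is implementation-defined
    my_func_impl(p);                           // T* converts to void* implicitly
}
```

The names returned by typeid are implementation-defined; GCC and Clang return mangled names, which c++filt -t can decode. For a purely compile-time report, declaring the template as deleted instead should make the compiler print the deduced type at every call site.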

You could write a small utility using Clang Tooling.

If you are working in a *nix terminal, you can try something like this:
# in project root folder
# you can replace *.cpp with *.h or *.hpp etc
for i in $(find . -type f -name "*.cpp"); do \
grep -Hn "my_func" "$i"; \
done;

Option 1. Use the following command to search for occurrences of my_func in the source directory.
grep "my_func(" *
Option 2. Use Source Navigator. Open the source in Source Navigator and search for the function name "my_func".

With a modern IDE, such as Eclipse CDT, you can search for all occurrences of each of your functions and explore call sites. Note however that Eclipse CDT doesn't appear to be able to distinguish overloads when searching, as its Java counterpart does.

Related

sed/regex - Updating source files with scope issues

I have a project originally written for Windows, and I am currently in the process of porting it over to Linux. Most of the platform specific code has been #ifdef'ed or wrapped, so it's been easy so far.
This project has about 2000 instances of gettext() scattered throughout about 200 source files (.cpp and .c compiled as C++). The intended function call is:
std::string boost::locale::gettext(const char*);
This works in Windows, but in Linux builds, it resolves to:
char * gettext (const char * msgid);
Which I assume it's resolving from <libintl.h>, which is interesting, since I'm not including it.
What I need to do is to do the following:
Find in all my source files (ignoring the .svn directories):
1.1. Lines containing gettext(.*).c_str() and modify them to become boost::locale::gettext(.*).c_str().
1.2. Lines containing gettext(.*) and modify them to become boost::locale::gettext(.*).c_str().
What's the best way to accomplish this, preferably using Bash and sed, or some command-line-fu in general? The requirements for 1.1 I could probably do easily enough, but 1.2 is a bit more complex, and I'm not sure how to have it know which right parenthesis ) to append .c_str() to.
Thank you.
This problem is not solvable with a regex in the general case, since you cannot find the matching closing parenthesis of the gettext()-call with it if other calls are nested in its argument list.
But if usually no nested calls are made, it might be an option to just fix these cases automatically and do the rest by hand.
This sed expression
sed -r "s/gettext\(([^()]*)\)(\.c_str\(\))?/boost::locale::gettext(\1).c_str()/g"
should leave invocations with nested calls untouched and replace the rest.
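For readers who would rather test the pattern outside sed, here is the same substitution as a minimal C++11 sketch using std::regex. The function name qualify_gettext is made up for this example.

```cpp
#include <regex>
#include <string>

// Rewrites gettext(...) calls whose argument list contains no parentheses.
// Calls with nested parentheses are left unchanged, as with the sed command.
std::string qualify_gettext(const std::string& line) {
    static const std::regex call(R"re(gettext\(([^()]*)\)(\.c_str\(\))?)re");
    return std::regex_replace(line, call, "boost::locale::gettext($1).c_str()");
}
```

Two caveats apply to the sed version as well. The pattern also matches the gettext( inside an already qualified boost::locale::gettext(, so it should be run only once per file. A string literal that itself contains parentheses, such as gettext("a (b)"), is skipped.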

How To Extract Function Name From Main() Function Of C Source

I just want to ask your ideas regarding this matter. For a certain important reason, I must extract/acquire all function names of functions that were called inside a "main()" function of a C source file (ex: main.c).
Example source code:
int main()
{
int a = functionA(); // functionA must be extracted
int b = functionB(); // functionB must be extracted
}
As you know, the only thing that I can use as a marker/sign to identify these function calls is their parentheses "()". I've already considered several factors in implementing this function name extraction. These are:
1. functions may have parameters. Ex: functionA(100)
2. Loop operators. Ex: while()
3. Other operators. Ex: if(), else if()
4. Other operator between function calls with no spaces. Ex: functionA()+functionB()
As of this moment I know what you're saying, this is a pain in the $$$... So please share your thoughts and ideas... and bear with me on this one...
Note: this is in C++ language...
You can write a small C++ parser by combining FLEX (or LEX) and BISON (or YACC):
Take C++'s grammar
Generate a C++ program parser with the mentioned tools
Make that program count the function calls you are mentioning
Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!
One option is to write your own C tokenizer (simple: just be careful enough to skip over strings, character constants and comments), and to write a simple parser, which counts the number of {s open, and finds instances of identifier + ( within. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to implement preprocessor directives (e.g. #include and #define): there can be a function called from a macro (e.g. getchar) defined in an #include file.
An option that works 100% of the time is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in the resulting file.s. A similar option is compiling your .c file to an object file, e.g. gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions.
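Scanning the assembly text for call instructions could look like the sketch below. It assumes AT&T-style x86 output with one instruction per line; the function name call_targets is made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collects the operand of every "call" instruction in assembly text.
// Assumes AT&T-style x86 output with one instruction per line.
std::vector<std::string> call_targets(const std::string& assembly) {
    std::vector<std::string> targets;
    std::istringstream lines(assembly);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string mnemonic, operand;
        if (words >> mnemonic >> operand && (mnemonic == "call" || mnemonic == "callq")) {
            targets.push_back(operand.substr(0, operand.find('@')));  // drop a suffix such as "@PLT"
        }
    }
    return targets;
}
```

As written, this collects calls from the whole file, so restricting it to main would mean reading only the lines between the main: label and the next function label. Calls that the optimizer inlined, and indirect calls through a register, will not show up by name.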
Another option is finding a parser using Clang / LLVM.
GNU cflow might be helpful.

Generating code at compile-time using scripts

I would ideally like to be able to add (very repetitive) C/C++ code to my actual code, but at compile time, code which would come from say, the stdout of a python script, the same way one does with macros.
For example, let's say I want to have functions that depend on the public attributes of a given class, being able to just write the following in my C++ code would be a blessing:
generate_boring_functions(FooBarClass,"FooBarClass.cpp")
Is that feasible using conventional means? Or must I hack with Makefiles and temporary source files?
Thanks.
You do most likely need to tweak the Makefile a bit. It would be easy to write a (Python) script that reads each of your source files as an additional preprocessing step, replacing instances of generate_boring_functions (or any other script-macro) with the correct code, potentially just by invoking generate_boring_functions.py with the right arguments, and bypassing the need for temporary files by sending the source to the compiler over standard input.
Damn, now I want to make something like this.
Edit: A rule like this, stuck in a makefile, could be used to handle the extra build step. This is untested and added only for some shot at completeness.
%.o : %.cpp
	python macros.py $< | g++ -x c++ -c - -o $@
If a makefile isn't conventional enough for you, you could get by with cleverly-written macros.
class FooBarClass
{
DEFINE_BORING_METHODS( FooBarClass )
/* interesting functions begin here */
};
I very frequently see this done to implement the boilerplate parts of COM classes.
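As an illustration of the macro approach, here is one possible definition of DEFINE_BORING_METHODS. The two generated members, class_name and clone, are invented for this sketch; the original answer does not say what the macro expands to.

```cpp
#include <memory>
#include <string>

// Hypothetical expansion for DEFINE_BORING_METHODS: every class that uses
// it gets the same boilerplate members without repeating them by hand.
#define DEFINE_BORING_METHODS(ClassName)                              \
public:                                                               \
    static std::string class_name() { return #ClassName; }            \
    std::unique_ptr<ClassName> clone() const {                        \
        return std::unique_ptr<ClassName>(new ClassName(*this));      \
    }                                                                 \
private:

class FooBarClass {
    DEFINE_BORING_METHODS(FooBarClass)
    /* interesting functions begin here */
public:
    int value = 0;
};
```

The macro ends with private: so that members written after it keep the default access of a class.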
But if you want something that's neither make nor macro, then I don't know what you could possibly mean.
A makefile (or equivalent) is a "conventional" means!
I've never used this particular technology, but it sounds as though you're looking for something like Ned Batchelder's Cog tool.
Python scripts are embedded into a C++ source file such that when run through the cog tool additional C++ code is generated for the C++ compiler to consume. So your build process would consist of an extra step to have cog produce the actual C++ source file before the C++ compiler is invoked.
You could try the Boost Preprocessor Library. It's just an extension of the regular preprocessor, but if you're creative, you can achieve nearly anything in it.
Did you have a look at PythoidC ? It can be used to generate C code.
I have encountered this exact same problem multiple times.
I use it exactly in the way you describe -- (i.e. to run boringFunction( filename.cpp, "filename.cpp" ) for a set of files).
It is used to generate code that "registers" the code contained in a specific set of files to a std::map, to handle adding user-written functions to the library without dynamically recompiling the whole library or relying on the (likely novice programmer) user to write syntactically correct C++ code to e.g. implement class functions.
I have solved it in two ways (which are basically equivalent)
1) A purely C++ "bootstrapping" method, in which during compilation, make compiles a simple C++ program that generates the necessary files, and then calls a second makefile that compiles the actual code generated in the temporary files.
2) A shell based method that uses bash to accomplish the same thing (I.e. use simple shell commands to iterate through the files and output new files to a temporary location, then call make on the output).
The functions can either be output to one file each, or can be output to one monolithic file for the second compilation.
Then, the functions can either be loaded dynamically (i.e. they are compiled as a shared library), or I can recompile all the rest of the code with the generated functions included.
The only hard part was (a) figuring out a way to register the function names uniquely (e.g. using preprocessor __COUNTER__ only works if it is a single monolithic file), and (b) figuring out how to reliably call the generation function in the makefile before the main makefile runs.
The advantage of the pure-C++ method (versus e.g. bash) is that it could possibly work on systems that do not have the same bash Linux shell by default (e.g. Windows or macOS), in which case of course a more complex cmake method is necessary.
I have included the hard parts of the makefile for posterity:
The first makefile called is:
# Dummy to compile filters first
$(MAKECMDGOALS): SCRIPTCOMPILE
	$(MAKE) -f Makefile2 $(MAKECMDGOALS)
SCRIPTCOMPILE:
	@sh scripts/filter_compiler_single.sh filter_stubs
.PHONY: SCRIPTCOMPILE
Where scripts/filter_compiler_single.sh is e.g.:
BUILD_DIR="build/COMPILED_FILTERS";
rm -rf "$BUILD_DIR" #-f so the first run does not fail when the directory does not exist yet
mkdir -p $BUILD_DIR
ARGSET="( localmapdict& inputmaps, localmapdict& outputmaps, void*& userdata, scratchmats& scratch, const std::map<std::string,std::string>& params, const uint64_t& curr_time , const std::string& nickname, const std::string& desc )"
compfname=$BUILD_DIR"/COMPILED_FILTERS.cpp"
echo "//// START OF GENERATED FILE (this file will be overwritten!) ////" > $compfname #REV: first overwrites
echo "#include <salmap_rv/include/salmap_rv_filter_includes.hpp>" >> $compfname
echo "using namespace salmap_rv;" >> $compfname
flist=$(find $1 -maxdepth 1 -type f) #REV: add constraint to only find .cpp files?
for f in $flist;
do
compfnamebase=$(basename $f) #REV: includes .cpp
alg=${compfnamebase%.cpp}
echo $f " >> " $compfname
echo "void ""$alg""$ARGSET""{" >> $compfname
echo "DEBUGPRINTF(stdout, \"Inside algo funct "$alg"\");" >> $compfname; #REV: debug...
cat $f >> $compfname
echo "}""REGISTER_SAL_FILT_FUNC(""$alg"")" >> $compfname
done
echo "//// END OF GENERATED FILE ////" >> $compfname
The second makefile Makefile2 is the normal compilation instructions.
It is not beautiful, and I would love to find a better way to do it, but as it is, extracting even just the base filename from every file during compilation is difficult even using templates or constexpr (e.g. some macro function that takes __FILE__). And that would rely on the user remembering to add the specific macro call to their function filter stub, which just adds extra unnecessary work and invites spelling errors etc.
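The registration into a std::map mentioned earlier might look roughly like the sketch below. REGISTER_FUNC is a hypothetical stand-in for REGISTER_SAL_FILT_FUNC, whose real definition is not shown here, and the function signature is simplified to int(int).

```cpp
#include <map>
#include <string>

using FilterFn = int (*)(int);

// Function-local static, so the map exists before any registrar runs.
std::map<std::string, FilterFn>& registry() {
    static std::map<std::string, FilterFn> r;
    return r;
}

// Constructing a Registrar at namespace scope adds one entry to the map.
struct Registrar {
    Registrar(const char* name, FilterFn fn) { registry()[name] = fn; }
};

// Hypothetical macro: the function name makes the registrar object unique.
#define REGISTER_FUNC(fn) static Registrar registrar_##fn(#fn, &fn);

int double_it(int x) { return 2 * x; }
REGISTER_FUNC(double_it)

int add_one(int x) { return x + 1; }
REGISTER_FUNC(add_one)
```

Deriving the registrar's name from the function name is one way around the __COUNTER__ problem, provided the generated function names are themselves unique.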

Registering each C/C++ source file to create a runtime list of used sources

For a debugging and logging library, I want to be able to find, at runtime, a list of all of the source files that the project has compiled and linked. I assume I'll be including some kind of header in each source file, and the preprocessor __FILE__ macro can give me a character constant for that file, so I just need to somehow "broadcast" that information from each file to be gathered by a runtime function.
The question is how to elegantly do this, and especially if it can be done from C as opposed to C++. In C++ I'd probably try to make a class with static storage to hold the list of filenames. Each header file would create a file-local static instance of that class, which on creation would append the __FILE__ pointer or whatever into the class's static data members, perhaps as a linked list.
But I don't think this will work in C, and even in C++ I'm not sure it's guaranteed that each element will be created.
I wouldn't do that sort of thing right in the code. I would write a tool which parsed the project file (vcproj, makefile or even just scan the project directory for *.c* files) and generated an additional C source file which contained the names of all the source files in some kind of pre-initialized data structure.
I would then make that tool part of the build process so that every time you do a build this would all happen automatically. At run time, all you would have to do is read that data structure that was built.
I agree with Ferruccio, the best way to do this is in the build system, not the code itself. As an expansion of his idea, add a target to your build system which dumps a list of the files (which it has to know anyway) to a C file as a string, or array of strings, and compile this file into your source. This avoids a lot of complication in the source, and is expandable, if you want to add additional information, like the version number from your source code control system, who built the executable, etc.
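A generated file of that kind could be as small as the sketch below. The two file names are placeholders and is_project_source is a made-up helper; the array would really be rewritten by the build step on every build.

```cpp
#include <cstddef>
#include <string>

// Hypothetical contents of the file the build step regenerates on every
// build; the two file names are placeholders.
static const char* const kSourceFiles[] = {
    "main.cpp",
    "logger.cpp",
};
static const std::size_t kSourceFileCount = sizeof(kSourceFiles) / sizeof(kSourceFiles[0]);

// Runtime lookup that a logging library could use.
bool is_project_source(const std::string& name) {
    for (std::size_t i = 0; i < kSourceFileCount; ++i) {
        if (name == kSourceFiles[i]) return true;
    }
    return false;
}
```

The array itself uses nothing specific to C++, so the same generated table would also work in a plain C project.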
There is a standard way on UNIX and Linux: ident. For every source file you create an ID tag; usually it is assigned by your version control system, e.g. SVN keywords.
Then, to find out the name and revision of each source file, you just use the ident command. If you need to do it at runtime, check out how ident does it; its source should be freely available.
There's no way to do it in C. In C++ you can create a class like this:
struct Reg {
    Reg( const char * file ) {
        StaticDictionary::Register( file );
    }
};
where StaticDictionary is a singleton container for all your file names. Then in each source file:
static Reg regthisfile( __FILE__ );
You would want to make the dictionary a Meyers singleton to avoid order of creation problems.
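A minimal sketch of the Meyers singleton variant follows. It replaces StaticDictionary with a free function, registered_files, which is a name chosen for this example.

```cpp
#include <string>
#include <vector>

// Meyers singleton: the vector is constructed on first use, so it exists
// no matter which translation unit's static objects are initialized first.
std::vector<std::string>& registered_files() {
    static std::vector<std::string> files;
    return files;
}

struct Reg {
    explicit Reg(const char* file) { registered_files().push_back(file); }
};

// One such line per source file.
static Reg regthisfile(__FILE__);
```

In a multi-file project, registered_files would be defined in a single .cpp file and only declared in the shared header.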
I don't think you can do this in the way you outline in a "passive" mode. That is, you are going to somehow run code for each source file to be added to the registry, it's hard to get it to happen automatically.
Of course, it's possible that you can make that code very unobtrusive using macros. It might be problematic for C source files that don't have an "entrypoint", so if your code isn't already organised as "modules", with e.g. an init() function for each module, it might be hard. Static initializing code might be possible, I'm not 100% sure if the order in which things are initialized creates problems here.
Using static storage in the registry module sounds like an excellent idea, a plain linked list or simple hash table should be easy enough to implement, if your project doesn't already include any general-purpose utility library.
In C++ your solution will work. It's guaranteed.
Edit: A solution just occurred to me: change a rule in your makefile to add
'-include "cfiles_register.h"' to each 'g++ file.cpp'.
%.o : %.cpp
	$(CXX) -include 'cfiles_register.h' -c -o $@ $<
Put the implementation you proposed in the question into that 'cfiles_register.h'.
Using static instances in C++ would work fine.
You could do this also in C, but you need to use runtime specific features - for MSVC CRT take a look at http://www.codeguru.com/cpp/misc/misc/threadsprocesses/article.php/c6945/
For C - you could do it with a macro - define a variable named corresponding to your file, and then you could scan the symbols of your executable, just as an idea:
#define TRACK_FILE(name) char _file_tracker_##name;
use it in your my_c_file.c like this:
TRACK_FILE(my_c_file_c)
and then grep all file/variable names from the binary like this
nm my-binary | grep _file_tracker
Not really nice, but...
Horrible idea, I'm sure, but use a singleton. And on each file do something like
Singleton.register(__FILE__);
at global scope. It'll only work on cpp files though.
I did something like this years ago as a novice, and it worked. But I'd cringe to do it now. I'd add a build step now.
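I agree with those who say that it is better to avoid doing this at run time, but in C++ (not in C, where the initializer of a static variable must be a constant expression) you can initialize a static variable with a function call, that is, in every file (the function cannot literally be called register, which is a keyword):
static int doesntmatter = register_file( __FILE__);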

Tools for finding unused function declarations?

Whilst refactoring some old code I realised that a particular header file was full of function declarations for functions long since removed from the .cpp file. Does anyone know of a tool that could find (and strip) these automatically?
If possible, you could write a test.cpp file that calls them all; the linker will flag the ones that have no code as unresolved. This way your test code only needs to compile and link, and you need not worry about actually running it.
PC-lint can be tuned for this dedicated purpose.
I tested the following code for your question:
void foo(int );
int main()
{
return 0;
}
lint.bat test_unused.cpp
and got the following result:
============================================================
--- Module: test_unused.cpp (C++)
--- Wrap-up for Module: test_unused.cpp
Info 752: local declarator 'foo(int)' (line 2, file test_unused.cpp) not referenced
test_unused.cpp(2) : Info 830: Location cited in prior message
============================================================
So you can pass the warning number 752 for your purpose:
lint.bat -"e*" +e752 test_unused.cpp
-e"*" will remove all the warnings and +e752 will turn on this specific one
If you index the code with Doxygen, you can see where each function is referenced from. However, you would have to browse through each class (1 HTML page per class) and scan for those that don't have anything pointing to them.
Alternatively, you could use ctags to generate list of all functions in the code, and then use objdump or some similar tool to get list of all function in .o files - and then compare those lists. However, this can be problematic due to name mangling.
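Once both lists exist, the comparison itself is a set difference. A minimal sketch, assuming the names have already been extracted and demangled (for instance with c++filt); the function name declared_but_undefined is made up for this example:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Names that were declared (e.g. from ctags) but never defined (e.g. from
// objdump or nm). Both inputs are assumed to be demangled already.
std::vector<std::string> declared_but_undefined(const std::set<std::string>& declared,
                                                const std::set<std::string>& defined) {
    std::vector<std::string> missing;
    std::set_difference(declared.begin(), declared.end(),
                        defined.begin(), defined.end(),
                        std::back_inserter(missing));
    return missing;
}
```

Comparing bare names ignores overloads, so functions that differ only in their parameter lists would need the full signature as the key.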
I don't think there is such thing because some functions not having a body in the actual source tree might be defined in some external library. This can only be done by creating a script which makes a list of declared functions in a header and verifies if they are sometimes called.
I have a C++ ftplugin for vim that is able to check and report unmatched functions -- vimmers, be warned that the ftplugin suite is not yet straightforward to install. The ftplugin is based on ctags results (hence its heuristic could easily be adapted to other environments); there are sometimes false positives in the case of inline functions.
HTH,
In addition to Doxygen (@Milan Babuskov), you can see if there are warnings for this in your compiler. E.g. gcc has -Wunused-function for static functions, and -fdump-ipa-cgraph dumps the call graph.
I've heard good things about PC-Lint, but I imagine it's probably overkill for your needs.