How to find unreferenced classes in a codebase - c++

We're in a period of development where a lot of code is being created that may be short-lived: it is effectively scaffolding that at some point gets replaced with something else, but it often continues to exist and is forgotten about.
Are there any good techniques for finding the classes in a codebase that aren't used? Obviously there will be many false positives (eg library classes: you might not be using all the standard containers, but you want to know they're there), but if they were listed by directory then it may make it easier to see at a glance.
I could write a script that greps for every class XXX and then searches again for all uses of each name, but it would have to omit results from the cpp file in which the class's methods are defined. This would also be incredibly slow: O(N^2) in the number of classes in the codebase.
Code coverage tools aren't really an option here, as this has a GUI whose functions can't all be easily invoked programmatically.
Platforms are Visual Studio 2013 or Xcode/clang.
EDIT: I don't believe this to be a duplicate of the dead code question. Although there is an overlap, identifying dead or unreachable code isn't quite the same as finding unreferenced classes.

If you're on Linux, then you can use g++ to help you with this.
I'm going to assume that we consider a class as being used only when an instance of it is created. Therefore, rather than looking just for the name of the class, you could look for calls to its constructors.
struct A
{
    A () { }
};
struct B
{
    B () { }
};
struct C
{
    C () { }
};
void bar ()
{
    C c;
}
int main ()
{
    B b;
}
On Linux at least, running nm on the binary shows the following mangled names:
00000000004005bc T _Z3barv
00000000004005ee W _ZN1BC1Ev
00000000004005ee W _ZN1BC2Ev
00000000004005f8 W _ZN1CC1Ev
00000000004005f8 W _ZN1CC2Ev
Immediately we can tell that none of the constructors for 'A' are called.
Using slightly modified information from this SO answer we can also get g++ to remove function call graphs that are not used:
Which results in:
00000000004005ba W _ZN1BC1Ev
00000000004005ba W _ZN1BC2Ev
So, on linux at least, you can tell that neither A nor C is required in the final executable.
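If you want to post-process the nm output in C++ rather than read the mangled names by eye, something along these lines might help. This is only a sketch, assuming GCC or Clang (it relies on abi::__cxa_demangle from <cxxabi.h>); demangle and constructed_class are names I made up, and the constructor check is a plain string comparison that will not cope with templates or operators.

```cpp
#include <cxxabi.h>

#include <cstddef>
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol; returns the input unchanged if it
// is not a mangled name (for example "main").
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* result = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || result == nullptr) return mangled;
    std::string readable(result);
    std::free(result);  // __cxa_demangle allocates with malloc
    return readable;
}

// Hypothetical helper: if the symbol is a constructor such as "B::B()",
// return the class name ("B"); otherwise return an empty string.
std::string constructed_class(const std::string& mangled) {
    const std::string readable = demangle(mangled);
    const std::size_t paren = readable.find('(');
    if (paren == std::string::npos) return "";
    const std::string qualified = readable.substr(0, paren);  // e.g. "B::B"
    const std::size_t sep = qualified.rfind("::");
    if (sep == std::string::npos) return "";                  // free function
    const std::string scope = qualified.substr(0, sep);
    const std::string func = qualified.substr(sep + 2);
    const std::size_t inner = scope.rfind("::");
    const std::string last =
        inner == std::string::npos ? scope : scope.substr(inner + 2);
    return last == func ? scope : "";
}
```

Feeding every symbol from nm through constructed_class and collecting the non-empty results gives the set of classes that have a constructor in the binary; any class missing from that set is a candidate for review.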

I've come up with a simple shell script that will at least help to focus attention on the classes that are referenced the least. I've made the assumption that if a class isn't used then its name will still appear in one or two files (declaration in the header and definition in the cpp file). So the script uses ctags to search for class declarations in a source directory. Then, for each class, it does a recursive grep to find all the files that mention the class (note: you can specify different class and usage directories). Finally, it writes the file counts and class names to a file and displays them in numerical order. You can then review all the entries that had only 1 or 2 mentions.
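#!/bin/bash
CLASSDIR=${1:-}
USAGEDIR=${2:-}
if [ -z "$CLASSDIR" ] || [ -z "$USAGEDIR" ]; then
    echo "Usage: find_unreferenced_classes.sh <classdir> <usagedir>"
    exit 1
fi
# List every class name that ctags finds under CLASSDIR, one per line.
ctags --recurse=yes --languages=c++ --c++-kinds=c -x "$CLASSDIR" | awk '{print $1}' | sort -u > tags
rm -f counts
# For each class, count the files under USAGEDIR that mention it.
# -w matches whole words only, so "Foo" is not counted inside "FooBar".
while read -r class; do
    count=$(grep -l -r -w --include='*.h' --include='*.cpp' -- "$class" "$USAGEDIR" | wc -l)
    echo "$count $class" >> counts
done < tags
sort -n counts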
sort -n counts
Sample output:
1 SomeUnusedClassDefinedInHeader
2 SomeUnusedClassDefinedAndDeclaredInHAndCppFile
10 SomeClassUsedLots
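On the O(N^2) worry from the question: the script above runs one grep per class, but the same counts can be gathered in a single pass over the files by splitting each file into identifiers and looking each one up in the set of class names. Below is a rough C++ sketch of that idea, not a port of the script. The function names are invented for illustration, and file contents are passed in as strings so the example stays self-contained (reading the files from disk is left out).

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Collect the distinct identifier-like tokens (letters, digits, '_') in a text.
std::set<std::string> identifiers_in(const std::string& text) {
    std::set<std::string> found;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch) || ch == '_') {
            current += static_cast<char>(ch);
        } else if (!current.empty()) {
            found.insert(current);
            current.clear();
        }
    }
    if (!current.empty()) found.insert(current);
    return found;
}

// For each class name, count how many files mention it at least once.
// `files` maps a file name to its contents.
std::map<std::string, int> count_mentions(
    const std::vector<std::string>& class_names,
    const std::map<std::string, std::string>& files) {
    std::map<std::string, int> counts;
    for (const auto& name : class_names) counts[name] = 0;
    for (const auto& file : files) {
        // Each file is tokenised once, however many classes there are.
        for (const auto& id : identifiers_in(file.second)) {
            auto it = counts.find(id);
            if (it != counts.end()) ++it->second;
        }
    }
    return counts;
}
```

As with the script, a count of 1 or 2 only flags a class for review; comments and string literals are tokenised like code, so false positives and negatives are still possible.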

Related

finding line number of end of a function

I am trying to automate some debugging by printing the inputs and outputs of a function via GDB whenever that function is hit. To set breakpoints at these places, I am doing the following.
I am working with templates, and rbreak :. does not hit the breakpoints at the functions in my file, so I extract the line numbers of the functions from the executable as follows.
With the executable, extract the line number of the start of a function:
nm a.out | grep "className" | grep "functionName" | grep " t " | addr2line -e a.out -f | grep "cpp" | uniq
-> this outputs filename:linenumber
Add these contents to a .gdb file with a "b" prefix.
Query: how can we extract the line number of the end of a function from the executable?
With this info, I can add it to the GDB script. The final script would look something like the one below; it would be loaded into GDB before the program is executed.
b filepath:<startline of function>
commands
print input1 input2 etc
continue
end
b filepath:<endline of function>
commands
print output1 output2 etc
continue
end
It only remains to find the end line of a given function belonging to a class/file, given the executable and the start line of the function.
I also considered using GDB's finish command, but by then control is already back in the caller.
It would be easier to have the prints within the called function instead of the caller, so that we can monitor the inputs and outputs of every call of the function.
This would simplify my debugging to a large extent.
Any suggestions or comments are highly appreciated.
Thanks a lot in advance!
First, notice that function templates are not functions but recipes: when you use a template, the compiler generates a function from it.
If you want to use the break command then you need the full function name. For instance, the template below
template <typename T>
inline T doubleInput(const T& x) {
    return 2 * x;
}
will become the function doubleInput<int> when you pass an int, doubleInput<double> when you pass a double, etc. You need the whole name, including the <type> part, to add a breakpoint with the break command, and even then it will only stop in that particular instantiation of the template.
But the rbreak command does work with templates. If you write rbreak doubleInput* in gdb, a breakpoint is added in every existing specialization of the template. Note that the argument is a regular expression rather than a shell glob, so rbreak doubleInput alone matches the same functions.
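If you want something small to try break and rbreak on, a sketch like the following should do. The two wrapper functions (use_int and use_double) are just made-up names whose only job is to force two instantiations; build without optimisation and with debug info (for example -g -O0) so the instantiations are not inlined away.

```cpp
template <typename T>
inline T doubleInput(const T& x) {
    return 2 * x;
}

// Each distinct T makes the compiler emit a separate function:
// this call instantiates doubleInput<int> ...
int use_int() { return doubleInput(21); }

// ... and this one instantiates doubleInput<double>.
double use_double() { return doubleInput(1.5); }
```

With this loaded in gdb, break doubleInput<int> should stop only in the int version, while rbreak doubleInput should set a breakpoint in both.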
See the answer in this question.
I don't know whether gdb nowadays has a feature to add a breakpoint at the return of a function, but the answers to that nine-year-old question offer some possibilities, including a custom Python command that finds the retq instructions and adds breakpoints to them, and reverse debugging. I haven't tried these options.

Wine check if WINEPREFIX is active(desktop=a/desktop=b/etc.) using c++

I'm writing a C++ program that uses Wine to run Diablo II by calling a wine command through the system() function:
(system("wine explorer /desktop=a,800x600 ~/Diablo/Diablo.exe"))
I have if/else statements running that command (as a condition) with /desktop=a, /desktop=b, etc. in order to have multiple Diablo windows running at the same time, but my program calls each one in order (all the way to desktop=g before exiting), which is kind of annoying.
My question is: how can I test (I'm assuming with a wine argument) whether desktop=a is active, run Diablo in it if it isn't, and otherwise move on to test desktop=b?
Edit: this is what I have so far (it goes up to desktop=g):
#include <cstdlib>

int main()
{
    if (system("wine explorer /desktop=a,800x600 ~/Diablo/Diablo.exe"))
    {
    }
    else if (system("wine explorer /desktop=b,800x600 ~/Diablo/Diablo.exe"))
    {
    }
    // ... and so on, up to desktop=g
}
Edit 2: after some more research I found this (bash):
result=`ps -Al | grep Game.exe | wc -l` && echo $result
That should work for my problem, but I can't figure out how to call it from C++ in a way that lets me get its output into a variable (without creating a file and reading from it, which I'd like to avoid if possible).
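For reference, one common way to get a command's output into a variable without a temporary file is popen(3), which is POSIX rather than standard C++. The following is a minimal sketch under that assumption; run_and_capture and count_processes are names invented here, and the process name is pasted into the shell command unescaped, so it should only be used with trusted input.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Run a shell command and return everything it wrote to standard output.
// popen/pclose come from POSIX, so this will not build with plain MSVC.
std::string run_and_capture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("popen failed for: " + command);
    }
    std::string output;
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe) != nullptr) {
        output += buffer;
    }
    pclose(pipe);
    return output;
}

// Hypothetical helper built on the same pipeline as the bash one-liner:
// how many running processes match `name`?
int count_processes(const std::string& name) {
    return std::stoi(run_and_capture("ps -Al | grep " + name + " | wc -l"));
}
```

count_processes("Game.exe") would then play the role of $result in the bash version, and the program could use that number to decide which desktop to try next.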

How to track a recursive function's call stack usage

I am teaching a class to intro C++ students and I want to design a lab that shows how recursive functions differ from iteration. My idea was to track the memory/call stack usage for both and display the difference. I was almost positive that I had done something similar when I did my degree, but can't remember. My experience doesn't lie in C/C++ so any guidance would be appreciated.
Update 1:
I believe I may have misrepresented my task. I had hoped to find a way to show how recursion increases the overhead/stack compared to iteration. I followed some suggested links and came up with the following script.
loops=100
counter=0
total1=0
echo "Iteration"
while [ $counter -lt $loops ]; do
    "$1" & # Run the given command line in the background.
    pid=$! peak1=0
    echo -e "$counter.\c"
    while true; do
        #sleep 0.1
        sample="$(pmap $pid | tail -n1 | sed 's/[^0-9]*//g' 2> /dev/null)" || break
        if [ -z "$sample" ]; then
            break
        fi
        let peak1='sample > peak1 ? sample : peak1'
    done
    # echo "Peak: $peak1" 1>&2
    total1=$(expr $total1 + $peak1)
    counter=$[$counter+1]
done
The program implements a binary search with either iteration or recursion. The idea is to get the average memory use and compare it to that of the recursive version of the same program. This does not work: the iterative version often has a larger memory average than the recursive one, which doesn't show my students that recursion has drawbacks. Therefore I am pretty sure I am doing something incorrectly.
Is pmap not going to provide me with what I want?
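Something like this, I think:
#include <cstdio>

// Prints how far the current frame is from the first one, in bytes.
void recursive(const char* ptop) {
    char dummy = 0;
    // The stack usually grows downwards, so the deeper address is the smaller one.
    std::printf("stack size %td\n", ptop - &dummy);
    recursive(ptop);
}
void start() {
    char dummy = 0;
    recursive(&dummy);
}
It keeps recursing until the stack overflows and the program crashes.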
On any platform that knows them (Linux), backtrace(3), or even better backtrace_symbols(3) and their companions should be of great help.
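For a lab, it may also be enough to count call depth directly instead of sampling process memory, since a binary search only ever nests a handful of frames and that is easily lost in the noise of pmap. Here is a rough sketch of that idea; the function names and the depth/steps parameters are my own additions for illustration, and the depth is a logical count (an optimising compiler may turn the tail calls into a loop).

```cpp
#include <vector>

// Recursive binary search that records the deepest call level reached.
// Call with depth = 1 and max_depth = 0.
int search_recursive(const std::vector<int>& v, int target,
                     int lo, int hi, int depth, int& max_depth) {
    if (depth > max_depth) max_depth = depth;
    if (lo > hi) return -1;
    const int mid = lo + (hi - lo) / 2;
    if (v[mid] == target) return mid;
    if (v[mid] < target) {
        return search_recursive(v, target, mid + 1, hi, depth + 1, max_depth);
    }
    return search_recursive(v, target, lo, mid - 1, depth + 1, max_depth);
}

// Iterative binary search: the same number of steps, but all of them
// run inside a single stack frame.
int search_iterative(const std::vector<int>& v, int target, int& steps) {
    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;
    steps = 0;
    while (lo <= hi) {
        ++steps;
        const int mid = lo + (hi - lo) / 2;
        if (v[mid] == target) return mid;
        if (v[mid] < target) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return -1;
}
```

Printing max_depth next to steps for a sorted vector shows that both versions do the same amount of work, while only the recursive one stacks up one frame per step. A linear recursion (for example summing a list) makes the contrast much more dramatic than a logarithmic one.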

Can frama-c be used for header file analysis?

I was looking at frama-c as a way to process C header files in OCaml (e.g. for generating language bindings). It's attractive because it seems like a very well-documented and maintained project. However, after a lot of googling and searching through the documentation, I can't find anything suitable for the purpose. Am I just missing the right way to do this, or is it outside the scope of frama-c? It seems like a fairly trivial thing for it to do, compared to some of the other plugins.
As Pascal said, I don't think that it is possible from the command line, but because you will have to write some code anyway, you can set the flag Rmtmps.keepUnused. This is a script that you can use to see the declarations:
let main () =
  Rmtmps.keepUnused := true;
  let file = File.from_filename "t.h" in
  let () = File.init_from_c_files [ file ] in
  let _ast = Ast.get () in
  let show_function f =
    let name = Kernel_function.get_name f in
    if not (Cil.Builtin_functions.mem name) then
      Format.printf "Function @[<2>%a:@ @[@[type: %a@]@ @[%s at %a@]@]@]@."
        Kernel_function.pretty f
        Cil_datatype.Typ.pretty (Kernel_function.get_type f)
        (if Kernel_function.is_definition f then "defined" else "declared")
        Cil.d_loc (Kernel_function.get_location f)
  in Globals.Functions.iter show_function

let () = Db.Main.extend main
To run it, you have to use the -load-script option like this:
$ frama-c -load-script script.ml
Developing a plug-in will be more appropriate for more complex processing (see the Developer Manual for that), but a script makes it easy to test.
In the current state, I would say that it is unfortunately impossible to use Frama-C to parse declarations of functions that are neither defined nor used.
t.h:
int mybinding (int x, int y);
This gives you a view of the normalized AST. Normalized means that everything that could be simplified was:
$ frama-c -print t.h
[kernel] preprocessing with "gcc -C -E -I. t.h"
/* Generated by Frama-C */
And unfortunately, since mybinding was neither used nor defined, it was erased.
There is an option to keep declarations with specifications, but what you want is an option to keep all declarations. I have never noticed such an option:
$ frama-c -kernel-help
...
-keep-unused-specified-functions keep specified-but-unused functions (set by
default, opposite option is
-remove-unused-specified-functions)
And the option to keep functions with specifications does not do what you want:
$ frama-c -keep-unused-specified-functions -print t.h
[kernel] preprocessing with "gcc -C -E -I. t.h"
/* Generated by Frama-C */

"Conditional" parsing of command-line arguments

Say I have an executable (running on mac, win, and linux)
a.out [-a] [-b] [-r -i <file> -o <file> -t <double> -n <int> ]
where an argument in [ ] means that it is optional. However, if the last argument -r is set then -i,-o,-t, and -n have to be supplied, too.
There are lots of good C++-libraries out there to parse command-line arguments, e.g. gflags (http://code.google.com/p/gflags/), tclap (http://tclap.sourceforge.net/), simpleopt(http://code.jellycan.com/simpleopt/), boost.program_options (http://www.boost.org/doc/libs/1_52_0/doc/html/program_options.html), etc. But I wondered if there is one that lets me encode these conditional relationships between arguments directly, w/o manually coding error handling
if ( argR.isSet() && ( ! argI.isSet() || ! argO.isSet() || ... ) ) ...
and manually setting up the --help.
The library tclap allows XOR-ing arguments, e.g. either -a or -b is allowed but not both. So, in that terminology, an AND for arguments would be nice.
Does anybody know a versatile, light-weight, and cross-platform library that can do that?
You could make two passes over the arguments: if -r is in the options, you reset the parser and start over with the new mandatory options added.
You could also look into how the TCLAP XorHandler works, and create your own AndHandler.
You could change the argument syntax so that -r takes four values in a row.
I have a snippet of TCLAP-based code lying around that seems to fit the error-handling portion of what you're looking for, although it doesn't match your requirements exactly:
#include <string>
#include "tclap/CmdLine.h"

namespace TCLAP {
class RequiredDependentArgException : public ArgException {
public:
    /**
     * Constructor.
     * \param parentArg - The argument whose presence makes requiredArg mandatory.
     * \param requiredArg - The dependent argument that is missing.
     */
    RequiredDependentArgException(
        const TCLAP::Arg& parentArg,
        const TCLAP::Arg& requiredArg)
    : ArgException(
        std::string("Required argument ") +
        requiredArg.toString() +
        std::string(" missing when the ") +
        parentArg.toString() +
        std::string(" flag is specified."),
        requiredArg.toString())
    { }
};
} // namespace TCLAP
And then make use of the new exception after TCLAP::CmdLine::parse has been called:
if (someArg.isSet() && !conditionallyRequiredArg.isSet()) {
    throw(TCLAP::RequiredDependentArgException(someArg, conditionallyRequiredArg));
}
I remember looking into extending TCLAP with an additional class that would handle this logic, but then I realized the only thing I was actually looking for was nice error reporting, because the logic wasn't entirely straightforward and couldn't be easily condensed (at least, not in a way that was useful to the next poor guy who came along). A contrived scenario dissuaded me from pursuing it further, something to the effect of, "if A is true, B must be set but C can't be set if D is of value N." Expressing such things in native C++ is the way to go, especially when it comes time to do very strict argument checks at CLI arg parse time.
For truly pathological cases and requirements, create a state machine using something like Boost.MSM (Meta State Machine). HTH.
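If the rules really are just "when X is given, Y and Z must be given too", the check can also be kept in a small table, independent of whichever parser is used. This is a library-agnostic sketch, not TCLAP API: DependencyRules and missing_dependents are names invented for this example, and it assumes the parser can hand over the set of flags it saw.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical rule table: parent flag -> flags that must accompany it.
using DependencyRules = std::map<std::string, std::vector<std::string>>;

// Return one message per "parent requires dependent" rule that is violated
// by the set of flags actually seen on the command line.
std::vector<std::string> missing_dependents(const std::set<std::string>& seen,
                                            const DependencyRules& rules) {
    std::vector<std::string> errors;
    for (const auto& rule : rules) {
        if (seen.count(rule.first) == 0) continue;  // parent absent: nothing required
        for (const auto& dependent : rule.second) {
            if (seen.count(dependent) == 0) {
                errors.push_back(dependent + " is required when " +
                                 rule.first + " is given");
            }
        }
    }
    return errors;
}
```

For the a.out example from the question the table would hold a single entry mapping -r to -i, -o, -t and -n; the same table could also drive the wording of the --help text.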
Do you want to parse a command line? You can use simpleopt, which can be used as follows. Download simpleopt from:
https://code.google.com/archive/p/simpleopt/downloads
Test:
int _tmain(int argc, TCHAR * argv[])
argv can be: 1.txt 2.txt *.cpp