Can frama-c be used for header file analysis? - ocaml

I was looking at frama-c as a way to process C header files in OCaml (e.g. for generating language bindings). It's attractive because it seems like a very well-documented and maintained project. However, after a lot of googling and searching through the documentation, I can't find anything suitable for the purpose. Am I just missing the right way to do this, or is it outside the scope of frama-c? It seems like a fairly trivial thing for it to do, compared to some of the other plugins.

As Pascal said, I don't think it is possible from the command line, but since you will have to write some code anyway, you can set the flag Rmtmps.keepUnused. Here is a script you can use to see the declarations:
let main () =
  (* Keep declarations that are neither used nor defined. *)
  Rmtmps.keepUnused := true;
  let file = File.from_filename "t.h" in
  let () = File.init_from_c_files [ file ] in
  let _ast = Ast.get () in
  let show_function f =
    let name = Kernel_function.get_name f in
    (* Skip compiler builtins; print name, type, and whether it is defined or only declared. *)
    if not (Cil.Builtin_functions.mem name) then
      Format.printf "Function @[<2>%a:@ @[@[type: %a@]@ @[%s at %a@]@]@]@."
        Kernel_function.pretty f
        Cil_datatype.Typ.pretty (Kernel_function.get_type f)
        (if Kernel_function.is_definition f then "defined" else "declared")
        Cil.d_loc (Kernel_function.get_location f)
  in
  Globals.Functions.iter show_function

let () = Db.Main.extend main
To run it, use the -load-script option like this:
$ frama-c -load-script script.ml
Developing a plug-in is more appropriate for more complex processing (see the Developer Manual for that), but a script makes it easy to test.

In its current state, I would say that it is unfortunately impossible to use Frama-C to parse declarations of functions that are neither defined nor used.
t.h:
int mybinding (int x, int y);
This gives you a view of the normalized AST. Normalized means that everything that could be simplified was:
$ frama-c -print t.h
[kernel] preprocessing with "gcc -C -E -I. t.h"
/* Generated by Frama-C */
And unfortunately, since mybinding was neither used nor defined, it was erased.
There is an option to keep declarations with specifications, but what you want is an option to keep all declarations. I have never noticed such an option:
$ frama-c -kernel-help
...
-keep-unused-specified-functions keep specified-but-unused functions (set by
default, opposite option is
-remove-unused-specified-functions)
And the option to keep functions with specifications does not do what you want:
$ frama-c -keep-unused-specified-functions -print t.h
[kernel] preprocessing with "gcc -C -E -I. t.h"
/* Generated by Frama-C */

Related

CPP/GPP in Fortran variadic macro (plus Fortran // concatenation)

I'm trying to compile a huge, world-renowned numerical weather prediction code - written mostly in Fortran 90 - that uses cpp extensively, and successfully, with PGI, Intel and gfortran. Now, I've inherited a version where experts have added several hundred cases of variadic macros. They use Intel and fpp, which is presumably a little more Fortran-centric, and can get it all to work. I need to use gfortran, and have not been able to get cpp to work on this code with its new additions.
A gross simplification of the problem is as follows:
Code to preprocess:
PRINT *, "Hello" // "Don"
#define adderv(...) (myadd(__VA_ARGS__))
sumv = adderv(1, 2, 3, 4, 5)
Using cpp without the -traditional option handles the variadic macro, but not the Fortran concatenation (cpp treats // as the start of a C++ comment and strips the rest of the line):
$ cpp -P t.F90
PRINT *, "Hello"
sumv = (myadd(1, 2, 3, 4, 5))
On the other hand, using the -traditional flag handles the concatenation, but not the variadic macro:
$ cpp -P -traditional t.F90
t.F90:2:0: error: syntax error in macro parameter list
#define adderv(...) (myadd(__VA_ARGS__))
^
PRINT *, "Hello" // "Don"
sumv = adderv(1, 2, 3, 4, 5)
I'm really struggling to find a way to facilitate the processing of both.
I've started playing with gpp, and feel like I'm getting close, but in reality I might still be a long way from a solution. It doesn't accept the ..., and it doesn't expand __VA_ARGS__. Of course, the following isn't really a variadic macro any more...
PRINT *, "Hello" // "Don"
#define adderv() (myadd(__VA_ARGS__))
sumv = adderv(1, 2, 3, 4, 5)
$ gpp t.F90
PRINT *, "Hello" // "Don"
sumv = (myadd(__VA_ARGS__))
I've scoured the web to no avail, and the best possibility I've seen so far, which strikes me as possibly ugly and painful, is to split all my Fortran concatenation operators across separate lines, i.e.
PRINT *, "Hello" // "Don"
becomes
PRINT *, "Hello" /&
& / "Don"
The innards of cpp and gpp are a bit intimidating to me, but if anybody sees the potential for success and might point me in the right direction, I'd be very appreciative. Restructuring this huge code really isn't an option, though an automated strategy (such as splitting those concat operators into separate lines) might be, if I'm desperate enough.
Additional information - roygvib suggested I try adding the -C flag. We had been suppressing it lately because it seemed to introduce many C comments into the Fortran code. Well, I went ahead and tried this, and I think I'm closer:
$ cat t.F90
PRINT *, "Hello" // "Don"
#define adderv(...) (myadd(__VA_ARGS__))
sumv = adderv(1, 2, 3, 4, 5)
When I invoke it with the -P and -C flags, it naturally passes the // through (cpp sees a C++ comment, Fortran sees the concatenation operator), but it also seems to generate some C-commented copyright text:
$ /lib/cpp -P -C t.F90
/* Copyright (C) 1991-2014 Free Software Foundation, Inc.
This file is part of the GNU C Library.
.
.
.
/* wchar_t uses ISO/IEC 10646 (2nd ed., published 2011-03-15) / Unicode 6.0. */
/* We do not support C11 <threads.h>. */
PRINT *, "Hello" // "Don"
sumv = (myadd(1, 2, 3, 4, 5))
A little bit of research (see "Remove the comments generated by cpp") suggests that this added copyright text may be a relatively new "feature" of cpp.
I can't see any simple way to suppress it, so I'm thinking I may need to build a wrapper script (e.g. mycpp) that calls cpp as above, filters out any C-style comments, and then passes the result to the next stage.
It's not optimal, and I'm a little leery because this whole package also has C code in it. Theoretically, though, I think that the worst thing that would happen would be failure to generate comments in preprocessed C code.
If anybody has knowledge as to how I might simply suppress the generation of that copyright message, I might be in business.
At least in the context of the simple example described below, I resolved the problem by installing an older cpp. Other research had confirmed that version 4.8 was inserting additional C comments into preprocessed Fortran code, which obviously isn't a good thing. The solution was simple: use cpp-4.7.
Installation (on Ubuntu 16.04) was more straightforward than I had anticipated. A simple
sudo apt-get install cpp-4.7
put the necessary executable in /usr/bin/cpp-4.7
and that preprocesses the following examples the way I want.
$ /usr/bin/cpp-4.7 -C -P t.F90
PRINT *, "Hello" // "Don"
sumv = (myadd(1, 2, 3, 4, 5))
Like DonMorton, I am trying to use cpp -P on Fortran files because they use __VA_ARGS__ and other such features. Without the -C option, // and the rest of the line are removed; with it, extra comments are added.
So, I removed these extralines using the ideas of another answer :
cpp -P -C t.F90 | sed '/\/\*.*\*\// d; /\/\*/,/\*\// d'
And, I get, as expected :
PRINT *, "Hello" // "Don"
sumv = (myadd(1, 2, 3, 4, 5))
There is still a problem, though: you cannot use // (C++-style comments) in macro arguments, because // something is replaced by /* something */.
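The sed filter deletes every line that contains a comment, so a line such as x = 1 /* note */ would vanish entirely. If that matters, a mycpp-style wrapper could instead remove only the /* ... */ text. Below is a minimal C++ sketch of such a filter. The name strip_c_comments is hypothetical, and it assumes that /* never appears inside a Fortran string literal and that the input comes from cpp -P -C (so there are no line markers). It leaves // untouched and drops only lines that become blank because a comment was removed.

```cpp
#include <sstream>
#include <string>

// Remove /* ... */ comments (possibly spanning several lines) from cpp output.
// Lines that contained only comment text and whitespace are dropped;
// original blank lines and "//" sequences are kept as-is.
std::string strip_c_comments(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    std::string line;
    bool in_comment = false;
    while (std::getline(in, line)) {
        std::string kept;
        bool touched = in_comment;  // line starts inside a comment
        std::size_t i = 0;
        while (i < line.size()) {
            if (in_comment) {
                std::size_t end = line.find("*/", i);
                if (end == std::string::npos) {
                    i = line.size();      // comment continues on the next line
                } else {
                    in_comment = false;
                    i = end + 2;          // resume after the closing */
                }
            } else {
                std::size_t start = line.find("/*", i);
                if (start == std::string::npos) {
                    kept += line.substr(i);
                    i = line.size();
                } else {
                    kept += line.substr(i, start - i);
                    in_comment = true;
                    touched = true;
                    i = start + 2;        // skip the opening /*
                }
            }
        }
        bool blank = kept.find_first_not_of(" \t") == std::string::npos;
        if (touched && blank) continue;   // line held nothing but comment text
        out << kept << '\n';
    }
    return out.str();
}
```

Reading std::cin into a string and writing strip_c_comments(...) to std::cout would turn this into a filter that runs after cpp -P -C. Like the sed version, it cannot recover a // that cpp has already rewritten as /* ... */ inside macro arguments.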

How to find unreferenced classes in a codebase

We're in a period of development where a lot of code is being created that may be short-lived, as it's effectively scaffolding which at some point gets replaced with something else, but which often continues to exist and gets forgotten about.
Are there any good techniques for finding the classes in a codebase that aren't used? Obviously there will be many false positives (e.g. library classes: you might not be using all the standard containers, but you want to know they're there), but if the results were listed by directory, they might be easier to review at a glance.
I could write a script that greps for every class XXX declaration and then searches again for all uses, but it would have to omit results from the .cpp file where the class's methods are defined. This would also be incredibly slow: O(N^2) in the number of classes in the codebase.
Code coverage tools aren't really an option here, as this project has a GUI whose functions can't all be easily invoked programmatically.
Platforms are Visual Studio 2013 or Xcode/clang
EDIT: I don't believe this to be a duplicate of the dead code question. Although there is an overlap, identifying dead or unreachable code isn't quite the same as finding unreferenced classes.
If you're on Linux, you can use g++ to help you with this.
I'm going to assume that only when an instance of the class is created will we consider it as being used. Therefore, rather than looking just for the name of the class you could look for calls to the constructors.
struct A
{
    A () { }
};
struct B
{
    B () { }
};
struct C
{
    C () { }
};
void bar ()
{
    C c;
}
int main ()
{
    B b;
}
On Linux at least, running nm on the binary shows the following mangled names:
00000000004005bc T _Z3barv
00000000004005ee W _ZN1BC1Ev
00000000004005ee W _ZN1BC2Ev
00000000004005f8 W _ZN1CC1Ev
00000000004005f8 W _ZN1CC2Ev
Immediately we can tell that none of the constructors for 'A' are called.
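To map these mangled names back to source names, GCC and Clang expose the Itanium C++ ABI demangler as abi::__cxa_demangle in <cxxabi.h> (c++filt is the command-line equivalent). A minimal sketch, assuming a GCC or Clang toolchain (MSVC uses a different mangling scheme and does not ship this header). The helper name demangle is illustrative:

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol as printed by nm.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc; we must free it.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    std::string result = (status == 0 && raw != nullptr) ? std::string(raw) : mangled;
    std::free(raw);
    return result;
}
```

For example, _ZN1BC1Ev and _ZN1BC2Ev both demangle to B::B() (the complete-object and base-object constructor variants), which is why each class shows up twice in the listing. On macOS, nm prints an extra leading underscore, which would need to be stripped first.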
Using slightly modified information from this SO answer, we can also get g++ to remove functions that are never used, which results in:
00000000004005ba W _ZN1BC1Ev
00000000004005ba W _ZN1BC2Ev
So, on Linux at least, you can tell that neither A nor C is required in the final executable.
I've come up with a simple shell script that at least helps to focus attention on the classes that are referenced the least. It assumes that if a class isn't used, its name will still appear in one or two files (the declaration in the header and the definition in the .cpp file). The script uses ctags to find class declarations in a source directory. Then, for each class, it does a recursive grep to find all the files that mention the class (note: you can specify different class and usage directories), and finally it writes the file counts and class names to a file and displays them in numerical order. You can then review all the entries that have only 1 or 2 mentions.
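#!/bin/bash
# Usage: find_unreferenced_classes.sh <classdir> <usagedir>
CLASSDIR=${1:-}
USAGEDIR=${2:-}
if [ -z "${CLASSDIR}" ] || [ -z "${USAGEDIR}" ]; then
    echo "Usage: find_unreferenced_classes.sh <classdir> <usagedir>"
    exit 1
fi
# Collect class names; sort -u removes duplicates even when they are not adjacent.
ctags --recurse=yes --languages=c++ --c++-kinds=c -x "$CLASSDIR" | awk '{print $1}' | sort -u > tags
[ -f "counts" ] && rm counts
while read -r class; do
    # -w: whole-word matches only, so "Foo" is not counted in files that only mention "FooBar".
    count=$(grep -l -r -w -F --include='*.h' --include='*.cpp' -- "$class" "$USAGEDIR" | wc -l)
    echo "$count $class" >> counts
done < tags
sort -n counts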
Sample output:
1 SomeUnusedClassDefinedInHeader
2 SomeUnusedClassDefinedAndDeclaredInHAndCppFile
10 SomeClassUsedLots
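On Visual Studio without a bash environment, the counting step of the script can also be expressed directly in C++. A minimal sketch, assuming the class names (from ctags or elsewhere) and the file contents have already been loaded into memory. The names mentions_identifier and count_class_mentions are hypothetical. Like grep -w, it only counts whole-identifier matches:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// True if `name` occurs in `text` as a whole identifier, not inside a longer one.
bool mentions_identifier(const std::string& text, const std::string& name) {
    auto is_ident = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    for (std::size_t pos = text.find(name); pos != std::string::npos;
         pos = text.find(name, pos + 1)) {
        bool left_ok = pos == 0 || !is_ident(text[pos - 1]);
        std::size_t end = pos + name.size();
        bool right_ok = end == text.size() || !is_ident(text[end]);
        if (left_ok && right_ok) return true;
    }
    return false;
}

// For each class, count how many files mention it, then sort ascending so the
// least-referenced classes come first (like `sort -n counts`).
std::vector<std::pair<int, std::string>> count_class_mentions(
    const std::map<std::string, std::string>& files,  // path -> contents
    const std::vector<std::string>& classes) {
    std::vector<std::pair<int, std::string>> counts;
    for (const auto& cls : classes) {
        int n = 0;
        for (const auto& entry : files)
            if (mentions_identifier(entry.second, cls)) ++n;
        counts.emplace_back(n, cls);
    }
    std::sort(counts.begin(), counts.end());
    return counts;
}
```

As with the script, classes with a count of 1 or 2 are the candidates to review by hand.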

Find function/variable definition for a reference in source code

My project contains many files.
Sometimes I need to know where a particular function is defined (implemented) in source code. What I currently do is text search within source files for the function name, which is very time consuming.
My question is: is there a better way (a compiler/linker flag?) to find a function's definition in the source files, given that the linker has already gone through all the trouble of resolving these references?
I am hoping for method better than stepping into a function call in debugger, since a function can be buried within many calls.
Try the cscope utility.
From the manual:
Allows searching code for:
all references to a symbol
global definitions
functions called by a function
functions calling a function
text string
regular expression pattern
a file
files including a file
Curses based (text screen)
An information database is generated for faster searches and later reference
The fuzzy parser supports C, but is flexible enough to be useful for C++ and Java, and for use as a generalized 'grep database' (use it to browse large text documents!)
Has a command line mode for inclusion in scripts or as a backend to a GUI/frontend
Runs on all flavors of Unix, plus most monopoly-controlled operating systems.
A "screenshot":
C symbol: atoi
File Function Line
0 stdlib.h <global> 86 extern int atoi (const char *nptr);
1 dir.c makefilelist 336 dispcomponents = atoi(s);
2 invlib.c invdump 793 j = atoi(term + 1);
3 invlib.c invdump 804 j = atoi(term + 1);
4 main.c main 287 dispcomponents = atoi(s);
5 main.c main 500 dispcomponents = atoi(s);
6 stdlib.h atoi 309 int atoi (const char *nptr) __THROW
Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
If the symbol is exported, then you could wire up objdump or nm and look at the .o files. This is not useful for finding things in header files though.
My suggestion would be to put your project in git (which carries numerous other advantages) and use git grep which looks only at those files under git's revision control (meaning you don't grep object files and other irrelevances). git grep is also nice and quick.

"Conditional" parsing of command-line arguments

Say I have an executable (running on Mac, Windows, and Linux)
a.out [-a] [-b] [-r -i <file> -o <file> -t <double> -n <int> ]
where an argument in [ ] means that it is optional. However, if the last argument -r is set then -i,-o,-t, and -n have to be supplied, too.
There are lots of good C++ libraries out there for parsing command-line arguments, e.g. gflags (http://code.google.com/p/gflags/), tclap (http://tclap.sourceforge.net/), simpleopt (http://code.jellycan.com/simpleopt/), boost.program_options (http://www.boost.org/doc/libs/1_52_0/doc/html/program_options.html), etc. But I wondered if there is one that lets me encode these conditional relationships between arguments directly, without manually coding error handling
if ( argR.isSet() && ( ! argI.isSet() || ! argO.isSet() || ... ) ) ...
and manually setting up the --help.
The tclap library allows you to XOR arguments, e.g. either -a or -b is allowed but not both. In that terminology, an AND for arguments would be nice.
Does anybody know a versatile, light-weight, and cross-platform library that can do that?
You could do two passes over the arguments: if -r is among the options, reset the parser and start over with the new mandatory options added.
You could also look into how the TCLAP XorHandler works, and create your own AndHandler.
You could change the argument syntax so that -r takes four values in a row.
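Whichever library does the parsing, the rule "if -r is given, then -i, -o, -t and -n are required" can be kept as a small table instead of a chain of isSet() checks. A minimal, library-independent C++ sketch. The function check_dependencies and the rule table are hypothetical, and it assumes the parser has already produced the set of flags that were actually given:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// `deps` maps a trigger flag to the flags that become mandatory once it is
// present, e.g. "-r" -> {"-i", "-o", "-t", "-n"}. Returns one message per
// missing dependent flag; an empty result means every rule is satisfied.
std::vector<std::string> check_dependencies(
    const std::set<std::string>& given,
    const std::map<std::string, std::vector<std::string>>& deps) {
    std::vector<std::string> errors;
    for (const auto& rule : deps) {
        if (given.count(rule.first) == 0) continue;  // trigger absent: nothing required
        for (const auto& required : rule.second)
            if (given.count(required) == 0)
                errors.push_back("Required argument " + required +
                                 " missing when the " + rule.first +
                                 " flag is specified.");
    }
    return errors;
}
```

Adding another conditional relationship then means adding an entry to the table rather than another hand-written if statement.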
I have part of a TCLAP snippet lying around that covers the error-reporting side of what you're looking for, although it doesn't match your requirement exactly:
#include <string>
#include "tclap/CmdLine.h"

namespace TCLAP {

class RequiredDependentArgException : public ArgException {
public:
    /**
     * Constructor.
     * \param parentArg - The argument whose presence makes requiredArg mandatory.
     * \param requiredArg - The required dependent argument that is missing.
     */
    RequiredDependentArgException(
        const TCLAP::Arg& parentArg,
        const TCLAP::Arg& requiredArg)
        : ArgException(
              std::string("Required argument ") +
                  requiredArg.toString() +
                  std::string(" missing when the ") +
                  parentArg.toString() +
                  std::string(" flag is specified."),
              requiredArg.toString())
    { }
};

} // namespace TCLAP
And then make use of the new exception after TCLAP::CmdLine::parse has been called:
if (someArg.isSet() && !conditionallyRequiredArg.isSet()) {
throw(TCLAP::RequiredDependentArgException(someArg, conditionallyRequiredArg));
}
I remember looking into extending TCLAP with an additional class that would handle this logic, but then I realized the only thing I was actually looking for was nice error reporting, because the logic wasn't entirely straightforward and couldn't be easily condensed (at least, not in a way that was useful to the next poor guy who came along). A contrived scenario dissuaded me from pursuing it further, something to the effect of, "if A is true, B must be set, but C can't be set if D has value N." Expressing such things in native C++ is the way to go, especially when it comes time to do very strict argument checks at CLI parse time.
For truly pathological cases and requirements, create a state machine using something like Boost.MSM (Meta State Machine). HTH.
Do you want to parse a command line? You can use SimpleOpt. Download SimpleOpt from:
https://code.google.com/archive/p/simpleopt/downloads
Test:
int _tmain(int argc, TCHAR * argv[])
argv can be: 1.txt 2.txt *.cpp

SML-NJ, how to compile standalone executable

I have started learning Standard ML, and I am now trying to use the Standard ML of New Jersey compiler.
I can use the interactive loop, but how can I compile a source file to a standalone executable?
In C, for example, one can just write
$ gcc hello_world.c -o helloworld
and then run helloworld binary.
I read the documentation for the SML/NJ Compilation Manager, but it doesn't have any clear examples.
Also, is there another SML compiler available that can create standalone binaries?
Both MosML and MLton can also create standalone binaries: MosML through the mosmlc command and MLton through the mlton command.
Note that MLton doesn't have an interactive loop; it is a whole-program optimising compiler. This basically means that it takes quite some time to compile, but in turn it generates incredibly fast SML programs.
For SML/NJ you can use the CM.mk_standalone function, but this is discouraged in the CM User Manual (page 45). Instead, they recommend the ml-build command, which generates an SML/NJ heap image. The heap image must be run with the @SMLload parameter, or you can use the heap2exec program, provided that you have a supported system. If you don't, I would suggest using MLton instead.
The following can be used to generate a valid SML/NJ heap image:
test.cm:
Group is
  test.sml
  $/basis.cm
test.sml:
structure Test =
struct
  (* ml-build requires main : string * string list -> OS.Process.status *)
  fun main (prog_name, args) =
    let
      val _ = print ("Program name: " ^ prog_name ^ "\n")
      val _ = print "Arguments:\n"
      val _ = app (fn s => print ("\t" ^ s ^ "\n")) args
    in
      OS.Process.success
    end
end
To generate the heap image, use ml-build test.cm Test.main test-image, and then run it with sml @SMLload=test-image.XXXXX arg1 arg2 "this is one argument", where XXXXX is your architecture.
If you decide to switch to MLton at some point, you don't need a main function at all. It evaluates everything at top level, so you can create a main function and have it called like this:
fun main () = print "this is the main function\n"
val foo = 4
val _ = print ((Int.toString 4) ^ "\n")
val _ = main ()
Then you can compile it with mlton foo.sml, which will produce an executable named "foo". When you run it, it produces this output:
./foo
4
this is the main function
Note that this is only one file; when you have multiple files, you will need either MLB (ML Basis) files, which are MLton's project files, compiled with mlton project.mlb, or CM files.