fortran modules -- finding where variables are defined/assigned - fortran

I am trying to extract a portion of a large Fortran program to make it its own program. A particular subroutine imports many modules (only two shown here as an example):
subroutine myroutine(aa,bb)
use xx_module
use yy_module
...
end subroutine myroutine
Many variables used in the ... portion are imported from these modules. Is there a good way (or a good tool) to find out which variable comes from which module? Or do I have to look through each module to see where each variable is defined, and then where it is assigned (which may occur in a different module)?

On a UNIX/Linux system:
grep -ni "variable" filenames
is what I commonly do from the command line. Here, variable is the name of the variable you are looking for, and filenames is the file (or list of files) to search. The -n option prints line numbers and -i ignores case, which suits Fortran because its identifiers are case-insensitive. The output lists every line that mentions the variable, which should give you insight right away about which variables come from which module. You can take on the detective work from there. When in doubt, type "man grep".
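If grep returns too many hits, a short script can narrow the search to declarations only. The following Python sketch is just an illustration: the function name find_declaring_modules and the sample source are made up for this example. It assumes modern "::" declarations at module level, so it will miss old-style declarations and can be fooled by names that only appear in an initializer.

```python
import re

MODULE_START = re.compile(r"^\s*module\s+(\w+)\s*$", re.IGNORECASE)
MODULE_END = re.compile(r"^\s*end\s*module\b", re.IGNORECASE)
CONTAINS = re.compile(r"^\s*contains\s*$", re.IGNORECASE)

def find_declaring_modules(source, name):
    """Return the modules whose module-level '::' declarations mention name."""
    word = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    current = None  # module whose specification part we are inside, if any
    hits = []
    for line in source.splitlines():
        code = line.split("!", 1)[0]  # drop comments (ignores '!' inside strings)
        start = MODULE_START.match(code)
        if start:
            current = start.group(1)
        elif MODULE_END.match(code) or CONTAINS.match(code):
            current = None  # past 'contains', declarations are local to procedures
        elif current and "::" in code:
            declared = code.split("::", 1)[1]
            if word.search(declared) and current not in hits:
                hits.append(current)
    return hits

# Hypothetical source text, only loosely modelled on the question.
SAMPLE = """
module xx_module
  implicit none
  real :: alpha, beta
contains
  subroutine f()
    integer :: gamma
  end subroutine f
end module xx_module

module yy_module
  integer, parameter :: nmax = 10  ! alpha is mentioned in a comment only
  real :: Gamma(nmax)
end module yy_module
"""

for variable in ("alpha", "gamma", "nmax"):
    print(variable, find_declaring_modules(SAMPLE, variable))
```

In practice you would read each module file and pass its text to the function. Finding assignments as well would need a second pattern for lines of the form "name = ...".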

SciTools Understand does just that sort of thing, amongst many others. Double-clicking on a variable takes you to its definition; from there you can search through its occurrences.

If you use Eclipse, there is Photran, a plugin for working with Fortran projects. I don't use it myself, so I'm not 100% sure, but I think it should be able to do what you require.

Related

Convenient way to find the declaration of a variable

Sometimes I am reading some code and would like to find the definition for a certain symbol, but it is sprinkled throughout the code to such an extent that grep is more or less insufficient for pointing me to its definition.
For example, I am working with Zlib and I want to figure out what FAR means.
Steven@Steven-PC /c/Users/Steven/Desktop/zlib-1.2.5
$ grep "FAR" * -R | wc -l
260
That's a lot to scan through. It turns out it is in fact #defined to nothing but it took me some time to figure it out.
If I was using Eclipse I would have it easy because I can just hover over the symbol and it will tell me what it is.
What kinds of tools out there can I use to analyze code in this way? Can GCC do this for me? clang maybe? I'm looking for something command-line preferably. Some kind of tool that isn't a full fledged IDE at any rate.
You may want to check out cscope; it's basically made for this, and it is a command-line tool (with an ncurses interface, if you like). libclang (part of clang/LLVM) can do this too, but it is just a library (it took me only ~100 lines of Python to emulate basic cscope features with libclang).
cscope requires you to build a database first. libclang can parse code "live".
If the variable is not declared in your current file, it is declared in an included file, i.e. a .h file. So you can limit the amount of data by running grep only on those files.
Moreover, you can restrict matches to whole words with grep's -w option.
Try:
grep -w "FAR" *.h -R | wc -l
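To see what whole-word matching buys you, here is a small Python sketch that mimics the difference between a plain substring search and a -w style search. The function name matching_lines and the sample lines are invented for illustration; the lookarounds only approximate grep's rule that a match must not be adjacent to letters, digits or underscores.

```python
import re

def matching_lines(lines, symbol, whole_word=True):
    """Return the lines that contain symbol, optionally as a whole word only."""
    pattern = re.escape(symbol)
    if whole_word:
        # Reject matches glued to a letter, digit or underscore on either side.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]

# Made-up header lines, not taken from zlib.
LINES = [
    "#define FAR",
    "typedef unsigned char FAR Bytef;",
    "#define SMALL_FARM 1",
    "/* FARTHEST distance */",
]

print(len(matching_lines(LINES, "FAR", whole_word=False)))
print(len(matching_lines(LINES, "FAR", whole_word=True)))
```

The substring search reports all four lines, while the whole-word search keeps only the two that actually use the FAR token.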
Our Source Code Search Engine (SCSE) is a kind of graphical grep that indexes a large code base according to the tokens of its language(s) (e.g., C, Java, COBOL, ...). Queries are stated in terms of the tokens, not strings, so a search for an identifier won't find it in the middle of a comment. This minimizes false positives, which can be a serious waste of time in a big code base. Hits are displayed one per line; a click takes you to the source text.
One can do queries from the command line and get grep-like responses, too.
A query of the form of
I=foo*
will find all uses of any identifier that starts with the letters "foo".
Queries can compose multiple tokens:
I=foo* '[' ... ']' '='
finds assignments to a subscripted foo ("..." means "near").
For C, Java and COBOL, the SCSE can find reads, writes, updates, and declarations of variables.
D=*baz
finds declarations of variables whose names end in "baz". I think this is what OP is looking for.
While SCSE works for C++, it presently can't find reads/writes/updates/declarations in C++. It does everything else.
The SCSE will handle mixed languages with aplomb. An "I" query will search across all languages that have identifiers, so you can see cross-language calls relatively easily, since the source and target identifiers tend to be the same for software engineering reasons.
gcc can output the preprocessing result with all macro definitions retained, using gcc -E -dD. The output tends to be large, often because of nested system headers, but the first appearance of a symbol is usually its declaration (definition). The output contains linemarkers (lines of the form # linenum "filename") that show which source or header file each part of the preprocessed result belongs to, so you can find where the symbol was originally declared.
To get exactly what the compiler sees when the file is compiled, you may need to add the other parameters used to compile it, such as -I and -D. In fact, I always copy the actual compilation command line, add -E -dD to the beginning, and add (or change) -o so that I don't accidentally overwrite anything.
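Scanning that output by eye gets tedious, so here is a rough Python sketch that walks preprocessed text, tracks the linemarkers, and reports the file and line of the first #define of a symbol. The function name locate_define and the sample text are hypothetical; the sample only imitates the general shape of gcc -E -dD output and is not real zlib output.

```python
import re

# Matches gcc linemarkers such as: # 5 "zconf.h" 1   (also accepts '#line 5 "f"')
LINEMARKER = re.compile(r'^#\s*(?:line\s+)?(\d+)\s+"([^"]*)"')

def locate_define(preprocessed, symbol):
    """Return (file, line) of the first '#define symbol', or None if absent."""
    define = re.compile(r"^\s*#\s*define\s+" + re.escape(symbol) + r"\b")
    current_file, next_line = None, 0
    for text in preprocessed.splitlines():
        marker = LINEMARKER.match(text)
        if marker:
            # The line following a marker is line N of the named file.
            next_line = int(marker.group(1))
            current_file = marker.group(2)
            continue
        if define.match(text):
            return current_file, next_line
        next_line += 1
    return None

# Invented excerpt in the style of 'gcc -E -dD example.c'.
SAMPLE_OUTPUT = '''# 1 "example.c"
# 1 "zconf.h" 1
#define ZCONF_H
# 5 "zconf.h"
#define FAR
typedef unsigned char Byte;
# 2 "example.c" 2
Byte *p;
'''

print(locate_define(SAMPLE_OUTPUT, "FAR"))
```

To try it on a real build, save the output of the preprocessing command to a file and pass its contents to the function.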
There is gccxml, but I am not aware of tools that build on top of it. clang and LLVM are suited for such stuff, too; equally, I am not aware of standalone tools that build on them.
Apart from that: Qt Creator and Code::Blocks can find the declaration, too.
So what is it about a "full-fledged IDE" that you don't want? If it's speed, I found NetBeans somewhat useful when I was in school, but for power, speed and general utility I would recommend Emacs. It has keyboard shortcuts for things like this. Keep in mind that there is a learning curve, but once you are over the hump there is no going back.

Reverse engineering your own code c++

I have a compiled program and I want to know whether a certain line exists in it. Is there a way to determine that using my source code?
Tony commented on my message so I'll add some info:
I'm using the g++ compiler.
I'm compiling the code on a Linux (Scientific)/Unix machine
I only use standard library (nothing downloaded from the web)
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
objdump is a utility that can be used as a disassembler to view an executable in assembly form.
Use this command to disassemble a binary:
objdump -Dslx file
Note, though, that disassemblers make use of the symbolic debugging information present in object files (ELF), so that information should be present in your object files. Also, constants and comments in the source code will not be part of the disassembled output.
Summary
Use source code control and keep track of which source code revision the executable's built from... it should write that into the output so you can always cross-reference the two, checkout the same sources and rebuild the executable that gave you those results etc..
Discussion
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
For the very simplest case, where you want all the MD simulations to be running the latest source, you can compare timestamps on the source files with the executable to see if you forgot to recompile, and compare the process start time (e.g. as listed by ps) with the executable creation time.
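The timestamp comparison is easy to script. This is a minimal Python sketch, assuming you can list the source files yourself; the function name sources_newer_than and the file names in the usage note are made up. It only compares modification times, so it cannot tell you which revision an older executable was built from.

```python
import os

def sources_newer_than(executable, sources):
    """Return the source files modified after the executable was last written."""
    built = os.path.getmtime(executable)
    return [src for src in sources if os.path.getmtime(src) > built]
```

For example, sources_newer_than("md_sim", ["main.cpp", "forces.cpp"]) returning a non-empty list would suggest that the binary is stale and needs a rebuild.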
Where you're deliberately deploying multiple versions of the program and only have the latest source, then it gets pretty tricky. A multiplication will typically only generate a single machine code instruction... unless you have some contextual insight you're unlikely to know which multiplication is significant (or if it's missing). The compiler may generate its own multiplications for e.g. array indexing, and may sometimes optimise multiplications into bit shifts (or nothing, as Ira comments), so it's not as simple as saying 'well, it's my only multiplication in function "X"'. If you're printing a specific line that may be easier to distinguish... if there's a unique string literal you can search for it in the executable (e.g. puts("Hello") -> strings program | grep Hello, though that may get other matches too, and the compiler's allowed to reuse string literal sequences so "Well Hello" might cater to your need via a pointer to 'H' too). If there's a new extern symbol involved you might see it in nm output etc..
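If the strings utility is not at hand, the string-literal search can be approximated in a few lines of Python. This is only a sketch of the idea: the function name printable_strings and the sample bytes are invented, and it looks for runs of printable ASCII of at least four characters, which is the default minimum length used by strings.

```python
import re

def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Made-up bytes standing in for the contents of an executable.
BLOB = b"\x7fELF\x01\x00\x00Well Hello\x00\x00ab\x00Step %d done\n\x00"

found = printable_strings(BLOB)
print([s for s in found if "Hello" in s])
```

For a real program you would read the file in binary mode and pass its bytes to the function. As noted above, a hit for "Hello" may come from a longer literal such as "Well Hello".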
All that said (whoa)... you should really do something altogether different. Best is to use a source control system (e.g. svn, cvs...), and get it configured so you can find out which revision of the codebase was used to create the executable - it should be a FAQ for any revision control system.
Failing that, you could, for example, print out what multipliers or conditions the program is using when it starts running, capturing that in your logs. While hackish, macros allow you to "stringify" their parameters, so you can log and execute something without typing all the code twice. Lots of other options too.
Hope some of that helps....

Variable renaming for plagiarism detection for C/C++

I have a couple of simple C++ homework assignments and I know the students shared code. These are smart students and they know how to cheat moss. I'm looking for a tool that can rename variables based on their types (the first variable of type int becomes int1, the first int array becomes intptr1...), or that does something similar that I cannot think of now. Do you know a quick way to do this?
edit: I'm required to use moss and report 90% match
Thanks
Yep, the tool you're looking for is called a compiler. :)
Seriously, if the programs submitted are exactly the same except for the identifier names, compiling them (without debugging info) should result in exactly the same output.
If you do this with debugging turned on, the compiler may leave metadata in the executable that is different for each executable, hence the comment about ensuring it is off. This is also why this won't work for Java programs - that kind of info is present whether in debug mode or not (for the purposes of dynamic introspection).
EDIT: I see from the comments added to the question that you're observing some submissions that are different in more than just identifier names. If the programs are still structurally equivalent, this should still work.
EDIT: Given that the use of moss is a requirement, this probably isn't the way to go. It does seem, though, that moss has some support for comparing assembly - perhaps compiling to assembler and submitting that to moss is an option (depending on what compiler you're using).
You can download and try our C CloneDR duplicate code detector. It finds duplicated code even when the variable names have been changed. Multiple changes in the same chunk are treated as just one; if they rename the variables consistently everywhere, you'll get back a report of "one clone" with the precise variable substitution.
You can try Copy Paste Detector with ignoreIdentifiers turned on. You can at least use it for a first pass before going to the effort of normalizing names for moss. Or, since the source is available, maybe you can get it to spit out its internal normalization of the code.
Another way of doing this would be to compile the applications and compare their binaries, so your examination is not limited to variable/function name changing.
A hex editor can help you with that. I just tried ExamDiff (not free) and I was happy with the result.

How do you handle command line options and config files?

What packages do you use to handle command line options, settings and config files?
I'm looking for something that reads user-defined options from the command line and/or from config files.
The options (settings) should be dividable into different groups, so that I can pass different (subsets of) options to different objects in my code.
I know of boost::program_options, but I can't quite get used to the API. Are there light-weight alternatives?
(BTW, do you ever use a global options object in your code that can be read from anywhere? Or would you consider that evil?)
At Google, we use gflags. It doesn't do configuration files, but for flags, it's a lot less painful than using getopt.
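#include <gflags/gflags.h>
#include <string>
DEFINE_string(server, "foo", "What server to connect to");
void Connect(const std::string& host);  // hypothetical application function, defined elsewhere
int main(int argc, char* argv[]) {
  // Parse the flags; the final 'true' removes the parsed flags from argv.
  google::ParseCommandLineFlags(&argc, &argv, true);
  // DEFINE_string(server, ...) creates a variable named FLAGS_server.
  if (!FLAGS_server.empty()) {
    Connect(FLAGS_server);
  }
}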
You put the DEFINE_foo at the top of the file that needs to know the value of the flag. If other files also need to know the value, you use DECLARE_foo in them. There's also pretty good support for testing, so unit tests can set different flags independently.
For command lines and C++, I've been a fan of TCLAP: Templatized Command Line Argument Parser.
http://sourceforge.net/projects/tclap/
Well, you're not going to like my answer. I use boost::program_options. The interface takes some getting used to, but once you have it down, it's amazing. Just make sure to do boatloads of unit testing, because if you get the syntax wrong you will get runtime errors.
And, yes, I store them in a singleton object (read-only). I don't think it's evil in that case. It's one of the few cases I can think of where a singleton is acceptable.
If Boost is overkill for you, GNU Gengetopt is probably, too, but IMHO, it's a fun tool to mess around with.
And, I try to stay away from global options objects, I prefer to have each class read its own config. Besides the whole "Globals are evil" philosophy, it tends to end up becoming an ever-growing mess to have all of your configuration in one place, and also it's harder to tell what configuration variables are being used where. If you keep the configuration closer to where it's being used, it's more obvious what each one is for, and easier to keep clean.
(As to what I use, personally, for everything recently it's been a proprietary command line parsing library that somebody else at my company wrote, but that doesn't help you much, unfortunately)
I've been using TCLAP for a year or two now, but randomly I stumbled across ezOptionParser. ezOptionParser doesn't suffer from "it shouldn't have to be this complex"-syndrome the same way that other option parsers do.
I'm pretty impressed so far and I'll likely be using it going forward, specifically because it supports config files. TCLAP is a more sophisticated library, but the simplicity and extra features of ezOptionParser are very compelling.
Other perks from its website include (as of 0.2.0):
Pretty printing of parsed inputs for debugging.
Auto usage message creation in three layouts (aligned, interleaved or staggered).
Single header file implementation.
Dependent only on STL.
Arbitrary short and long option names (dash '-' or plus '+' prefixes not required).
Arbitrary argument list delimiters.
Multiple flag instances allowed.
Validation of required options, number of expected arguments per flag, datatype ranges, user defined ranges, membership in lists and case for string lists.
Validation criteria definable by strings or constants.
Multiple file import with comments.
Exports to file, either set options or all options including defaults when available.
Option parse index for order dependent contexts.
GNU getopt is pretty nice. If you want a C++ feel, consider getoptpp which is a wrapper around the native getopt.
As far as the configuration file is concerned, you should try to keep the format as simple as possible so that parsing is easy. If you want something more elaborate, you might use yacc & lex, but that would be overkill for small apps.
I would also suggest that you support both config files and command-line options in your application. Config files are better for options that change less frequently. Command-line options are good for arguments that change from run to run (typically when you are creating an app that will be called by some other program).
If you are working with Visual Studio 2005 on x86 and x64 Windows, there are some good command-line parsing utilities in the SimpleLibPlus library. I have used it and found it very useful.
Not sure about command line argument parsing. I have not needed very rich capabilities in that area and have generally rolled my own to save adding more dependencies to my software. Depending upon what your needs are you may or may not want to try this route. The C++ programs I have written are generally not invoked from the command line.
On the other hand, for a config file you really can't beat an XML based format. It's readable, extensible, structured, etc... :) Plus there are lots of XML parsers out there. Despite the fact it is a C library, I tend to use libxml2 from xmlsoft.org.
Try Apache Ant. Its primary usage is Java projects, but there isn't anything Java-specific about it, and it's usable for almost anything.
Usage is fairly simple and you've got a lot of community support too. It's really good at doing things the way you're asking.
As for global options in code, I think they're quite necessary and useful. Don't misuse them, though.

Any program or trick to find the definition of a variable?

Many times when I am reading other people's code I just want to find where and how a variable is defined. Normally what I do now is search for the type of the variable until I find the definition, which is very time-consuming. I guess there are tools that can help me with this routine task. Any suggestions for tools or commands?
I know that this is done automatically when using a GUI and creating a project; I am asking about a way to do this without a GUI. I am working in text mode only. I am running Linux and using C/C++, but suggestions for other languages are welcome.
Thanks a lot.
A possible solution
Michel, in one of his comments, proposes a simple and effective solution: define the variable again, so that at compile time the compiler reports where the previous definition is. Of course, to apply this solution we first need to think about the scope of the variable.
You've already given the most appropriate tool: an IDE. This is exactly the kind of thing which an IDE excels at. Why would you not want to use an IDE if you're finding development painful without one?
Note that Emacs, Vim etc. can work as IDEs - I'm not talking about forcing you into the world of GUIs if you want to stay in a text-only situation, e.g. because you're SSHing in.
(I'm really not trying to be rude here. I just think you've discounted the obvious solution without explaining why.)
Edit: OK, you say you're using C++. I'm editing my response. I would use the C preprocessor and then grep for the variable. Its definition will be the first match.
cpp -I...(preprocessor options here) file.cpp | grep variable
The C preprocessor will join all the includes that the program uses, and the definition has to be before any usage of that variable in the file. Not a perfect thing, but without an IDE or a complete language description/managing tool, you only have the text.
Another option would be using ctags. It understands the C and C++ syntaxes (among others), and can be searched for variables and functions using command line tools, emacs and vi, among others.
I use cscope and ctags-exuberant religiously. Run it once on my code base and then in Vim, I can use various commands like ^] or [D or [I or similar to find any definitions or declarations for a given word.
This is similar to facilities provided by mega-IDEs like Visual Studio and Eclipse.
Cscope also functions as a stand-alone tool that performs these searches.
I use one of three methods:
I will use CTags to process my source tree (nightly) and then can easily use commands in Vim (or other editors) to jump right to the definition.
I will just use grep (linux) or findstr (windows) to look for all occurrences of the variable name or type. The definition is usually quite obvious.
In Vim, you can just search backward in the scope and often find what you are looking for.
Grep for common patterns for variable declarations. Example: *, &, > or an alphanumeric followed by one or more whitespace characters then the name of the variable. Or variable name followed by zero or more whitespace characters, then a left parenthesis or a semicolon. Unless it was defined under really weird circumstances (like with some kind of macro), it works every time.
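The two patterns described above translate fairly directly into a regular expression. Here is a Python sketch, with the function name likely_declarations and the sample lines made up for illustration. It is a heuristic, not a parser: it will also flag non-declarations such as "return total;", and it misses declarations like "int *p" where nothing separates the "*" from the name.

```python
import re

def likely_declarations(lines, name):
    """Return lines that look like they declare name, per the two patterns above."""
    n = re.escape(name)
    # '*', '&', '>' or an alphanumeric, then whitespace, then the name.
    before = r"[*&>\w]\s+" + n + r"\b"
    # The name, optional whitespace, then '(' or ';'.
    after = r"\b" + n + r"\s*[(;]"
    pattern = re.compile(before + "|" + after)
    return [ln for ln in lines if pattern.search(ln)]

# Invented C++ lines for the demonstration.
CODE = [
    "std::vector<int> counts;",
    "int total = 0;",
    "total += counts[i];",
    "char* name;",
    "return total;",
]

for variable in ("counts", "total", "name"):
    print(variable, likely_declarations(CODE, variable))
```

On these lines the search for counts and name returns only the declarations, while the search for total also returns the return statement, which shows the kind of false positive you still have to filter by eye.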
In VIM you can use gd to see local variable declarations or gD to see global variable declarations, if they're defined in the current file. Reference Go_to_definition_using_g
You can also use [i to see the definition without jumping to it, or [I to see all occurrences of the variable in all the included files as well, which will naturally show the definition as well.
If you work in Microsoft Visual Studio (which I think you could use for C++ as well, but would require working on a Windows workstation) there's an easily accessible right-click menu option for "Go to Definition...", which will take you to the definition of any currently marked variable, type or method.
If you insist on staying in text mode, you can do this with either Emacs or vi with the appropriate plug-ins.
But really, move into the 21st century.
EDIT: You commented that you are doing this over SSH because you need the build speed of the remote server cluster.
In that case, mount the drive on your local machine and use an IDE, and just SSH in to kick off a build.