Sometimes I am reading some code and would like to find the definition for a certain symbol, but it is sprinkled throughout the code to such an extent that grep is more or less insufficient for pointing me to its definition.
For example, I am working with Zlib and I want to figure out what FAR means.
Steven@Steven-PC /c/Users/Steven/Desktop/zlib-1.2.5
$ grep "FAR" * -R | wc -l
260
That's a lot to scan through. It turns out it is in fact #defined to nothing but it took me some time to figure it out.
If I were using Eclipse I would have it easy, because I could just hover over the symbol and it would tell me what it is.
What kinds of tools can I use to analyze code in this way? Can GCC do this for me? clang, maybe? I'm looking for something command-line, preferably. Some kind of tool that isn't a full-fledged IDE, at any rate.
You may want to check out cscope, it's basically made for this, and a command line tool (if you like, using ncurses). Also, libclang (part of clang/llvm) can do so - but that's just a library (but took me just ~100 lines of python to use libclang to emulate basic cscope features).
cscope requires you to build a database first. libclang can parse code "live".
If the variable is not declared in your current file, it is declared in an included file, i.e. a .h. So you can limit the amount of data by performing a grep only on those files.
Moreover, you can filter whole word matches with -w option of grep.
Try:
grep -rw --include='*.h' "FAR" . | wc -l
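Since the question is really about what FAR is defined as, it can also help to look only at #define lines rather than at every use. The following is a minimal Python sketch of that idea; the function name find_macro_definitions and the sample text are made up for illustration and are not copied from zlib.

```python
import re


def find_macro_definitions(source_text, name):
    """Return (line_number, line) pairs where `name` is #define'd."""
    # Allow whitespace around '#', as the preprocessor does: "#  define FAR".
    pattern = re.compile(r"^\s*#\s*define\s+" + re.escape(name) + r"\b")
    return [
        (number, line.strip())
        for number, line in enumerate(source_text.splitlines(), start=1)
        if pattern.match(line)
    ]


# Invented sample input, only to show the difference between a use and a definition.
sample = """\
#ifndef FAR
#  define FAR
#endif
typedef unsigned char FAR Bytef;  /* a use, not a definition */
"""

print(find_macro_definitions(sample, "FAR"))  # -> [(2, '#  define FAR')]
```

To run it over a tree you would read each header and call the function on its contents; macros defined on the compiler command line (-D) will of course not show up this way.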
Our Source Code Search Engine (SCSE) is a kind of graphical grep that indexes a large code base according to the tokens of its language(s) (e.g., C, Java, COBOL, ...). Queries are stated in terms of the tokens, not strings, so a search for an identifier won't find it in the middle of a comment. This minimizes false positives, which in a big code base can be a serious waste of time. Hits are displayed one per line; a click takes you to the source text.
One can do queries from the command line and get grep-like responses, too.
A query of the form of
I=foo*
will find all uses of any identifier that starts with the letters "foo".
Queries can compose multiple tokens:
I=foo* '[' ... ']' '='
finds assignments to a subscripted foo ("..." means "near").
For C, Java and COBOL, the SCSE can find reads, writes, updates, and declarations of variables.
D=*baz
finds declarations of variables whose names end in "baz". I think this is what OP is looking for.
While SCSE works for C++, it presently can't find reads/writes/updates/declarations in C++. It does everything else.
The SCSE will handle mixed languages with aplomb. An "I" query will search across all languages that have identifiers, so you can see cross-language calls relatively easily, since the source and target identifiers tend to be the same for software engineering reasons.
gcc can output the pre-processing result, including all macro definitions, with gcc -E -dD. The output file can be rather large, often because of nested system headers, but the first appearance of a symbol is usually its declaration (or definition). The output uses linemarkers (lines such as # 42 "file.h") to show which source or header file each part of the pre-processed result belongs to, so you can find where the symbol was originally declared.
To get exactly the result produced when the file is compiled, you may need to add all the other parameters used to compile it, such as -I and -D. In fact, I always copy the resulting compilation command line, add -E -dD to the beginning, and add (or change) -o so that I don't accidentally overwrite anything.
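Scanning a large -E -dD dump by eye is tedious, so here is a small Python sketch that walks the output, keeps track of the most recent linemarker, and reports the file and line of the first mention of a symbol. The function name first_occurrence and the sample dump (including the line number 340) are invented for illustration; the line counting is approximate and simply assumes that each output line after a marker advances the source line by one.

```python
import re

# Matches GCC-style linemarkers (# 42 "file.h" 1) and #line 42 "file.h".
LINEMARKER = re.compile(r'^#(?:line)?\s+(\d+)\s+"([^"]*)"')


def first_occurrence(preprocessed_text, symbol):
    """Return (file, line_number, text) for the first line mentioning `symbol`."""
    word = re.compile(r"\b" + re.escape(symbol) + r"\b")
    current_file, current_line = "<unknown>", 0
    for line in preprocessed_text.splitlines():
        marker = LINEMARKER.match(line)
        if marker:
            # The line that follows the marker is line N of the named file.
            current_line = int(marker.group(1)) - 1
            current_file = marker.group(2)
            continue
        current_line += 1
        if word.search(line):
            return current_file, current_line, line.strip()
    return None


# Invented sample of what preprocessed output might look like.
sample = '''# 1 "main.c"
# 1 "zconf.h" 1
# 340 "zconf.h"
#define FAR
# 2 "main.c" 2
int main(void) { return 0; }
'''

print(first_occurrence(sample, "FAR"))  # -> ('zconf.h', 340, '#define FAR')
```

In practice you would redirect the compiler output to a file and pass its contents to the function.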
There is gccxml, but I am not aware of tools that build on top of it. clang and LLVM are suited for such stuff, too; equally, I am not aware of standalone tools that build on them.
Apart from that: QtCreator and Code::Blocks can find the declaration, too.
So what is it about a "full-fledged IDE" that you don't want? If it's a matter of speed, I found NetBeans somewhat useful when I was in school, but for power, speed and general utility I would recommend Emacs. It has keyboard shortcuts for things like this. Keep in mind that there is a learning curve, to be sure, but once you are over the hump there is no going back.
Related
I'm looking for a way to search for a given term in a project's C/C++ code, while ignoring any occurrences in comments and strings.
As the code base is rather large, I am searching for a way to automatically identify the lines of code matching my search term, as they need manual inspection.
If possible I'd like to perform the search on my linux system.
background
The code base in question is a realtime signal processing engine with a large number of 3rd party plugins. Plugins are implemented in a variety of languages (mostly C, but also C++ and others; currently I only care about those two), and no standards have been enforced.
Our code base currently uses the built-in type float for floating-point numbers, and we would like to replace that with a typedef that would allow us to use doubles.
We would like to find all occurrences of float in the actual code (ignoring legit uses in comments and printouts).
What complicates things further is that there are some (albeit few) legit uses of float in the code payload (so we are really looking for a way to identify all places that require manual inspection, rather than run some automatic search-and-replace).
The code also contains C-style static casts to (float), so relying on compiler warnings to identify type mismatches is often not an option.
The code base consists of more than 3000 (C and C++) files, totalling about 750000 lines of code.
The code is cross-platform (Linux, OSX, W32 being the main targets; but also FreeBSD and similar), and is compiled with the various native compilers (gcc/g++, clang/clang++, Visual Studio, ...).
so far...
so far I'm using something ugly like:
grep "\bfloat\b" | sed -e 's|//.*||' -e 's|"[^"]*"||g' | grep "\bfloat\b"
but I'm thinking that there must be some better way to search only payload code.
IMHO there is a good answer to a similar question on "Unix & Linux":
grep works on pure text and does not know anything about the underlying syntax of your C program. Therefore, in order not to search inside comments you have several options:
Strip C comments before the search; you can do this using gcc -fpreprocessed -dD -E yourfile.c. For details, please see Remove comments from C/C++ code.
Write/use some hacky half-working scripts like you have already found (e.g. they work by skipping lines starting with // or /*) in order to handle the details of all possible C/C++ comments (again, see the previous link for some scary test cases). Then you still may have false positives, but you do not have to preprocess anything.
Use more advanced tools for doing "semantic search" in the code. I have found "coccigrep": http://home.regit.org/software/coccigrep/ Tools of this kind allow searching for specific language statements (e.g. an update of a structure with a given name), and they certainly drop the comments.
https://unix.stackexchange.com/a/33136/158220
Although it doesn't completely cover your "not in strings" requirement.
It might practically depend upon the size of your code base, and perhaps also on the editor you are usually using. I am suggesting to use GNU emacs (if possible on Linux with a recent GCC compiler...)
For a small to medium size code (e.g. less than 300KLOC), I would suggest using the grep mode of Emacs. Then (assuming you have bound the next-error Emacs function to some key, perhaps with (global-set-key [f10] 'next-error) in your ~/.emacs...) you can quickly scan every occurrence of float (even inside strings or comments, but you'll skip very quickly such occurrences...). In a few hours you'll be done with a medium sized source code (and that is quicker than learning how to use a new tool).
For a large sized code (millions of lines), it might be worthwhile to customize some static analysis tool or compiler. You could use GCC MELT to customize your GCC compiler on Linux. Its findgimple mode could be inspirational, and perhaps even useful (you probably want to find all Gimple assignments targeting a float)
BTW, you probably don't want to replace all occurrences -but only most of them- of the float type with double (probably suitably typedef-ed...), because very probably you are using some external (or standard) functions requiring a float.
The CADNA tool might also be useful to help you estimate the precision of results (and so help you decide when using double is sensible).
Using semantic tools like GCC MELT, CADNA, Coccinelle, Frama-C (or perhaps Fluctuat, or Coccigrep mentioned in g0hl1n's answer) would give more precise or relevant results, at the expense of having to spend more time (perhaps days!) learning and customizing the tool.
The robust way to do this should be with cscope (http://cscope.sourceforge.net/) in line-oriented mode using the "find this C symbol" option, but I haven't used that on a variety of C standards, so if that doesn't work for you or if you can't get cscope then do this:
find . -type f -print |
while IFS= read -r file
do
sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' "$file" |
gcc -P -E - |
sed 's/aC/#/g; s/aB/__/g; s/aA/a/g' |
awk -v file="$file" -v OFS=': ' '/\<float\>/{print file, $0}'
done
The first sed replaces all hash (#) and __ symbols with unique identifier strings, so that the preprocessor doesn't do any expansion of #include, etc. but we can restore them after preprocessing.
The gcc preprocesses the input to strip out comments.
The second sed replaces the hash-identifier string that we previously added with an actual hash sign.
The awk actually searches for float within word-boundaries and if found prints the file name plus the line it was found on. This uses GNU awk for word-boundaries \< and \>.
The 2nd sed's job COULD be done as part of the awk command but I like the symmetry of the 2 seds.
Unlike if you use cscope, this sed/gcc/sed/awk approach will NOT avoid finding false matches within strings but hopefully there's very few of those and you can weed them out while post-processing manually anyway.
It will not work for file names that contain newlines - if you have those you can put the body in a script and execute it as find .. -print0 | xargs -0 script.
Modify the gcc command line by adding whatever C or C++ version you are using, e.g. -ansi.
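If false matches inside strings are a concern, a small scanner can blank out both comments and literals before searching. What follows is only a Python sketch under simplifying assumptions: it does not handle C++ raw string literals, digit separators such as 1'000, or // comments continued with a backslash, and the function names are hypothetical. Newlines are preserved so that reported line numbers match the original file.

```python
import re


def blank_comments_and_strings(source):
    """Replace comments and string/char literals with spaces, keeping newlines."""
    out = []
    i, n = 0, len(source)
    while i < n:
        two = source[i:i + 2]
        ch = source[i]
        if two == "//":
            end = source.find("\n", i)
            end = n if end == -1 else end
        elif two == "/*":
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
        elif ch in "\"'":
            end = i + 1
            # Stop at the closing quote, or at end of line if unterminated.
            while end < n and source[end] != ch and source[end] != "\n":
                end += 2 if source[end] == "\\" else 1
            end = min(end + 1, n)
        else:
            out.append(ch)
            i += 1
            continue
        out.append("".join(c if c == "\n" else " " for c in source[i:end]))
        i = end
    return "".join(out)


def find_word(source, word):
    """Return (line_number, original_line) for whole-word matches in code only."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    original = source.split("\n")
    cleaned = blank_comments_and_strings(source).split("\n")
    return [
        (number, original[number - 1].strip())
        for number, line in enumerate(cleaned, start=1)
        if pattern.search(line)
    ]


sample = r'''float gain = 1.0f;   // float in a comment
printf("float: %f\n", (double)gain);
/* float in a
   block comment */ double d = (float)gain;
'''

for number, text in find_word(sample, "float"):
    print(number, text)
```

On the sample this reports lines 1 and 4 only: the occurrence inside the printf format string and the one inside the block comment are skipped, while the (float) cast is kept for manual inspection.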
Examples found on the web for clang tools are always run on toy examples, which are usually all really trivial C programs.
I am building a tool which performs source-to-source transformations on C++ code, which is obviously a very, very challenging task, but clang is up to this task.
The issue I am facing now is that the AST that clang generates for any C++ code that utilizes the STL is enormous. For example I have some C++ code for which clang++ -ast-dump ... | wc -l is 67,018 lines of horrifying AST gobbledygook!
99% of this is standard library stuff, which I aim to ignore in my source-to-source metaprogramming task. So, to achieve this I want to simply filter out files. Suppose I want to look at only the class definitions in the headers of the project that I'm analyzing (and ignore everything from the standard library headers): I will need to figure out which header each of my CXXRecordDecl's came from!
Can this be done?
Edit: Hopefully this is a way to go about it. Trying this out now... The important bit is that it has to tell me the header that the decls came out of, not the cpp file corresponding to the translation unit.
In my experience so far, the "source" of some given AST node is best retrieved by using Locations. For example every node at least has a start location, and when you print this out it will contain the header file path.
Then it's possible to use this path to decide whether it is a system library or part of your application code that you still are interested in examining.
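The decision step itself is small. As a hedged illustration in Python, assuming you already have the file path as a string taken from a node's location, a hypothetical helper like is_project_file below could separate project headers from system ones. It assumes all paths are on the same filesystem root (os.path.commonpath raises ValueError for paths on different Windows drives).

```python
import os


def is_project_file(path, project_root):
    """True if `path` lies inside `project_root` (hypothetical helper)."""
    root = os.path.abspath(project_root)
    candidate = os.path.abspath(path)
    # The common prefix of the two paths is the root itself only when
    # the candidate sits somewhere underneath it.
    return os.path.commonpath([root, candidate]) == root


# Example paths are invented for illustration.
print(is_project_file("/home/me/proj/include/widget.h", "/home/me/proj"))
print(is_project_file("/usr/include/c++/9/vector", "/home/me/proj"))
```

Comparing whole path components this way avoids treating /home/me/proj2 as being inside /home/me/proj, which a plain string-prefix check would do.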
One route I'm looking at is to narrow matches with things like hasName() (as found here). For example:
recordDecl(hasName("MyBaseClass")) // etc.
However, your comment above about using -ast-dump is something I tried as well, to get the lay of the land for my own Clang tool. I found this post to be extremely helpful. Armed with their suggestion, I used clang-check to filter to a specific class name and fed it my top-level CPP file. The output was a much more manageable few hundred lines representing the class declarations and definitions of interest.
I have a compiled program and I want to know whether a certain line exists in it. Is there a way, using my source code, that I could determine that?
Tony commented on my message so I'll add some info:
I'm using the g++ compiler.
I'm compiling the code on Linux(Scientific)/Unix machine
I only use standard library (nothing downloaded from the web)
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
objdump is a utility that can be used as a disassembler to view an executable in assembly form.
Use this command to disassemble a binary,
objdump -Dslx file
It is important to note, though, that disassemblers make use of the symbolic debugging information present in object files (ELF), so that information should be present in your object files. Also, the names of constants and the comments in the source code will not be part of the disassembled output.
Summary
Use source code control and keep track of which source code revision the executable's built from... it should write that into the output so you can always cross-reference the two, checkout the same sources and rebuild the executable that gave you those results etc..
Discussion
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
For the very simplest case, where you want all the MD simulations to be running the latest source, you can compare timestamps on the source files with that of the executable to see if you forgot to recompile, or compare the process start time (e.g. as listed by ps) with the executable creation time.
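The source-versus-executable comparison can be scripted. A minimal Python sketch, assuming modification times are a good enough proxy for "was rebuilt after the last edit" (the function name is made up for illustration):

```python
import os


def sources_newer_than(executable, sources):
    """Return the source files modified after `executable` was last written."""
    built = os.path.getmtime(executable)
    return [src for src in sources if os.path.getmtime(src) > built]
```

Calling it with the path of the simulation binary and a list of its source files gives an empty list when the binary is at least as new as every source; any file it returns suggests a forgotten recompile. It does not cover the process start time check.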
Where you're deliberately deploying multiple versions of the program and only have the latest source, then it gets pretty tricky. A multiplication will typically only generate a single machine code instruction... unless you have some contextual insight you're unlikely to know which multiplication is significant (or if it's missing). The compiler may generate its own multiplications for e.g. array indexing, and may sometimes optimise multiplications into bit shifts (or nothing, as Ira comments), so it's not as simple as saying 'well, it's my only multiplication in function "X"'. If you're printing a specific line that may be easier to distinguish... if there's a unique string literal you can search for it in the executable (e.g. puts("Hello") -> strings program | grep Hello, though that may get other matches too, and the compiler's allowed to reuse string literal sequences so "Well Hello" might cater to your need via a pointer to 'H' too). If there's a new extern symbol involved you might see it in nm output etc..
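For the string-literal case, the strings program | grep idea can be approximated in a few lines of Python if you prefer to script the check. This is a rough sketch that only looks for runs of printable ASCII, not a reimplementation of strings; the function names and the sample bytes are invented.

```python
import re


def printable_strings(data, min_length=4):
    """Yield runs of printable ASCII in `data`, roughly like strings(1)."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_length, data):
        yield match.group().decode("ascii")


def contains_literal(data, literal, min_length=4):
    """True if `literal` appears inside any printable run of `data`."""
    return any(literal in s for s in printable_strings(data, min_length))


# Invented bytes standing in for the contents of an executable.
blob = b"\x7fELF\x00\x01Well Hello\x00\x02ab\x00"

print(list(printable_strings(blob)))    # -> ['Well Hello']
print(contains_literal(blob, "Hello"))  # -> True
```

For a real binary you would pass open(path, "rb").read() as data. As noted above, a hit only shows that the bytes are present somewhere in the file, not which source line they came from.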
All that said (woah)... you should do something altogether different really. Best is to use a source control system (e.g. svn, cvs...), and get it configured so you can do something to find out which revision of the codebase was used to create the executable - it should be a FAQ for any revision control system.
Failing that, you could, for example, do something to print out what multipliers or conditions the program was using when it starts running, capturing that in your logs. While hackish, macros allow you to "stringify" their parameters, so you can log and execute something without typing all the code twice. Lots of other options too.
Hope some of that helps....
I am working on a rather large code base that has a bit of #ifdef magic going on. I'm looking at one file and trying to determine where a type is defined. Unfortunately, it includes many files, which include many files, which include many files, etc., some of which define macros that affect which definitions you might use. The structure is sufficiently complicated that after 10 minutes' worth of grepping and following the include chains, I still have no idea which definition is being used. I recall that Visual Studio has a nice feature where I can right-click on the type and it will show where the type is defined. Is there an equivalent nice tool for Linux that reads makefiles, etc.? I'm sure there is, but I still just use vim + grep for my development environment.
With complicated defines and dependencies this feature doesn't always work in Visual Studio either.
Solution: ask your compiler to dump the code after preprocessing, keeping the line markers it emits (lines such as # 42 "file.h", or #line directives with some compilers). Search through the resulting file for your type, then look at the closest preceding line marker to see where it came from.
(In GCC you can use the -E switch)
Many times when I am reading others' code I just want to find where and how a variable is defined. What I normally do now is look for the type of the variable until I find the definition, which is very time-consuming. I guess that there are some tools that can help me in this routine situation. Any suggestions for tools or commands to help me with this task?
I know that with a GUI and a project this is done automatically; I am talking about a way to do this without a GUI. I am working in text mode only. I am running Linux and using C/C++, but suggestions for other languages are welcome.
Thanks a lot.
A possible solution
Michel, in one of his comments, proposes a simple and effective solution: define the variable again. The compiler will then report, at compile time, where the previous definition is. Of course, to apply this solution we first need to think about the scope of the variable.
You've already given the most appropriate tool: an IDE. This is exactly the kind of thing which an IDE excels at. Why would you not want to use an IDE if you're finding development painful without one?
Note that Emacs, Vim etc. can work as IDEs. I'm not talking about forcing you into the world of GUIs if you want to stay in a text-only situation, e.g. because you're SSHing in.
(I'm really not trying to be rude here. I just think you've discounted the obvious solution without explaining why.)
Edit: OK, you say you're using C++. I'm editing my response. I would use the C preprocessor and then grep for the variable. The definition will be the first match.
cpp -I...(preprocessor options here) file.cpp | grep variable
The C preprocessor will join all the includes that the program uses, and the definition has to be before any usage of that variable in the file. Not a perfect thing, but without an IDE or a complete language description/managing tool, you only have the text.
Another option would be using ctags. It understands the C and C++ syntaxes (among others), and can be searched for variables and functions using command line tools, emacs and vi, among others.
I use cscope and ctags-exuberant religiously. I run them once on my code base and then, in Vim, I can use commands like ^] or [D or [I to find any definitions or declarations for a given word.
This is similar to facilities provided by mega-IDEs like Visual Studio and Eclipse.
Cscope also functions as a stand-alone tool that performs these searches.
I use one of three methods:
I will use CTags to process my source tree (nightly) and then can easily use commands in Vim (or other editors) to jump right to the definition.
I will just use grep (linux) or findstr (windows) to look for all occurrences of the variable name or type. The definition is usually quite obvious.
In Vim, you can just search backward in the scope and often find what you are looking for.
Grep for common patterns for variable declarations. Example: *, &, > or an alphanumeric followed by one or more whitespace characters then the name of the variable. Or variable name followed by zero or more whitespace characters, then a left parenthesis or a semicolon. Unless it was defined under really weird circumstances (like with some kind of macro), it works every time.
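As a rough illustration of those two patterns, here is a Python sketch that builds the corresponding regular expression for a given name. The function name declaration_pattern is made up, and this is a heuristic exactly as described: it will also match some non-declarations, such as "return count;".

```python
import re


def declaration_pattern(name):
    """Build a heuristic regex for lines that may declare `name`."""
    escaped = re.escape(name)
    # "*", "&", ">" or an alphanumeric, then whitespace, then the name.
    before = r"[*&>\w]\s+" + escaped + r"\b"
    # The name, optional whitespace, then "(" or ";".
    after = r"\b" + escaped + r"\s*[(;]"
    return re.compile(before + "|" + after)


pattern = declaration_pattern("count")
lines = [
    "unsigned int count = 0;",
    "std::vector<int> count;",
    "int *count;",
    "total = count + 1;",
]
for line in lines:
    print(bool(pattern.search(line)), line)
```

The first three sample lines match and the last one does not, since there "count" is preceded by "=" and followed by "+".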
In Vim you can use gd to see local variable declarations or gD to see global variable declarations, if they're defined in the current file. Reference: Go_to_definition_using_g
You can also use [i to see the definition without jumping to it, or [I to see all occurrences of the variable in all the included files as well, which will naturally show the definition as well.
If you work in Microsoft Visual Studio (which I think you could use for C++ as well, but would require working on a Windows workstation) there's an easily accessible right-click menu option for "Go to Definition...", which will take you to the definition of any currently marked variable, type or method.
If you insist on staying in text mode, you can do this with either emacs or vi with the appropriate plug-ins.
But really, move into the 21st century.
EDIT: You commented that you are doing this over SSH because you need the build speed of the remote server cluster.
In that case, mount the drive on your local machine and use an IDE, and just SSH in to kick off a build.