Find function/variable definition for a reference in source code - c++

My project contains many files.
Sometimes I need to know where a particular function is defined (implemented) in source code. What I currently do is text search within source files for the function name, which is very time consuming.
My question is: is there a better way (for example, a compiler or linker flag) to find that function's definition in the source files? After all, the linker has already gone through the trouble of resolving all these references.
I am hoping for a method better than stepping into a function call in a debugger, since a function can be buried within many calls.

Try the cscope utility.
From the manual:
Allows searching code for:
all references to a symbol
global definitions
functions called by a function
functions calling a function
text string
regular expression pattern
a file
files including a file
Curses based (text screen)
An information database is generated for faster searches and later reference
The fuzzy parser supports C, but is flexible enough to be useful for C++ and Java, and for use as a generalized 'grep database' (use it to browse large text documents!)
Has a command line mode for inclusion in scripts or as a backend to a GUI/frontend
Runs on all flavors of Unix, plus most monopoly-controlled operating systems.
A "screenshot":
C symbol: atoi
  File     Function     Line
0 stdlib.h <global>      86 extern int atoi (const char *nptr);
1 dir.c    makefilelist 336 dispcomponents = atoi(s);
2 invlib.c invdump      793 j = atoi(term + 1);
3 invlib.c invdump      804 j = atoi(term + 1);
4 main.c   main         287 dispcomponents = atoi(s);
5 main.c   main         500 dispcomponents = atoi(s);
6 stdlib.h atoi         309 int atoi (const char *nptr) __THROW
Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:

If the symbol is exported, then you could wire up objdump or nm and look at the .o files. This is not useful for finding things in header files though.
My suggestion would be to put your project in git (which carries numerous other advantages) and use git grep which looks only at those files under git's revision control (meaning you don't grep object files and other irrelevances). git grep is also nice and quick.

Related

Identifying a Programming Language

So I have a software program that I'd like to "MOD", for reasons that are beyond the scope of this post. The program is launched from a Windows application named ViaNet.exe, with accompanying DLL files such as ViaNetDll.dll. The application is given an argument such as ./Statup.cat. There is also a WatchDog process that uses the argument ./App.cat instead.
I was able to locate a log file buried in my Windows/Temp folder for the ViaNet.exe Application. Looking at the log it identifies files such as:
./Utility/base32.atc:_Encode32 line 67
./Utilities.atc:MemFun_:Invoke line 347
./Utilities.atc:_ForEachProperty line 380
./Cluster/ClusterManager.atc:ClusterManager:GetClusterUpdates line 1286
./Cluster/ClusterManager.atc:ClusterManager:StopSync line 505
./Cluster/ClusterManager.atc:ConfigSynchronizer:Update line 1824
Going to those file locations reveals files by those names, though ending in .cat rather than .atc. The log also indicates some sort of class, method and line number, but the .cat files are in binary form.
Searching the program folder for any files with the .atc extension reveals three files, which I assume are uncompiled .cat files. Lo and behold, once opened, they obviously contain some sort of source code, with copyright headers, lol.
global ConfigFolder, WriteConfigFile, App, ReadConfigFile, CreateAssocArray;
local mgrs = null;
local email = CreateAssocArray( null);
local publicConfig = ReadConfigFile( App.configPath + "\\publicConfig.dat" );
if ( publicConfig != null )
{
    mgrs = publicConfig.cluster.shared.clusterGroup[1].managers[1];
    local emailInfo = publicConfig.cluster.shared.emailServer;
    if (emailInfo != null)
    {
        if (emailInfo.serverName != "")
        {
            email.serverName = emailInfo.serverName;
        }
        if (emailInfo.serverEmailAddress != "")
        {
            email.serverEmailAddress = emailInfo.serverEmailAddress;
        }
        if (emailInfo.adminEmailAddress != null)
        {
            email.adminEmailAddress = emailInfo.adminEmailAddress;
        }
    }
}
if (mgrs != null)
{
    WriteConfigFile( ConfigFolder + "ZoneInfo.dat", mgrs);
}
WriteConfigFile( ConfigFolder + "EmailInfo.dat", email);
So to end this as simply as possible, I'm trying to find out two things. #1: What programming language is this? #2: Can the .cat files be decompiled back to .atc files, and vice versa? Looking at the log, it would appear that the application is already decoding/decompiling the .cat files to interpret them, versus running them as bytecode or natively. Searching for .atc on Google returns AutoCAD results, but those turn out to be some sort of palette files, nothing source code related.
It would seem to me that if I can program in this unknown language, let alone, decompile the existing stuff, I might get lucky with modding the software. Thanks in advance for any help and I really really hope someone has an answer for me.
EDIT
So huge news, people: I've made quite an interesting discovery. I downloaded a patch from the vendor; it contained a batch file that was executing ViaNet.exe Execute [Patch Script].atc. I quickly discovered that you can use Execute to run both .atc and .cat files equally, the same as with no argument. Knowing this, I assumed there must be other arguments to try, and after a random stroke of luck I found one: Compile [Script].atc. This argument will compile any .atc file to .cat. I've compiled the above script for comparison: http://pastebin.com/rg2YM8Q9
So I guess the goal now is to determine whether it's possible to decompile said script. I took a step further and was successful at obtaining C++ pseudocode from the ViaNet.exe and ViaNetDll.dll binaries, which has shed a lot of light on the proprietary language and its API. From what I can tell, each script is decompiled first and then run through the interpreter. They have also nicknamed their language ATCL; still no idea what it stands for. While searching the API, I found several debug methods with names like ExecuteFile, ExecuteString, CompileFile, CompileString, InspectFunction and finally DumpObjCode. With the DumpObjCode method I'm able to perform some sort of dump of script files. Dump file for the above script: http://pastebin.com/PuCCVMPf
I hope someone can help me find a pattern with the progress I've made. I'm trying my best to go over the pseudocode, but I don't know C++, so I'm having a really hard time understanding it. I've tried to separate out what I can identify as the compile script subroutines, but I'm not certain: http://pastebin.com/pwfFCDQa
If someone can give me an idea of what this code snippet is doing and whether it looks like I'm on the right path, I'd appreciate it. Thank you in advance.

How to place a variable at the end of a section (with GCC)

I want to place a specific variable at the end of its memory section.
So if I have:
file1.cpp:
__attribute__((section(".mysection"))) char var1[] = "var1";
and in another file2.cpp:
__attribute__((section(".mysection"))) char var2[] = "var2";
How can I force var2 to be at the end of mysection?
Well, I ended up taking a whole different approach, but I wanted to share my final conclusion here:
I base this on How to fetch the end address of my code
In the code, you must add an extern reference to the variable:
extern char var2[];
A linker script must be written as follows:
SECTIONS
{
    .mysection : {
        *(.mysection);
        var2 = .;
    }
}
INSERT AFTER .mysection
Add the linker script during linking (e.g. ld -T <PATH_TO_MY_LINKER_SCRIPT>).
The INSERT AFTER part is used so my linker script would be added to the default linker script.
I had to use 'gold' to link my ELF file, and apparently the version I used doesn't support the 'INSERT AFTER' syntax. So the actual solution would be to copy the default linker script and add my script information to it.
I haven't tested it though, but I still hope it can help someone.
You will have to create your own section in the linker command file and place your section appropriately in the variables section in the linker command file.
With most linkers, you cannot tell it the order of the exact variables. The easier solution is to create one section for each variable and tell the linker how you want the sections ordered.
Look up the exact syntax for the linker command file of the GNU compiler collection.
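As a related sketch, and not the approach taken in the answers above: GNU ld (and compatible ELF linkers) define __start_NAME and __stop_NAME symbols for a section whose name is a valid C identifier. That gives the end address of the section without a custom linker script, but it does not control the order of variables inside it. The section name "mysection" (no leading dot) and the helper functions are assumptions made for this illustration; it presumes GCC or Clang targeting ELF.

```cpp
#include <cstddef>
#include <cstdint>

// Section name without a leading dot, so the linker generates
// __start_mysection / __stop_mysection (assumption: GNU-style ELF linker).
__attribute__((section("mysection"), used)) char var1[] = "var1";
__attribute__((section("mysection"), used)) char var2[] = "var2";

// Provided by the linker, not defined anywhere in the source.
extern "C" char __start_mysection[];
extern "C" char __stop_mysection[];

// Hypothetical helper: number of bytes the section occupies.
std::size_t mysection_size() {
    return static_cast<std::size_t>(__stop_mysection - __start_mysection);
}

// Hypothetical helper: true if p points inside the section.
bool in_mysection(const void* p) {
    const std::uintptr_t addr  = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(__start_mysection);
    const std::uintptr_t end   = reinterpret_cast<std::uintptr_t>(__stop_mysection);
    return addr >= begin && addr < end;
}
```

Here __stop_mysection plays the role that the var2 = .; assignment plays in the linker script above: it marks the first address past the section's contents.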

Compile a program with local file embedded as a string variable?

Question should say it all.
Let's say there's a local file "mydefaultvalues.txt", separated from the main project. In the main project I want to have something like this:
char * defaultvalues = " ... "; // here should be the contents of mydefaultvalues.txt
And let the compiler swap " ... " with the actual contents of mydefaultvalues.txt. Can this be done? Is there like a compiler directive or something?
Not exactly, but you could do something like this:
defaults.h:
#define DEFAULT_VALUES "something something something"
code.c:
#include "defaults.h"
char *defaultvalues = DEFAULT_VALUES;
Where defaults.h could be generated, or otherwise created however you were planning to do it. The pre-processor can only do so much. Making your files in a form that it will understand will make things much easier.
The trick I did, on Linux, was to have in the Makefile this line:
defaultvalues.h: defaultvalues.txt
	xxd -i defaultvalues.txt > defaultvalues.h
Then you could include:
#include "defaultvalues.h"
This header defines both unsigned char defaultvalues_txt[], holding the contents of the file, and unsigned int defaultvalues_txt_len, holding the size of the file.
Note that defaultvalues_txt is not null-terminated and so is not a C string. But since you also have the size, this should not be a problem.
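A minimal sketch of using such an array together with its length. The two definitions below stand in for the generated defaultvalues.h and assume, purely for illustration, a file whose contents are the five bytes "a=1\nb"; load_defaults is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the header generated by `xxd -i defaultvalues.txt`
// (hypothetical file contents: "a=1\nb").
unsigned char defaultvalues_txt[] = {0x61, 0x3d, 0x31, 0x0a, 0x62};
unsigned int defaultvalues_txt_len = 5;

// Build a std::string from the array plus its explicit length, so no
// terminating null byte is required.
std::string load_defaults() {
    assert(defaultvalues_txt_len == sizeof defaultvalues_txt);
    return std::string(reinterpret_cast<const char*>(defaultvalues_txt),
                       static_cast<std::size_t>(defaultvalues_txt_len));
}
```

Passing the length explicitly also keeps any embedded zero bytes intact, which a strlen-based copy would not.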
EDIT:
A small variation would allow me to have a null-terminated string:
echo "char defaultvalues[] = { " `xxd -i < defaultvalues.txt` ", 0x00 };" > defaultvalues.h
Obviously will not work very well if the null character is present inside the file defaultvalues.txt, but that won't happen if it is plain text.
One way to achieve compile-time trickery like this is to write a simple script in some interpreted programming language (e.g. Python, Ruby or Perl will do great) which does a simple search and replace. Then just run the script before compiling.
Define your own #pragma XYZ directive, which the script looks for and replaces with the code that declares the variable with the file contents in a string.
char * defaultvalues = ...
where ... contains the text string read from the given text file. Be sure to compensate for line length, new lines, string formatting characters and other special characters.
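The escaping step can be sketched as follows. This is one possible take written in C++ rather than a scripting language; to_c_string_literal and make_declaration are hypothetical names, and the declaration uses const char * because C++ does not allow a string literal to initialize a plain char *.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Turn raw text into a C/C++ string literal, escaping special characters.
std::string to_c_string_literal(const std::string& text) {
    std::string out = "\"";
    for (unsigned char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\t': out += "\\t";  break;
            case '\r': out += "\\r";  break;
            // End the current literal after each newline and start a new one
            // on the next line; adjacent literals are concatenated.
            case '\n': out += "\\n\"\n\""; break;
            default:
                if (c < 0x20 || c >= 0x7f) {
                    // Three-digit octal escape for non-printable bytes.
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\%03o",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    out += '"';
    return out;
}

// Produce the declaration that would replace the placeholder line.
std::string make_declaration(const std::string& name, const std::string& text) {
    assert(!name.empty());
    return "const char *" + name + " = " + to_c_string_literal(text) + ";\n";
}
```

A build step would read the text file, call make_declaration("defaultvalues", contents), and write the result into a generated header before compiling.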
Edit: lvella beat me to it with far superior approach - embrace the tools your environment supplies you. In this case a tool which does string search and replace and feed a file to it.
Late answer I know but I don't think any of the current answers address what the OP is trying to accomplish although zxcdw came really close.
All any 7 year old has to do is load your program into a hex editor and hit CTRL-S. If the text is in your executable code (or vicinity) or application resource they can find it and edit it.
If you want to prevent the general public from changing a resource or static data just encrypt it, stuff it in a resource then decrypt it at runtime. Try DES for something small to start with.

Tesseract multiple file confusion C++

I am trying to compile the Tesseract OCR code and have run into many problems. One is that tessembedded.cpp calls the "edges_and_textord" function, while other .cpp files call the "find_components" function. The "edges_and_textord" function is in the textord.cpp file that I downloaded from Google, but the "find_components" function is not. However, when I searched Google for "textord.cpp", I found a completely different version of "textord.cpp" (here) with the "find_components" function in it. Both have identical commented header information at the very beginning of the file (down to the date and time they were created).
So my question is, which one do I use? The tesseract code calls both of these functions so should I add the second "textord.cpp" file in under a different name?
I have run into the same problem with the "start_recog" function. The definition I have in my tface.cpp file is
"int Wordrec::start_recog(const char *textbase)"
but I have found another version of the file on Tesseract's website with the definition
"int start_recog(const char *configfile, const char *textbase)"
And tessembedded.cpp seems to call it using the second definition not found in the code I downloaded. Should I just replace what I downloaded with the second file?
Why are there these double files in the Tesseract code?

Parsing a C++ source file after preprocessing

I am trying to analyze C++ files using my custom-made parser (written in C++). Before parsing, I would like to get rid of all #define directives. I want the source file to be compilable after preprocessing, so the best way would be to run the C preprocessor on the file.
cpp myfile.cpp temp.cpp
// or
g++ -E myfile.cpp > temp.cpp
[New suggestions are welcome.]
But then the original lines and their line numbers are lost, because the output also contains everything pulled in from the headers, and I want to retain the line numbers. So the way out I have decided on is:
Add a special symbol before every line in the source file (except preprocessor directives)
Run the preprocessor
Extract the lines with that special symbol and analyze them
For example, a typical source file will look like:
#include<iostream>
#include"xyz.h"
int x;
#define SOME value
/*
** This is a test file
*/
typedef char* cp;

void myFunc (int* i, ABC<int, X<double> > o)
{
  //...
}

class B {
};
After adding symbol it will be like,
#include<iostream>
#include"xyz.h"
#3#int x;
#define SOME value
#5#/*
#6#** This is a test file
#7#*/
#8#typedef char* cp;
#9#
#10#void myFunc (int* i, ABC<int, X<double> > o)
#11#{
#12# //...
#13#}
#14#
#15#class B {
#16#};
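The tagging step could be sketched roughly like this. tag_lines is a hypothetical helper; it treats any line whose first non-blank character is # as a directive and does not handle continuation lines of multi-line macros. The #N# marker is taken from the example as-is, although a marker that itself begins with # may be read as a directive by the preprocessor, so a different marker might be needed in practice.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Prefix every non-directive line with #N#, where N is the 1-based
// line number in the original file. Directive lines are copied unchanged.
std::string tag_lines(const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    int number = 0;
    while (std::getline(in, line)) {
        ++number;
        const std::size_t first = line.find_first_not_of(" \t");
        const bool is_directive =
            first != std::string::npos && line[first] == '#';
        if (!is_directive) {
            out << '#' << number << '#';
        }
        out << line << '\n';
    }
    return out.str();
}
```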
Once all the macros and comments are removed, I will be left with thousands of lines, of which a few hundred will be the original source code.
Is this approach correct? Am I missing any corner cases?
You realize that g++ -E adds some of its own lines to its output which indicate line numbers in the original file? You'll find lines like
# 2 "foo.cc" 2
which indicate that you're looking at line 2 of file foo.cc. These lines are inserted whenever the regular sequence of lines is disrupted.
The imake program that used to come with X11 sources used a faintly similar system, marking the ends of lines with ## so that it could post-process them properly.
The output from gcc -E usually includes #line directives; you could perhaps use those instead of your symbols.
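A minimal sketch of reading such marker lines from the preprocessor output, assuming the # <number> "<file>" <flags> shape shown in the answer above. LineMarker and parse_linemarker are hypothetical names, and file names containing escaped quotes are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct LineMarker {
    int line;          // line number in the original file
    std::string file;  // file name as written between the quotes
};

// Returns true and fills `out` if `text` looks like: # 2 "foo.cc" 2
bool parse_linemarker(const std::string& text, LineMarker& out) {
    if (text.size() < 2 || text[0] != '#' || text[1] != ' ') return false;
    std::size_t i = 2;
    const std::size_t digits_begin = i;
    int line = 0;
    while (i < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[i]))) {
        line = line * 10 + (text[i] - '0');
        ++i;
    }
    if (i == digits_begin) return false;  // no line number present
    if (i + 1 >= text.size() || text[i] != ' ' || text[i + 1] != '"') {
        return false;
    }
    const std::size_t open = i + 1;
    const std::size_t close = text.find('"', open + 1);
    if (close == std::string::npos) return false;
    assert(close > open);
    out.line = line;
    out.file = text.substr(open + 1, close - open - 1);
    return true;
}
```

A parser could keep only the output lines that follow a marker naming the original source file, and recompute line numbers by counting from the marker's value.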