Global Variable Count - c++

How to count the number of global variables in C++ with a program that I can run with Grep?

A better method is to have your compiler print a map file. Most map files list all the global variables and their locations. If you're lucky, the map file may even indicate which translation unit the global variable belongs to.

Grep has no knowledge of the syntax or the grammar; it operates on lines. I don't think this is possible.
Here's a snippet of some code I'm working on:
int count;
Can you tell me if it's global?

You might be able to grep something in the artifacts of compilation such as listing files or object files.

Have you considered using something like cflow? You can get the GNU's version of cflow, the output, can then be greppable?
Hope this helps.

Related

Changing global variable names

I working on a huge code base written many years ago. We're trying to implement multi-threading and I'm incharge of cleaning up global variables (sigh!)
My strategy is to move all global variables to a class, and then individual threads will use instances of that class and the globals will be accessed through class instance and -> operator.
In first go, I've compiled a list of global variables using nm by finding B and D group object names. The list is not complete, and incase of static variables, I don't get file and line number info.
The second stage is even more messy, I've to replace all globals in the code base with classinstance->global_name pattern. I'm using cscope Change text string for this. The problem is that in case of some globals, their name is also being used locally inside functions, and thus cscope is replacing them as well.
Any other way to go about it? Any strategies, or help please!
just some suggestions, from my experience:
use eclipse: the C++ indexer is very good, and when dealing with a large project I find it very useful to track variables. shift+ctrl+g (I have forgotten how to access to it from menus!) let you search all the references, ctrl+alt+h (open call hierarchy) the caller-callee trees...
use eclipse: it has good refactoring tools, that is able to rename a variable without touching same-name-different-scope variables. (it often fails in case there are templates involved. I find it good, better than visual studio 2008 counterpart).
use eclipse: I know, it get some time to get started with it, but after you get it, it's very powerful. It can deal easily with the existing makefile based project (file -> new -> project -> makefile project with existing code).
I would consider not to use class members, but accessors: it's possibile that some of them will be shared among threads, and need some locking in order to be properly used. So I would prefer: classinstance->get_global_name()
As a final note, I don't know whether using the eclipse indexer at command-line would be helpful for your task. You can find some examples googling for it.
This question/answer can give you some more hints: any C/C++ refactoring tool based on libclang? (even simplest "toy example" ). In particular I do quote "...C++ is a bitch of a language to transform"
Halfway there: if a function uses a local name that hides the global name, the object file won't have an undefined symbol. nm can show you those undefined symbols, and then you know in which files you must replace at least some instances of that name.
However, you still have a problem in the rare cases that a file uses both the global name and in another function hides the global name. I'm not sure if this can be resolved with --ffunction-sections; but I think so: nm can show the section and thus you'll see the undefined symbols used in foo() appear in section .text.foo.

isDefined function?

In C++ is there any function that returns "true" when the variable is defined or false in vice versa. Something like this:
bool isDefined(string varName)
{
if (a variable called "varName" is defined)
return true;
else
return false;
}
C++ is not a dynamic language. Which means, that the answer is no. You know this at compile time, not runtime.
There is no such a thing in runtime as it doesn't make sense in a non-dynamic language as C++.
However you can use it inside a sizeof to test if it exists on compile time without side-effects.
(void)sizeof(variable);
That will stop compilation if var doesn't exist.
As already stated, the C++ runtime system does not support the querying of whether or not a variable is declared or not. In general a C++ binary doesn't contain information on variable symbols or their mappings to their location. Technically, this information would be available in a binary compiled with debugging information, and you could certainly query the debugging information to see if a variable name is present at a given location in code, but it would be a dirty hack at best (If you're curious to see what it might look at, I posted a terrible snippet # Call a function named in a string variable in C which calls a C function by a string using the DWARF debugging information. Doing something like this is not advised)
Microsoft has two extensions to C++ named: __if_exists and __if_not_exists. They can be useful in some cases, but they don't take string arguments.
If you really need such a functionality you can add all your variables to a set and then query that set for variable existance.
Already mentioned that C++ doesn't provide such facility.
On the other hand there are cases where the OS implement mechanisms close to isDefined(),
like the GetProcAddress Function, on Windows.
No. It's not like you have a runtime system around C++ which keeps remembers variables with names in some sort of table (meta data) and lets you access a variable through a dynamically generated string. If you want this, you have to build it yourself, for example using a std::map that maps strings to some objects.
Some compile-time mechanism would fit into the language. But I don't think that it would be any useful.
In order to achieve this first you need to implement a dynamic variable handling system, or at least find some on the internet. As previously mentioned the C++ is designed to be a native language so there are no built-in facilities to do this.
What I can suggest for the most easy solution to create a std::map with string keys storing global variables of interest with a boost::any, wxVariant or something similar, and store your variables in this map. You can make your life a bit easier with a little preprocessor directive to define a variables by their name, so you don't need to retype the name of the variable twice. Also, to make life easier I suggest to create a little inline function which access this variable map, and checks if the given string key is contained by the map.
There are implementation such a functionality in many places, the runtime property handling systems are available in different fashion, but if you need just this functionality I suggest to implement by yourself, because most of these solutions are quite general what you probably don't need.
You can make such function, but it wouldn't operate strings. You would have to send variable name. Such a function would try to add 0 to the variable. If it doesn't exists, an error would occur, so you might want to try to make exception handling with try...throw...catch . But because I'm on the phone, I don't know if this wouldn't throw an error anyways when trying to send non-existing variable to the function...

finding a function name and counting its LOC

So you know off the bat, this is a project I've been assigned. I'm not looking for an answer in code, but more a direction.
What I've been told to do is go through a file and count the actual lines of code while at the same time recording the function names and individual lines of code for the functions. The problem I am having is determining a way when reading from the file to determine if the line is the start of a function.
So far, I can only think of maybe having a string array of data types (int, double, char, etc), search for that in the line and then search for the parenthesis, and then search for the absence of the semicolon (so i know it isn't just the declaration of the function).
So my question is, is this how I should go about this, or are there other methods in which you would recommend?
The code in which I will be counting will be in C++.
Three approaches come to mind.
Use regular expressions. This is fairly similar to what you're thinking of. Look for lines that look like function definitions. This is fairly quick to do, but can go wrong in many ways.
char *s = "int main() {"
is not a function definition, but sure looks like one.
char
* /* eh? */
s
(
int /* comment? // */ a
)
// hello, world /* of confusion
{
is a function definition, but doesn't look like one.
Good: quick to write, can work even in the face of syntax errors; bad: can easily misfire on things that look like (or fail to look like) the "normal" case.
Variant: First run the code through, e.g., GNU indent. This will take care of some (but not all) of the misfires.
Use a proper lexer and parser. This is a much more thorough approach, but you may be able to re-use an open source lexer/parsed (e.g., from gcc).
Good: Will be 100% accurate (will never misfire). Bad: One missing semicolon and it spews errors.
See if your compiler has some debug output that might help. This is a variant of (2), but using your compiler's lexer/parser instead of your own.
Your idea can work in 99% (or more) of the cases. Only a real C++ compiler can do 100%, in which case I'd compile in debug mode (g++ -S prog.cpp), and get the function names and line numbers from the debug information of the assembly output (prog.s).
My thoughts for the 99% solution:
Ignore comments and strings.
Document that you ignore preprocessor directives (#include, #define, #if).
Anything between a toplevel { and } is a function body, except after typedef, class, struct, union, namespace and enum.
If you have a class, struct or union, you should be looking for method bodies inside it.
The function name is sometimes tricky to find, e.g. in long(*)(char) f(int); .
Make sure your parser works with template functions and template classes.
For recording function names I use PCRE and the regex
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
and then filter out names like "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course it's not a perfect solution but a good one.
I feel manually doing the parsing is going to be a quite a difficult task. I would probably use a existing tool such as RSM redirect the output to a csv file (assuming you are on windows) and then parse the csv file to gather the required information.
Find a decent SLOC count program, eg, SLOCCounter. Not only can you count SLOC, but you have something against which to compare your results. (Update: here's a long list of them.)
Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.
How about writing a shell script to do this? An AWK program perhaps.

Registering each C/C++ source file to create a runtime list of used sources

For a debugging and logging library, I want to be able to find, at runtime, a list of all of the source files that the project has compiled and linked. I assume I'll be including some kind of header in each source file, and the preprocessor __FILE__ macro can give me a character constant for that file, so I just need to somehow "broadcast" that information from each file to be gathered by a runtime function.
The question is how to elegantly do this, and especially if it can be done from C as opposed to C++. In C++ I'd probably try to make a class with a static storage to hold the list of filenames. Each header file would create a file-local static instance of that class, which on creation would append the FILE pointer or whatever into the class's static data members, perhaps as a linked list.
But I don't think this will work in C, and even in C++ I'm not sure it's guaranteed that each element will be created.
I wouldn't do that sort of thing right in the code. I would write a tool which parsed the project file (vcproj, makefile or even just scan the project directory for *.c* files) and generated an additional C source file which contained the names of all the source files in some kind of pre-initialized data structure.
I would then make that tool part of the build process so that every time you do a build this would all happen automatically. At run time, all you would have to do is read that data structure that was built.
I agree with Ferruccio, the best way to do this is in the build system, not the code itself. As an expansion of his idea, add a target to your build system which dumps a list of the files (which it has to know anyway) to a C file as a string, or array of strings, and compile this file into your source. This avoids a lot of complication in the source, and is expandable, if you want to add additional information, like the version number from your source code control system, who built the executable, etc.
There is a standard way on UNIX and Linux - ident. For every source file you create ID tag - usually it is assigned by you version control system, e.g. SVN keywords.
Then to find out the name and revision of each source file you just use ident command. If you need to do it at runtime check out how ident does it - source for it should be freely available.
Theres no way to do it in C. In C++ you can create a class like this:
struct Reg {
Reg( const char * file ) {
StaticDictionary::Register( file );
};
where StaticDictionary is a singleton container for all your file names. Then in each source file:
static Reg regthisfile( __FILE__ );
You would want to make the dictionary a Meyers singleton to avoid order of creation problems.
I don't think you can do this in the way you outline in a "passive" mode. That is, you are going to somehow run code for each source file to be added to the registry, it's hard to get it to happen automatically.
Of course, it's possible that you can make that code very unobtrusive using macros. It might be problematic for C source files that don't have an "entrypoint", so if your code isn't already organised as "modules", with e.g. an init() function for each module, it might be hard. Static initializing code might be possible, I'm not 100% sure if the order in which things are initialized creates problems here.
Using static storage in the registry module sounds like an excellent idea, a plain linked list or simple hash table should be easy enough to implement, if your project doesn't already include any general-purpose utility library.
In C++ your solution will work. It's guaranteed.
Edit: Just found out a solution in my head: Change a rule in your makefile to add
'-include "cfiles_register.h"' to each 'g++ file.cpp'.
%.o : %.cpp
$(CC) -include 'cfiles_register.h' -o $# $<
put your proposed in the question implemnatation to that 'cfiles_register.h'.
Using static instances in C++ would work fine.
You could do this also in C, but you need to use runtime specific features - for MSVC CRT take a look at http://www.codeguru.com/cpp/misc/misc/threadsprocesses/article.php/c6945/
For C - you could do it with a macro - define a variable named corresponding to your file, and then you could scan the symbols of your executable, just as an idea:
#define TRACK_FILE(name) char _file_tracker_##name;
use it in your my_c_file.c like this:
TRACK_FILE(my_c_file_c)
and than grep all file/variable names from the binary like this
nm my-binary | grep _file_tracker
Not really nice, but...
Horrible idea, I'm sure, but use a singleton. And on each file do something like
Singleton.register(__FILE__);
at global scope. It'll only work on cpp files though.
I did something like this years ago as a novice, and it worked. But I'd cringe to do it now. I'd add a build step now.
I agree with those who say that it is better to avoid doing this at run time, but in C, you can initialize a static variable with a function call, that is, in every file:
static int doesntmatter = register( __FILE__);

Tools for finding unused function declarations?

Whilst refactoring some old code I realised that a particular header file was full of function declarations for functions long since removed from the .cpp file. Does anyone know of a tool that could find (and strip) these automatically?
You could if possible make a test.cpp file to call them all, the linker will flag the ones that have no code as unresolved, this way your test code only need compile and not worry about actually running.
PC-lint can be tunned for dedicated purpose:
I tested the following code against for your question:
void foo(int );
int main()
{
return 0;
}
lint.bat test_unused.cpp
and got the following result:
============================================================
--- Module: test_unused.cpp (C++)
--- Wrap-up for Module: test_unused.cpp
Info 752: local declarator 'foo(int)' (line 2, file test_unused.cpp) not referenced
test_unused.cpp(2) : Info 830: Location cited in prior message
============================================================
So you can pass the warning number 752 for your puropse:
lint.bat -"e*" +e752 test_unused.cpp
-e"*" will remove all the warnings and +e752 will turn on this specific one
If you index to code with Doxygen you can see from where is each function referenced. However, you would have to browse through each class (1 HTML page per class) and scan for those that don't have anything pointing to them.
Alternatively, you could use ctags to generate list of all functions in the code, and then use objdump or some similar tool to get list of all function in .o files - and then compare those lists. However, this can be problematic due to name mangling.
I don't think there is such thing because some functions not having a body in the actual source tree might be defined in some external library. This can only be done by creating a script which makes a list of declared functions in a header and verifies if they are sometimes called.
I have a C++ ftplugin for vim that is able is check and report unmatched functions -- vimmers, the ftplugin suite is not yet straightforward to install. The ftplugin is based on ctags results (hence its heuristic could be easily adapted to other environments), sometimes there are false positives in the case of inline functions.
HTH,
In addition Doxygen (#Milan Babuskov), you can see if there are warnings for this in your compiler. E.g. gcc has -Wunused-function for static functions; -fdump-ipa-cgraph.
I've heard good things about PC-Lint, but I imagine it's probably overkill for your needs.