Changing a naming scheme in Eclipse - regex

Is there a way to change variables' naming conventions in Eclipse (specifically Eclipse CDT)? For example, can I do a search-and-replace of variables with names like need_foo and change that to NeedFoo?
Adding and removing underscores is easy, obviously, but I don't see a way to change case. Perl's regexes have \u and \l modifiers to uppercase and lowercase characters, but Eclipse's apparently don't.

There's no automated way of mass-renaming multiple non conforming function names in CDT. There is, however, a Code Analysis rule, which is designed to point out these sort of things. It uses Eclipse Regex described in their online help. These will give you an "Information" level marker in your Problems View.
The way they work is matching a regex against each function name, and raising an error/warning/information if that doesn't return a match. You can access them via "Window->Preferences->C/C++->Code Analysis". It's about half way down the scroll list (in Eclipse Indigo).
To directly answer your second paragraph, Eclipse does not have an equivalent of Perl's \u and \l, the closest it has is (?ismd-ismd), which allows you to turn on matching based on case.
Depending on if the Code Analysis tool returns 5 or 50,000 errors:
Only a few function definitions to rename
You can use the standard refactoring renaming tool. Right click the function name, "Refactor->Rename" will replace all references to that function with your new function name (respecting different scopes).
Many many errors
This is... not as nice. Seeing as there's no built in method, you need to do it externally. The first approach I'd take is to see if there's an existing plugin out there that would do this.
Failing that, what you could perhaps do is use the Code Analysis tool to identify non-conforming function names, and then using the output from that as input to a custom Perl script? You could do it in a few stages:
void FooBar(void) {}
int main(int argc, char *argv[])
{
FooBar();
return 0;
}
1) Run the code analysis, and copy and paste the warnings into a text file:
Bad function name "FooBar" (pattern /^(?-i:([a-z]+_[A-Z])|[a-z])(?i:[a-z]*)\z/) main.c /TestMultiThread line 47 Code Analysis Problem
2) Change the function definition, fooBar() --> FooBar(), using the above errors as input for a perl script (note you have the badly-conforming-function name, the file name and line number).
3) Compile it and then use the output from the compiler's undefined reference to fooBar() to rename any references:
undefined reference to `FooBar' main.c /TestMultiThread line 50 C/C++ Problem
This method would have some short comings, such as the compiler giving a partial list of undefined references due to the compilation terminating 'early', in which case you'd want to run it multiple times.
Another thing to look at is Refactor Scripts (Refactor --> Apply Script), but from the little I've seen of that, I don't think it's going to do what you want.
All in all, I've found the refactoring tools in Eclipse CDT to be no where near as powerful as those for Java (from what I remember). Still better than those in Notepad though (also, not bashing CDT, it's an awesome development environment!)

Related

sed/regex - Updating source files with scope issues

I have a project originally written for Windows, and I am currently in the process of porting it over to Linux. Most of the platform specific code has been #ifdef'ed or wrapped, so it's been easy so far.
This project has about 2000 instances of gettext() scattered throughout about 200 source files (.cpp and .c compiled as C++). The intended function call is:
std::string boost::locale::gettext(const char*);
This works in Windows, but in Linux builds, it resolves to:
char * gettext (const char * msgid);
Which I assume it's resolving from <libintl.h>, which is interesting, since I'm not including it.
What I need to do is to do the following:
Find in all my source files (ignoring the .svn directories):
1.1. Lines containing gettext(.*).c_str() and modify them to become boost::locale::gettext(.*).c_str().
1.2. Lines containing gettext(.*) and modify them to become boost::locale::gettext(.*).c_str().
What's the best way to accomplish this, preferably using BASh and sed, or some command-line-fu in general? The requirements for 1.1 I could probably do easily enough, but 1.2 is a bit more complex, and I'm not sure how to have it know which right parentheses ) to append .c_str() to correctly.
Thank you.
This problem is not solvable with a regex in the general case, since you cannot find the matching closing parenthesis of the gettext()-call with it if other calls are nested in its argument list.
But if usually no nested calls are made, it might be an option to just fix these cases automatically and do the rest by hand.
This sed expression
sed -r "s/gettext\(([^()]*)\)(\.c_str\(\))?/boost::locale::gettext(\1).c_str()/g"
should leave invocations with nested calls untouched and replace the rest.

Tools to refactor names of types, functions and variables?

struct Foo{
Bar get(){
}
}
auto f = Foo();
f.get();
For example you decide that get was a very poor choice for a name but you have already used it in many different files and manually changing ever occurrence is very annoying.
You also can't really make a global substitution because other types may also have a method called get.
Is there anything for D to help refactor names for types, functions, variables etc?
Here's how I do it:
Change the name in the definition
Recompile
Go to the first error line reported and replace old with new
Goto 2
That's semi-manual, but I find it to be pretty easy and it goes quickly because the compiler error message will bring you right to where you need to be, and most editors can read those error messages well enough to dump you on the correct line, then it is a simple matter of telling it to repeat the last replacement again. (In my vim setup with my hotkeys, I hit F4 for next error message, then dot for repeat last change until it is done. Even a function with a hundred uses can be changed reliably* in a couple minutes.)
You could probably write a script that handles 90% of cases automatically too by just looking for ": Error: " in the compiler's output, extracting the file/line number, and running a plain text replace there. If the word shows up only once and outside a string literal, you can automatically replace it, and if not, ask the user to handle the remaining 10% of cases manually.
But I think it is easy enough to do with my editor hotkeys that I've never bothered trying to script it.
The one case this doesn't catch is if there's another function with the same name that might still compile. That should never happen if you do this change in isolation, because an ambiguous name wouldn't compile without it.
In that case, you could probably do a three-step compiler-assisted change:
Make sure your code compiles before. Then add #disable to the thing you want to rename.
Compile. Every place it complains about it being unusable for being disabled, do the find/replace.
Remove #disable and rename the definition. Recompile again to make sure there's nothing you missed like child classes (the compiler will then complain "method foo does not override any function" so they stand right out too.
So yeah, it isn't fully automated, but just changing it and having the compiler errors help find what's left is good enough for me.
Some limited refactoring support can be found in major IDE plugins like Mono-D or VisualD. I remember that Brian Schott had plans to add similar functionality to his dfix tool by adding dependency on dsymbol but it doesn't seem implemented yet.
Not, however, that all such options are indeed of a very limited robustness right now. This is because figuring out the fully qualified name of any given symbol is very complex task in D, one that requires full semantics analysis to be done 100% correctly. Think about local imports, templates, function overloading, mixins and how it all affects identifying the symbol.
In the long run it is quite certain that we need to wait before reference D compiler frontend becomes available as a library to implement such refactoring tool in clean and truly reliable way.
A good find all feature can be better than a bad refactoring which, as mentioned previously, requires semantic.
Personally I have a find all feature in Coedit which displays the context of a match and works on all the project sources.
It's fast to process the results.

How to modify C++ code from user-input

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression
My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the ast in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?
What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees so the matching and replacement process is
far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all it can parse C++. OP objects
it cannot do so without all the environment context. To the extent
that OP is typing (well-formed) program fragments (statements, etc,.)
into the interpreter, Clang may [I don't have much experience with it
myself] have trouble getting focused on what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface syntax rewrite rule are convenient but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does.
I'm under the impression that Melt makes GCC at best a bit less
intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations
on large scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have decide how to resolve, see Why can't C++ be parsed with a LR(1) parser? for more discussion] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed) but that's unlikely a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text
to feed to the interpreter.
Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his
example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources so I think its a question of how badly does OP need to do this.
Try modify the headers to supress the method, then compiling you'll find the errors and will be able to replace all core.
As far as you have a C++ interpreter (as CERN's Root) I guess you must use the compiler to intercept all the Draw, an easy and clean way to do that is declare in the headers the Draw method as private, using some defines
class ItemWithDrawMehtod
{
....
public:
#ifdef CATCHTHEMETHOD
private:
#endif
void Draw(A,B);
#ifdef CATCHTHEMETHOD
public:
#endif
....
};
Then compile as:
gcc -DCATCHTHEMETHOD=1 yourfilein.cpp
In case, user want to input complex algorithms to the application, what I suggest is to integrate a scripting language to the app. So that the user can write code [function/algorithm in defined way] so the app can execute it in the interpreter and get the final results. Ex: Python, Perl, JS, etc.
Since you need C++ in the interpreter http://chaiscript.com/ would be a suggestion.
What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.
There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, a simple regular expression that searches for any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_\.]", along with the literal expression "->Draw(". This will give you the portion that must be rewritten, since the code following this part is already formatted as valid C++ parameters.
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
One way is to load user code as a DLL, (something like plugins,)
this way, you don't need to compile your actual application, just the user code will be compiled, and you application will load it dynamically.

Is there a tool that enables me to insert one line of code into all functions and methods in a C++-source file?

It should turn this
int Yada (int yada)
{
return yada;
}
into this
int Yada (int yada)
{
SOME_HEIDEGGER_QUOTE;
return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a view lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
With that regex I also get the function name (group 1) and the arguments (group 2).
Note: you must filter out names like, "while", "do", "for", "switch".
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using full langauge front end, in this case for C++),
builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C# program into another with whatever properties you wish. THe transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\methodname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.

finding a function name and counting its LOC

So you know off the bat, this is a project I've been assigned. I'm not looking for an answer in code, but more a direction.
What I've been told to do is go through a file and count the actual lines of code while at the same time recording the function names and individual lines of code for the functions. The problem I am having is determining a way when reading from the file to determine if the line is the start of a function.
So far, I can only think of maybe having a string array of data types (int, double, char, etc), search for that in the line and then search for the parenthesis, and then search for the absence of the semicolon (so i know it isn't just the declaration of the function).
So my question is, is this how I should go about this, or are there other methods in which you would recommend?
The code in which I will be counting will be in C++.
Three approaches come to mind.
Use regular expressions. This is fairly similar to what you're thinking of. Look for lines that look like function definitions. This is fairly quick to do, but can go wrong in many ways.
char *s = "int main() {"
is not a function definition, but sure looks like one.
char
* /* eh? */
s
(
int /* comment? // */ a
)
// hello, world /* of confusion
{
is a function definition, but doesn't look like one.
Good: quick to write, can work even in the face of syntax errors; bad: can easily misfire on things that look like (or fail to look like) the "normal" case.
Variant: First run the code through, e.g., GNU indent. This will take care of some (but not all) of the misfires.
Use a proper lexer and parser. This is a much more thorough approach, but you may be able to re-use an open source lexer/parsed (e.g., from gcc).
Good: Will be 100% accurate (will never misfire). Bad: One missing semicolon and it spews errors.
See if your compiler has some debug output that might help. This is a variant of (2), but using your compiler's lexer/parser instead of your own.
Your idea can work in 99% (or more) of the cases. Only a real C++ compiler can do 100%, in which case I'd compile in debug mode (g++ -S prog.cpp), and get the function names and line numbers from the debug information of the assembly output (prog.s).
My thoughts for the 99% solution:
Ignore comments and strings.
Document that you ignore preprocessor directives (#include, #define, #if).
Anything between a toplevel { and } is a function body, except after typedef, class, struct, union, namespace and enum.
If you have a class, struct or union, you should be looking for method bodies inside it.
The function name is sometimes tricky to find, e.g. in long(*)(char) f(int); .
Make sure your parser works with template functions and template classes.
For recording function names I use PCRE and the regex
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
and then filter out names like "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course it's not a perfect solution but a good one.
I feel manually doing the parsing is going to be a quite a difficult task. I would probably use a existing tool such as RSM redirect the output to a csv file (assuming you are on windows) and then parse the csv file to gather the required information.
Find a decent SLOC count program, eg, SLOCCounter. Not only can you count SLOC, but you have something against which to compare your results. (Update: here's a long list of them.)
Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.
How about writing a shell script to do this? An AWK program perhaps.