How would I use a regular expression to find calls to functions?

Suppose I want to find all function calls in my listing (a VB.NET listing), and I have the function name.
At first I thought I could use a regular expression such as:
myfunc\( .* \)
That should work even if the call spans multiple lines, assuming the dot is interpreted as matching newlines (.NET has an option for this).
But then I realized that some of my arguments could themselves be function calls.
In other words:
myfunc( a,b,c,d(),e ),
which means that the parentheses don't match up.
So I thought that, since the main function call is usually the first item on a line, I could do this:
^myfunc( .* \) $
The idea is that the function is the first item on a line (^) and the last paren is the last item on a line ($). But that doesn't work either.
What am I doing wrong?

You can't. By design, regular expressions cannot deal with recursion, which is what is needed here.
For more information, you might want to read the first answer here: Can regular expressions be used to match nested patterns?
And yes, I know that some special "regular expressions" do allow for recursion. However, in most cases, this means that you are doing something horrible. It is much better to use something that can actually understand the syntax of your language.
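As a rough illustration of the alternative, here is a minimal Python sketch (not VB.NET, and not a real parser): a regex locates the function name, and a plain loop counts parentheses to find the matching close. The helper name find_calls is made up for this example, and it assumes that no parentheses appear inside string literals or comments.

```python
import re

def find_calls(source, name):
    """Return the text of each call to `name`, with balanced parentheses.

    Hypothetical helper: it ignores string literals and comments, so a
    parenthesis inside either one will throw the count off.
    """
    calls = []
    for m in re.finditer(r"\b" + re.escape(name) + r"\s*\(", source):
        depth = 0
        # Start scanning at the opening parenthesis that ended the match.
        for i in range(m.end() - 1, len(source)):
            ch = source[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    calls.append(source[m.start():i + 1])
                    break
    return calls

listing = "x = myfunc( a,b,c,d(),e )\ny = other(myfunc(1))"
print(find_calls(listing, "myfunc"))
# ['myfunc( a,b,c,d(),e )', 'myfunc(1)']
```

The counting loop is the part a plain regular expression cannot express; everything else is ordinary pattern matching.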

This is not a direct answer to your question, but if you want to find all uses of your function you can use Visual Studio. Just right-click on the function and select Find All References.
Visual Studio will show you the results. You can then double-click on each line and Visual Studio will take you there.

How to find move constructors in codebase using grep or an IDE?

I want to find move constructors in the codebase of a large C++ project. Simply grepping for "&&" doesn't work, because it matches a lot of 'logical and' operators.
Any way to grep more precisely for move constructors?
Any way to search for move constructors using Visual Studio (on Windows) or Xcode (on Mac)?
Looks like a job for regular expressions. In Visual Studio, press Ctrl+F. In the search pop-up window, activate the option "Use regular expressions" (or press Alt+E when the search line is active). Then type the following expression:
\b\w+\s*[\(]\s*(const)*\s*(volatile)*\s*\w+\s*[&][&]\s*[\)]
It will find any string of the form:
class_name(class_name&&)
class_name(const class_name&&)
class_name(volatile class_name&&)
class_name(const volatile class_name&&)
as specified in:
http://en.cppreference.com/w/cpp/language/move_constructor
The expression also works if there is any amount of whitespace between class_name, the parentheses and keywords such as const.
If you want it to work also for named variables, e.g.:
class_name(class_name&& variable_name)
it's enough to modify it slightly:
\b(\w+)\s*[\(]\s*(const)*\s*(volatile)*\s*\1\s*[&][&]\s*\w*\s*[\)]
EDIT: In response to the OP's request, I've modified the above regex so that it now uses a backreference. The '\1' means "find the same text that was captured by the first parenthesized group". That first group is (\w+), which is the first 'class_name' in the example move constructors above. This ensures that the same string appears on both sides of the '('. To sum up: one additional pair of parentheses, a '\1', and magic happens.
Interestingly, Microsoft doesn't mention that VS supports backreferences.
More information about regular expressions in Visual Studio can be found here:
http://msdn.microsoft.com/en-us/library/2k3te2cs.aspx
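To try the backreference version outside Visual Studio, here is a small Python sketch; Python's re module accepts the same pattern unchanged. The sample declarations and the class names Widget and Gadget are invented for illustration.

```python
import re

# Same expression as above: group 1 captures the class name, and \1
# requires the parameter type to repeat that exact name.
MOVE_CTOR = re.compile(
    r"\b(\w+)\s*[\(]\s*(const)*\s*(volatile)*\s*\1\s*[&][&]\s*\w*\s*[\)]"
)

samples = [
    "Widget(Widget&& other) noexcept;",  # move constructor
    "Widget(const Widget&&);",           # const variant
    "Widget(Gadget&& g);",               # rvalue parameter of another type
    "if (a && b) return;",               # logical and
]

hits = [line for line in samples if MOVE_CTOR.search(line)]
print(hits)
# ['Widget(Widget&& other) noexcept;', 'Widget(const Widget&&);']
```

This is still a purely textual match: it does not check that the declaration actually sits inside the class it names.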

Regular Expression Ends

I have scoured the web for the past few hours trying to figure out why in the world one of my colleagues insists on using (?!.) as the last element of his regular expressions instead of the usual $.
Some of the regular expressions I've seen look like ^.*.txt(?!.), which begins with the usual ^ but does not end with $. I have not been able to find any definitive reason for this, performance-related or otherwise. Are there any pros and cons, or any differences at all?
$ may match end of line rather than end of input (this depends on modifiers used). Perhaps this is the reason.
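In my opinion, the best way to match the end of input is \z, which means exactly the end of input, regardless of modifiers. It is supported in many regex implementations (.NET, Java, Perl, PCRE and Ruby among them), but not all: Python's re module has traditionally spelled it \Z, and JavaScript has no equivalent.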
The only possible difference is with multiline
asdf$ :
http://rubular.com/r/B2cNEL1pln
asdf(?!.) :
http://rubular.com/r/rbhKi1lKGI
^.*\.txt(?!.) means match (beginning)(anything 0 or more times).txt and is not followed by anything.
You can get more info on the ?! pattern here.
If you look here, it says that using the m or s modifiers, you can modify the behavior of ^ and $ to match the beginning or end of a line rather than of the whole string. There's also an ms. So, I guess with (?!.), combined with a modifier that lets the dot match newlines, you can match the end of the entire multi-line string. Without such a modifier, the dot does not match a newline, so (?!.) also succeeds at the end of every line.
So, I wouldn't say using this is better. Rather, I would say you need to know exactly what you're looking for: whether you are working with a single-line or a multi-line string, and how you split your input into one-line or multi-line strings before passing it to the regexp.
I think many of us run regexps on single-line strings and therefore do not notice a difference between the two syntaxes.
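The differences discussed in this thread can be checked with a few lines of Python. This is only a sketch for one engine: flag names and defaults differ between flavors (in Ruby, for instance, $ always matches at line ends), and Python's re module writes the end-of-input anchor as \Z. The helper name matches is made up for the example.

```python
import re

def matches(pattern, text, flags=0):
    """Hypothetical helper: True if pattern is found anywhere in text."""
    return re.search(pattern, text, flags) is not None

text = "asdf\nqwer"

# Without flags, $ matches only at the end (or before a final newline).
print(matches(r"asdf$", text))                # False
# With MULTILINE, $ also matches before every newline.
print(matches(r"asdf$", text, re.M))          # True
# By default "." does not match a newline, so (?!.) succeeds before one.
print(matches(r"asdf(?!.)", text))            # True
# With DOTALL, "." matches a newline too, so (?!.) means end of input.
print(matches(r"asdf(?!.)", text, re.S))      # False
# \Z is the end of input no matter which flags are set.
print(matches(r"asdf\Z", text, re.M))         # False
```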

How to create regular expression to get all functions from code

I have a problem with my regular expression. I need to find all functions in a text. I have this regular expression: \w*\([^(]*\). It works fine as long as the text does not contain brackets without a function name. For example, for the string 'hello world () testFunction()' it returns () and testFunction(), but I need only testFunction(). I want to use it in my C# application to parse a string passed to my method. Can anybody help me?
Thanks!
Programming languages have a hierarchical structure, which means that they cannot be parsed by simple regular expressions in the general case. If you want to write correct code that always works, you need to use a real parser (an LR parser, for example). If you simply want to apply a hack that will pick up most functions, use something like:
\w+\([^)]*\)
But keep in mind that this will fail in some cases. E.g. it cannot differentiate between a function definition (signature) and a function call, because it does not look at the context.
Try \w+\([^(]*\)
Here I have changed \w* to \w+. This means that the match will need to contain at least one word character.
Hope that helps
Change the * to + (if it exists in your regex implementation, otherwise do \w\w*). This will ensure that \w is matched one or more times (rather than the zero or more that you currently have).
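A quick way to see the effect of the change is to run both patterns over the string from the question. The sketch below uses Python's re module rather than C#, purely for illustration; the patterns themselves are the ones discussed above.

```python
import re

text = "hello world () testFunction()"

# \w* allows an empty name, so the bare "()" is reported as well.
with_star = re.findall(r"\w*\([^(]*\)", text)
# \w+ requires at least one word character before the "(".
with_plus = re.findall(r"\w+\([^(]*\)", text)

print(with_star)  # ['()', 'testFunction()']
print(with_plus)  # ['testFunction()']
```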
It largely depends on the definition of "function name". For example, based on your description you only want to filter out the "empty" names, not to validate that every name is legal.
If your current solution is mostly sufficient and your only problem is these empty names, then try changing the * to a +, requiring at least one word character right before the bracket.
\w+\([^(]*\)
OR
\w\w*\([^(]*\)
Depending on your regexp application's syntax.
(\w+)\(
The regex groups will hold the names of the functions without any parentheses; you can add them back later if you want. I assumed you don't need the parameters.
If you do need the parameters then use:
\w+\(.*\)
for a greedy regex (it will match nested function calls, but it also runs on to the last closing parenthesis on the line)
or...
\w+\([^)]*\)
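for a regex that stops at the first closing parenthesis (it doesn't match nested function calls in full; a call with a nested call inside it is cut off at the inner call's closing parenthesis)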
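The three variants above behave quite differently on nested calls. The following Python sketch runs each of them on one made-up expression so the trade-offs are visible; the input string is an assumption chosen for the demonstration.

```python
import re

code = "a(b(1), c) + d(2)"

# Names only: every identifier directly followed by "(".
names = re.findall(r"(\w+)\(", code)
# Greedy: runs from the first "(" to the last ")" on the line.
greedy = re.findall(r"\w+\(.*\)", code)
# Negated class: stops at the first ")", cutting the outer call short.
first_close = re.findall(r"\w+\([^)]*\)", code)

print(names)        # ['a', 'b', 'd']
print(greedy)       # ['a(b(1), c) + d(2)']
print(first_close)  # ['a(b(1)', 'd(2)']
```

None of the three returns the three calls a(b(1), c), b(1) and d(2) exactly, which is the limitation the first answer describes.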

Search and replace with regular expressions under Visual Studio 2003

I have a large C++ code base that I'm doing some refactoring on where a number of functions have become redundant, and hence should be removed. So I would like to replace
MyFunc(Param)
with
Param
where Param could be a literal value, variable, function call etc... From the online help I gathered that the search parameters should be
MyFunc/({+}/(
and the replace parameters simply
/1
But this gives me a syntax error in my pattern. I'm new to search and replace with regex under visual studio. Can the above be easily achieved? I've had a look at similar questions on this site, which suggest I'm roughly on the right track, but seem to be missing something.
Edit: If you can answer the above, how about if it is part of a class dereference, e.g.
MyClass.MyFunc(Param)
or
MyClass->MyFunc(Param)
(FWIW, I also picked up a copy of VisualAssist in the hope it could do this but it doesn't appear to be able to handle this situation).
Second edit: Thanks to Joe for the correct response, but for anyone else using this approach, beware of some pitfalls,
MyFunc(MyArray[MyOtherFunc(x)])
ends up as
MyArray[MyOtherFunc(x])
and
MyFunc((SomeType)x)
ends up as
(SomeTypex)
As long as you run a plain search first to check what will be matched, keep the modified files open in case you need to undo, and back up your source files before starting, this works well enough. Even with the pitfalls listed, it is still a huge time saver.
Try this instead:
Find = MyFunc\({[^\)]*}\)
Replace = \1
Your slashes are the wrong way around and the expression in the parentheses ({+}) is invalid.
This won't work for parameters that contain function calls or other uses of parentheses. The balanced bracket matching problem isn't solvable using regular expressions.
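The same replacement, and the pitfalls noted in the question's second edit, can be reproduced in a few lines of Python. This is a sketch in Python's re syntax, which uses ( ) for the captured group where Visual Studio 2003 uses { }; the helper name strip_call is invented for the example.

```python
import re

# Capture everything up to the first ")" after "MyFunc(".
PATTERN = re.compile(r"MyFunc\(([^\)]*)\)")

def strip_call(line):
    """Hypothetical helper: replace MyFunc(x) with x."""
    return PATTERN.sub(r"\1", line)

print(strip_call("y = MyFunc(Param);"))               # y = Param;
# The capture ends at the first ")", so nested parentheses break it:
print(strip_call("MyFunc(MyArray[MyOtherFunc(x)])"))  # MyArray[MyOtherFunc(x])
print(strip_call("MyFunc((SomeType)x)"))              # (SomeTypex)
```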

Is stringing together multiple regular expressions with "or" safe?

We have a configuration file that lists a series of regular expressions used to exclude files for a tool we are building (it scans .class files). The developer has appended all of the individual regular expressions into a single one using the OR "|" operator like this:
rx1|rx2|rx3|rx4
My gut reaction is that there will be an expression that will screw this up and give us the wrong answer. He claims not, since they are ORed together. I cannot come up with a case that breaks this, but I still feel uneasy about the implementation.
Is this safe to do?
Not only is it safe, it's likely to yield better performance than separate regex matching.
Take the individual regex patterns and test them. If they work as expected then OR them together and each one will still get matched. Thus, you've increased the coverage using one regex rather than multiple regex patterns that have to be matched individually.
As long as they are valid regexes, it should be safe. Unclosed parentheses, brackets, braces, etc would be a problem. You could try to parse each piece before adding it to the main regex to verify they are complete.
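Also, some engines have escapes that can toggle regex flags within the expression (like case sensitivity). I don't have enough experience to say for certain whether this carries over into the second part of the OR. Being a state machine, I'd think it wouldn't, but in PCRE, for one, an option set in one alternative does carry on into the following alternatives of the same group, so wrapping each piece in its own non-capturing group, (?:...), before joining is the safer approach.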
It's as safe as anything else in regular expressions!
As far as regexes go, Google Code Search provides regexes for searches, so it is possible to have safe regexes.
I don't see any possible problem either.
I guess by 'safe' you mean that it will match as you need (I've never heard of a regex security hole). Safe or not, we can't tell from this. You need to give us more detail, such as what the full regex is. Do you wrap it in a group and allow multiple matches? Do you wrap it in start and end anchors?
If you want to match a few class file names, make sure you use start and end anchors so that the match has to cover the name from start to end, like this: "^(file1|file2)\.class$". Without start and end anchors, you may end up matching 'my_file1.class too'.
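The anchoring point can be checked directly. The Python sketch below uses the file names from this answer; the third pattern is an added assumption, included to show what happens when the anchors are written without the surrounding group.

```python
import re

name = "my_file1.class too"

unanchored = re.compile(r"(file1|file2)\.class")
anchored = re.compile(r"^(file1|file2)\.class$")
# Without the group, | splits this into (^file1) or (file2\.class$).
misplaced = re.compile(r"^file1|file2\.class$")

print(unanchored.search(name) is not None)                # True
print(anchored.search(name) is not None)                  # False
print(anchored.search("file1.class") is not None)         # True
print(misplaced.search("file1_backup.txt") is not None)   # True
```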
The answer is that yes this is safe, and the reason why this is safe is that the '|' has the lowest precedence in regular expressions.
That is:
regexpa|regexpb|regexpc
is equivalent to
(regexpa)|(regexpb)|(regexpc)
with the obvious exception that the second would end up with captured groups whereas the first would not; the two would nevertheless match exactly the same input. Or, to put it another way in Java terms, for an input string s:
s.matches("regexpa|regexpb|regexpc");
is equivalent to
s.matches("regexpa") || s.matches("regexpb") || s.matches("regexpc");
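One way to gain confidence in the combined form is to build it mechanically and compare it against the individual patterns. The sketch below does that in Python rather than Java; the exclusion patterns, the file names and the helper name combine are all invented for illustration. Each piece is compiled on its own first, so a malformed entry is caught before it can affect its neighbors, and each piece is wrapped in a non-capturing group.

```python
import re

# Hypothetical exclusion patterns, one per configuration entry.
parts = [r".*Test\.class", r"generated/.*", r".*\$\d+\.class"]

def combine(pieces):
    """Join pieces with |, validating each one separately first."""
    for p in pieces:
        re.compile(p)  # raises re.error if this piece is malformed
    return "|".join(f"(?:{p})" for p in pieces)

combined = re.compile(combine(parts))

names = ["FooTest.class", "generated/A.class", "Outer$1.class", "Main.class"]
for n in names:
    one_regex = combined.fullmatch(n) is not None
    separately = any(re.fullmatch(p, n) for p in parts)
    print(n, one_regex, separately)
```

For these inputs the two columns agree on every line, with only Main.class left unexcluded.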