ColdFusion regular expression order of precedence

We have a rereplace command like the following:
<cfoutput>
#ReReplaceNoCase("www.one.two.three.four.com","(^www.)|(\..*$)","","ALL")#
</cfoutput>
The idea is to return the string "one", which the function does.
However, it seems ambiguous in that the (\..*$) could technically match from the dot at the end of "www." all the way to the end of the string.
Is there a defined order of precedence that states the "^" operator will be executed before the "$"?
Note: reversing the order of the alternatives in the regex does not affect the result.

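Not to my knowledge; there is no precedence between "^" and "$". The engine scans the string from left to right and, at each position, tries the alternatives in the order written. At position 0 only (^www.) can match, because (\..*$) needs a literal dot there, so "www." is consumed first and the search resumes at "one". The second clause then matches from the next dot to the end of the string, which is why swapping the alternatives changes nothing. Making the second clause non-greedy, something like (\..*?$), would not change the result either, because the $ still forces the match to run to the end. Note that the dot in (^www.) is unescaped and matches any character; (^www\.) is stricter.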
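The left-to-right behaviour can be checked outside ColdFusion. This is a minimal sketch in Python, assuming its re module scans and tries alternatives the same way a backtracking engine like ColdFusion's does; re.IGNORECASE stands in for ReReplaceNoCase, and the variable names are only illustrative.

```python
import re

host = "www.one.two.three.four.com"

# Same pattern as in the question.
original = re.sub(r"(^www.)|(\..*$)", "", host, flags=re.IGNORECASE)

# Alternatives swapped: at position 0 the \. branch fails on "w",
# so the ^www. branch still consumes "www." first.
swapped = re.sub(r"(\..*$)|(^www.)", "", host, flags=re.IGNORECASE)

# Lazy quantifier: the $ anchor still forces the match to reach the end.
lazy = re.sub(r"(^www.)|(\..*?$)", "", host, flags=re.IGNORECASE)

print(original, swapped, lazy)  # one one one
```

All three variants leave "one", which matches the observation in the question that reordering the alternatives has no effect.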

Related

Regular Expressions in Google Sheets

I'm trying to use regular expressions within Google Sheets. Given that the environment is within GSheets some functionality seems to be missing or, potentially just different.
I would like to use a regexmatch function that returns true if the range in question contains any of the following strings:
"string1"
"string2"
"string3"
I tried =regexmatch(range,"([Ss]tring1|[Ss]tring2|[Ss]tring3)")
This works.
But my developer colleague said he would usually just end the expression /i to say "Be case insensitive"
=regexmatch(range,"/(String1|String2|String3)/i")
But since Gsheets does not use "/" to open a regular expression, is there another way to tell the function to ignore case?
Also, is there a way to negate the expression? That is, instead of:
=NOT(regexmatch(range,"([Ss]tring1|[Ss]tring2|[Ss]tring3)"))
Can you do something like
=regexmatch(range,"!=([Ss]tring1|[Ss]tring2|[Ss]tring3)")
You can try wrapping your range in the LOWER function, so that the values are compared as if they were all lower case, regardless of how they were entered.
=REGEXMATCH(lower(range),"string1|string2|string3")
is there another way to tell the function to ignore case?
Please try:
=regexmatch(range,"(?i)string1|string2|string3")
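For comparison outside Sheets, here is a minimal Python sketch of the same inline (?i) flag. The cell values are made up for illustration, and the negation is done outside the pattern, mirroring the NOT(...) wrapper from the question.

```python
import re

# Hypothetical cell contents standing in for the range.
cells = ["String1 is here", "nothing relevant", "has STRING3"]

# (?i) at the start of the pattern makes the whole match case-insensitive.
pattern = re.compile(r"(?i)string1|string2|string3")

matches = [bool(pattern.search(cell)) for cell in cells]

# Negation stays outside the regex, like =NOT(REGEXMATCH(...)).
non_matches = [not m for m in matches]

print(matches)      # [True, False, True]
print(non_matches)  # [False, True, False]
```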

How would I use a regular expression to find calls to functions?

Suppose I want to find all function calls in my listing (a VB.NET listing), and I have the function name.
First I thought I could use a regular expression such as:
myfunc\( .* \)
That should work even if the function call spans multiple lines, assuming that the dot is interpreted as including newlines (there is an option to do this in .NET)
but then I realized that some of my arguments themselves could be function calls.
in other words:
myfunc( a,b,c,d(),e ),
which means that the parentheses don't match up.
so I thought that since the main function call usually is the first item on a line, I could do this:
^myfunc( .* \) $
The idea is that the function is the first item on a line (^) and the last paren is the last item on a line ($), but that doesn't work either.
What am I doing wrong?
You can't. By design, regular expressions cannot deal with recursion which is needed here.
For more information, you might want to read the first answer here: Can regular expressions be used to match nested patterns?
And yes, I know that some special "regular expressions" do allow for recursion. However, in most cases, this means that you are doing something horrible. It is much better to use something that can actually understand the syntax of your language.
This is not a direct answer to your question, but if you want to find all uses of your function you can use Visual Studio. Just right-click on the function, and then select Find All References.
Visual Studio will show you the results. You can then double click on each line and Visual Studio will take you there.
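If a script is still wanted, the nesting can be handled by counting parentheses instead of relying on the regex alone. The following Python sketch is only an illustration: find_calls is a hypothetical helper, and it assumes there are no parentheses inside string literals or comments, which a real parser would handle properly.

```python
import re

def find_calls(source, name):
    """Return the text of each call to `name`, nested arguments included."""
    calls = []
    # \b keeps "myfunc" from matching inside a longer identifier.
    for m in re.finditer(r"\b" + re.escape(name) + r"\s*\(", source):
        depth = 0
        # Start at the opening parenthesis and count until it is closed.
        for i in range(m.end() - 1, len(source)):
            ch = source[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    calls.append(source[m.start():i + 1])
                    break
    return calls

listing = "x = myfunc( a,b,c,d(),e )\ny = other(1)\nmyfunc(\n  inner(2),\n  3)\n"
calls_found = find_calls(listing, "myfunc")
print(calls_found)
# ['myfunc( a,b,c,d(),e )', 'myfunc(\n  inner(2),\n  3)']
```

The regex only locates the start of each call; the depth counter does the part that a regular expression cannot.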

How to create regular expression to get all functions from code

I have a problem with my regular expression. I need to find all functions in a text. I have this regular expression: \w*\([^(]*\). It works fine until the text contains brackets without a function name. For example, for the string 'hello world () testFunction()' it returns () and testFunction(), but I need only testFunction(). I want to use it in my C# application to parse a string passed to my method. Can anybody help me?
Thanks!
Programming languages have a hierarchical (nested) structure, which means that they cannot be parsed by simple regular expressions in the general case. If you want to write correct code that always works, you need to use a real parser (an LR parser, for example). If you simply want to apply a hack that will pick up most functions, use something like:
\w+\([^)]*\)
But keep in mind that this will fail in some cases. E.g. it cannot differentiate between a function definition (signature) and a function call, because it does not look at the context.
Try \w+\([^(]*\)
Here I have changed \w* to \w+. This means that the match will need to contain at least one word character.
Hope that helps
Change the * to + (if it exists in your regex implementation, otherwise do \w\w*). This will ensure that \w is matched one or more times (rather than the zero or more that you currently have).
It largely depends on the definition of "function name". For example, based on your description you only want to filter out the "empty" names, rather than find all valid names.
If your current solution is good enough apart from these empty names, then try changing the * to a +, requiring at least one word character right before the bracket.
\w+\([^(]*\)
OR
\w\w*\([^(]*\)
Depending on your regexp application's syntax.
(\w+)\(
The regex groups will hold the function names without any parentheses; you can add them later if you want. I assumed you don't need the parameters.
If you do need the parameters then use:
\w+\(.*\)
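for a greedy regex (it spans nested function calls, but it also runs on to the last closing parenthesis on the line)
or...
\w+\([^)]*\)
for a regex that stops at the first closing parenthesis (it does not handle nested function calls: a(b()) is cut off after the inner call's closing parenthesis)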
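The patterns suggested in these answers can be compared side by side. This is a small Python sketch rather than C#, on the assumption that .NET's Regex treats these particular constructs the same way; the nested sample string is made up for illustration.

```python
import re

text = "hello world () testFunction()"

# Original pattern: \w* allows an empty name, so the bare "()" is matched too.
zero_or_more = re.findall(r"\w*\([^(]*\)", text)

# \w+ requires at least one word character before the parenthesis.
one_or_more = re.findall(r"\w+\([^(]*\)", text)

nested = "outer(inner(1), 2)"

# Capturing only the name sidesteps the argument list entirely.
names = re.findall(r"(\w+)\(", nested)

# [^)]* stops at the first ")", so the outer call is cut short.
truncated = re.findall(r"\w+\([^)]*\)", nested)

print(zero_or_more)  # ['()', 'testFunction()']
print(one_or_more)   # ['testFunction()']
print(names)         # ['outer', 'inner']
print(truncated)     # ['outer(inner(1)']
```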

Finding Elvis ?:

I have been tasked to find Elvis (using eclipse search). Is there any regex that I can use to find him?
The "Elvis Operator" (?:) is a shortening of Java's ternary operator.
I have tried \?[\s\S]*[:] but it doesn't match multiline.
Is there such a refactoring where I could change Elvis into an if-else block?
Edit
Sorry, I had posted a regex for the ternary operator, if your problem is multiline you could use this:
\?(\p{Z}|\r\n|\n)*:
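To try the idea outside Eclipse, here is a rough Python equivalent. Python's re module has no \p{Z}, so \s is used as a stand-in for "separators and line breaks" (an assumption, since \s also matches tabs), and the Groovy-style sample source is invented for the example.

```python
import re

# Hypothetical source: one-line Elvis, Elvis split across lines, plain ternary.
source = 'def a = name ?: "anon"\ndef b = name ?\n    : "anon"\ndef c = ok ? 1 : 2\n'

# "?" followed by optional whitespace (including newlines), then ":".
elvis = re.compile(r"\?\s*:")

hits = [m.group(0) for m in elvis.finditer(source)]
print(hits)  # ['?:', '?\n    :']
```

The plain ternary on the last line is not reported, because something other than whitespace sits between the "?" and the ":".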
You'll need to explicitly match line delimiters if you want to match across multiple lines. \R will match any of them (platform-independent), in Eclipse 3.4 anyway, or you can use the proper one for your file (\r, \n, \r\n). E.g. \?.*\R*.*: will work if there's only one line break. You can't use \R in a character class, though, so if you don't know how many lines the operator might span, you'd have to construct a character class with your line delimiter and any character that might appear in an operand. Something like ([-\r\n\w\s\[\](){}=!/%*+&^|."']*)\?([-\r\n\w\s\[\](){}=!/%*+&^|."']*):([-\r\n\w\s\[\](){}=!/%*+&^|."']*). I've included parentheses to capture the operands as groups so you could find and replace.
You've got a pretty big problem, though, if this is Java (and probably any other language). The ternary conditional ?: operator creates an expression, while an if statement is not an expression. Consider:
boolean even = true;
int foo = even ? 2 : 3;
int bar = if (even) 2 else 3;
The third line is syntactically incorrect; the two conditional constructs are not equivalent. (What you'd actually get from the second line if you used my regex to find and replace is if (int foo = even) 2 else 3; which has additional problems.)
So, you can find the ?: operators with the regex above (or something similar; I may have missed some characters you need to include in the class), but you won't necessarily be able to replace them with 'if' statements.

Is stringing together multiple regular expressions with "or" safe?

We have a configuration file that lists a series of regular expressions used to exclude files for a tool we are building (it scans .class files). The developer has appended all of the individual regular expressions into a single one using the OR "|" operator like this:
rx1|rx2|rx3|rx4
My gut reaction is that there will be an expression that will screw this up and give us the wrong answer. He claims no; they are ORed together. I cannot come up with a case to break this but still feel uneasy about the implementation.
Is this safe to do?
Not only is it safe, it's likely to yield better performance than separate regex matching.
Take the individual regex patterns and test them. If they work as expected then OR them together and each one will still get matched. Thus, you've increased the coverage using one regex rather than multiple regex patterns that have to be matched individually.
As long as they are valid regexes, it should be safe. Unclosed parentheses, brackets, braces, etc would be a problem. You could try to parse each piece before adding it to the main regex to verify they are complete.
Also, some engines have escapes that can toggle regex flags within the expression (like case sensitivity). I don't have enough experience to say if this carries over into the second part of the OR or not. Being a state machine, I'd think it wouldn't.
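On the flag question, the behaviour is engine-specific, so it is worth testing in whatever engine the tool uses. As one data point, here is a minimal Python sketch; the patterns abc and def are placeholders. In Python's re, a scoped flag group such as (?i:...) applies only inside its own group and does not carry over to the next alternative.

```python
import re

# Case-insensitivity is limited to the first alternative.
scoped = re.compile(r"(?i:abc)|def")

upper_first = bool(scoped.fullmatch("ABC"))
upper_second = bool(scoped.fullmatch("DEF"))
lower_second = bool(scoped.fullmatch("def"))

print(upper_first, upper_second, lower_second)  # True False True
```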
It's as safe as anything else in regular expressions!
As far as regexes go, Google Code Search provides regexes for searches, so it's possible to have safe regexes.
I don't see any possible problem either.
I guess by saying 'safe' you mean that it will match as you need (because I've never heard of a RegEx security hole). Safe or not, we can't tell from this. You need to give us more detail, like what the full regex is. Do you wrap it in a group and allow multiple matches? Do you wrap it with start and end anchors?
If you want to match a few class file names, make sure you use start and end anchors so that the matching is done from start to end. Like this "^(file1|file2)\.class$". Without start and end anchors, you may end up matching 'my_file1.class' too.
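The effect of the anchors is easy to see with a couple of file names. This is a small Python sketch; the file names are made up, and search is used on purpose so that the anchors, not the API, decide where a match may start and end.

```python
import re

unanchored = re.compile(r"(file1|file2)\.class")
anchored = re.compile(r"^(file1|file2)\.class$")

names = ["file1.class", "my_file1.class", "file2.class.bak"]

# Without anchors, any name containing the pattern is excluded.
loose = [n for n in names if unanchored.search(n)]

# With anchors, only exact names are excluded.
strict = [n for n in names if anchored.search(n)]

print(loose)   # ['file1.class', 'my_file1.class', 'file2.class.bak']
print(strict)  # ['file1.class']
```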
The answer is that yes this is safe, and the reason why this is safe is that the '|' has the lowest precedence in regular expressions.
That is:
regexpa|regexpb|regexpc
is equivalent to
(regexpa)|(regexpb)|(regexpc)
with the obvious exception that the second would end up with positional matches (capture groups) whereas the first would not; however, the two would match exactly the same input. Or to put it another way, using Java parlance, for a String s:
s.matches("regexpa|regexpb|regexpc");
is equivalent to
s.matches("regexpa") || s.matches("regexpb") || s.matches("regexpc");
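The equivalence can be spot-checked with a few sample patterns. Here is a Python sketch, assuming re.fullmatch as the counterpart of Java's whole-string matches(); the patterns and inputs are arbitrary examples, one of which already contains its own "|".

```python
import re

parts = [r"ab+", r"c|d", r"^e.*f$"]

# Plain concatenation with "|", as in the configuration file.
combined = re.compile("|".join(parts))

samples = ["abbb", "c", "d", "exxf", "ef", "a", "zz"]

results = {}
for s in samples:
    separately = any(re.fullmatch(p, s) for p in parts)
    together = bool(combined.fullmatch(s))
    # The combined pattern should agree with testing each part on its own.
    assert together == separately
    results[s] = together

print(results)
# {'abbb': True, 'c': True, 'd': True, 'exxf': True, 'ef': True, 'a': False, 'zz': False}
```

One case the sketch does not cover: numbered backreferences such as \1 refer to group positions in the whole combined pattern, so a part that uses them can change meaning once other parts with capture groups are placed in front of it.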