How to use lookbehind in this case? I'm lost (regex)

That's my string:
myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))
I am trying to capture only the opening parentheses "(" that are not part of a function call.
So I started by capturing the functions (intending to negate this rule afterwards):
\w*\.\w[^(]*\(
That works: it catches only the functions. But when I tried the following expression, it did not work (why?)
(?<=(\w*\.\w[^(]*\())\(
Notes:
- myclass. never changes
- don't forget the "and("
- (?<=t)\( works fine.
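A quick check with Python's built-in re module (an assumption: the question does not name its regex engine, but the Python answer further down uses re) shows why: the lookbehind above is rejected at compile time because \w* and [^(]* give it a variable width, while the one-character lookbehind from the notes (written with an escaped parenthesis) compiles and works.

```python
import re

# Lookbehind from the question: \w* and [^(]* can match any number of characters.
variable = r'(?<=(\w*\.\w[^(]*\())\('
# Lookbehind from the notes: always exactly one character wide.
fixed = r'(?<=t)\('

try:
    re.compile(variable)
    error_message = None
except re.error as exc:
    # re only accepts lookbehinds whose width is fixed.
    error_message = str(exc)

print(error_message)
print(re.compile(fixed).findall('myclass.test()'))  # ['(']
```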
Thanks :)
Temporary Solution
I will keep studying and trying to apply lookbehind to this case, since it seems an interesting approach, but our friend @hwnd suggested a different approach that works in my case:
\((?=myclass)
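Since myclass. never changes, this lookahead only has to check what follows each "(". A small Python sketch, reusing the string from the question, that lists the positions it matches:

```python
import re

text = 'myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))'

# A "(" counts only when the very next characters are "myclass".
positions = [m.start() for m in re.finditer(r'\((?=myclass)', text)]
print(positions)  # [18, 69]
```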
Thank you, guys.

I am a bit confused about what counts as part of a function here.
To match the myclass.test( parts, you could just use:
[a-zA-Z]+\.[a-zA-Z]+\(
Both of the following will match the opening parentheses that are not part of an empty myclass. call: they skip any ( that is immediately followed by ).
Positive Lookahead
\((?=[^)])
Regular expression:
\( '('
(?= look ahead to see if there is:
[^)] any character except: ')'
) end of look-ahead
Negative Lookahead
\((?!\))
Regular expression:
\( '('
(?! look ahead to see if there is not:
\) ')'
) end of look-ahead
You could possibly even use a Negative Lookbehind here.
(?<!\.)\((?!\))
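To see what the three patterns above actually select, here is a small Python sketch that runs each one over the string from the question. They only skip a ( that is immediately followed by ), so besides the and( and or ( parentheses (positions 18 and 69) they also pick up the ones in myclass.test("argument") and myclass.mytests(1) (positions 52 and 85).

```python
import re

text = 'myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))'

patterns = {
    'positive lookahead': r'\((?=[^)])',
    'negative lookahead': r'\((?!\))',
    'negative lookbehind + lookahead': r'(?<!\.)\((?!\))',
}

# Start index of every "(" each pattern matches.
results = {name: [m.start() for m in re.finditer(p, text)] for name, p in patterns.items()}

for name, positions in results.items():
    print(name, positions)  # each prints [18, 52, 69, 85]
```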

Since you can't use variable-length lookbehind in Python, you will need to do some of the job outside regex. One possible way is to capture two groups, the first one will capture the class.function part if it exists, the second one will capture the open parenthesis. So you can just take those parenthesis for which the first group has no match.
In this case, we check whether the match length is one character (i.e., only the opening parenthesis), then we print the matching index. You can print the matching string also, which would always be an open parenthesis =D
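import re
text = 'myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))'
# Group 1 optionally consumes a "class.function" prefix; group 2 is the "(" itself.
for result in re.finditer(r'(\w+\.\w[^(]*)?(\()', text):
    # A match of length 1 is a bare "(" with no function name in front of it.
    if result.end() - result.start() == 1:
        print(result.span())
Result:
(18, 19)
(69, 70)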

Related

Regular expression to find and replace ")" in anything matching "pc( * )"?

I'm trying to learn regular expressions to speed up editing my program.
My program has hundreds of references to the 3-dimensional array pc. For example, the array elements might be referred to as pc(i+1,j+1,k), pc(i,j+1,k-1) or pc(i,j,k). I need a regular expression to search for the ending parenthesis so that I can replace it with ",1)". For example, the end goal is to convert pc(i,j,k) to pc(i,j,k,1).
I don't need the regular expression to do the actual replacing -- I don't even know if that's possible -- I just need it to find the ending parenthesis so I can replace it.
Any help or hints would be much appreciated!
Here's an excerpt of the code I would be searching through:
PpPx_ey = 0.5*( FNy(i,j+1,k) *((pc(i,j+1,k)-pc(i-1,j+1,k))/xdiff(i,j,k)+(pc(i+1,j+1,k)-pc(i,j+1,k))/xdiff(i+1,j,k) )+(1.-FNy(i,j+1,k))*((pc(i,j, k)-pc(i-1,j, k))/xdiff(i,j,k)+(pc(i+1,j ,k)-pc(i,j ,k))/xdiff(i+1,j,k)) ).
To further clarify: I'm using the Atom editor, which allows regular expressions in the CTRL-F search. I want to use the 'replace' option for things that I find with CTRL-F, but I need to use a literal string for that part. Thus if I can find the ending ")" in anything that looks like pc( ) using a regular expression, I can replace it with ",1)".
Pretty simple, actually.
This should do it for you:
pc\(.*\)
pc = literally pc
\( = escaped (
.* = anything
\) = escaped )
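Because .* is greedy, this pattern takes as much of the line as it can. A minimal Python check on a fragment of the excerpt (the fragment is only an illustration) shows a single match running from the first pc( to the last ) on the line:

```python
import re

fragment = '(pc(i,j+1,k)-pc(i-1,j+1,k))/xdiff(i,j,k)'

# Greedy .* runs to the end of the line and backtracks only to the LAST ")".
matches = re.findall(r'pc\(.*\)', fragment)
print(matches)  # ['pc(i,j+1,k)-pc(i-1,j+1,k))/xdiff(i,j,k)']
```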
(pc\(.*?)\)
( - Begins a capture group.
pc - This will match the literal pc
\( - Matches the opening parenthesis. The backslash escapes the
parenthesis, so that it isn't interpreted as the beginning of a
capture group.
.*? - Will lazily match anything. . will match any single
character. * is a quantifier that matches any number (including
zero) of the preceding element, the . in this case. ? causes the
preceding quantifier to be lazy, meaning that it will match the
minimum number of characters possible. This is what prevents matching
pc(i,j+1,k)-pc(i-1,j+1,k) in the string
(pc(i,j+1,k)-pc(i-1,j+1,k))/xdiff(i,j,k) as one match, rather than
two different matches.
) - Ends the capture group.
\) - Same as \(, but matches a closing parenthesis.
The closing parenthesis can be replaced with ,1) as you mentioned. Everything except the closing parenthesis is captured. The first capture group is usually referenced in a replace string using $1 or \1, so something like $1,1) should replace the closing parenthesis.
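The same find-and-replace can be sketched in Python, where re.sub writes the group reference as \1 rather than Atom's $1. This runs the lazy pattern over a fragment of the excerpt chosen for illustration:

```python
import re

fragment = '(pc(i,j+1,k)-pc(i-1,j+1,k))/xdiff(i,j,k)'

# Group 1 holds "pc(" plus the shortest run of characters before the next ")".
pattern = r'(pc\(.*?)\)'
print(re.findall(pattern, fragment))  # ['pc(i,j+1,k', 'pc(i-1,j+1,k']

# Put group 1 back and append ",1)" in place of the closing parenthesis.
converted = re.sub(pattern, r'\1,1)', fragment)
print(converted)  # (pc(i,j+1,k,1)-pc(i-1,j+1,k,1))/xdiff(i,j,k)
```

The lazy .*? stops at the first ), so an index that itself contains parentheses (a hypothetical pc(f(i),j,k), for example) would not be converted correctly.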
Hope this will help you a bit!
According to your question, it seems that you want to find every pattern of the form , k, optionally followed by + or - and a number, and then ), so ,k+1), ,k-1) and ,k) should all be found and replaced.
I wrote a regular expression that should do what you need, but it's not perfect.
It is like this:
import re
s = 'PpPx_ey = 0.5*( FNy(i,j+1,k) *((pc(i,j+1,k)-pc(i-1,j+1,k))/xdiff(i,j,k)+(pc(i+1,j+1,k)-pc(i,j+1,k))/xdiff(i+1, j,k) )+(1.-FNy(i,j+1,k))*((pc(i,j, k)-pc(i-1,j, k))/xdiff(i,j,k)+(pc(i+1,j ,k)-pc(i,j ,k))/xdiff(i+1,j,k)) )'
pattern = r',\s*k\s*[\+\-]*\s*\d*\s*\)'
# List every match
print(re.findall(pattern, s))
# Print each match together with its start index
com = re.compile(pattern)
for i in com.finditer(s):
    print(i.start(), i.group())
# Keep everything before the closing parenthesis (group 1) and append ",1)"
str_replaced = re.sub(r'(,\s*k\s*[\+\-]*\s*\d*\s*)\)', r'\1,1)', s)
print(str_replaced)
The key regular expression is ,\s*k\s*[\+\-]*\s*\d*\s*\). It is not perfect: it will also match a string like ,k+), which you may not need to find or which may not even exist in your code, and it matches the k) that closes FNy(...) and xdiff(...) calls as well, not only pc(...).
The expression ,\s*k\s*[\+\-]*\s*\d*\s*\) means: match a string that starts with a comma, then optional whitespace (spaces or tabs), then the letter k, then optional whitespace, then optional + or - signs, then optional whitespace, then optional digits, then optional whitespace, and finally the closing parenthesis ).
Check if this will help you.

How to find a pattern that does not contains a subpattern?

I'm working on a NetBeans project (Java), and I'm checking that everything is all right. I've just discovered that the Find tool in NetBeans supports regular expressions, so I used the following regex to find the correct code:
ps\.set.*\(3, c0.get\((.*)\)\);(\n.*){3}.*ps\.set.*\(7, c1.get\(t\).get\(\1\)\);\n.*ps\.addBatch\(\);
(Subpattern 1, captured by (.*) inside c0.get(...), must also appear as \1 inside the last .get(...).)
Example (matched by Regex above):
ps.setString(3, c0.get(fooPattern)); // "fooPattern" here...
ps.setDouble(5, s.get(k));
ps.setDouble(6, 0d);
ps.setLong(7, c1.get(t).get(fooPattern)); // ... must also be here
ps.addBatch();
What I need is to check whether subpattern 1 appears in the second position. This regex works. However, I need to find all occurrences where this does not happen.
Example (what I'd like to find):
ps.setString(3, c0.get(fooPattern)); // "fooPattern" here...
ps.setDouble(5, s.get(k));
ps.setDouble(6, 0d);
ps.setLong(7, c1.get(t).get(barPattern)); // ... is NOT here
ps.addBatch();
So, the specific question is: How to modify this Regex to find all occurrences where Subpattern 1 is not repeated in the position it should be?
Use a negative lookahead, as I already mentioned, and replace the \1 (which consumed text in your original regex) with a consuming subpattern. Also, do not forget that to match a literal dot, you need to escape it.
ps\.set.*\(3, c0\.get\((.*)\)\);(\n.*){3}.*ps\.set.*\(7, c1\.get\(t\)\.get\((?!\1\))[^()]*\)\);\n.*ps\.addBatch\(\);
The key change is the (?!\1\))[^()]* inside the last .get(...).
The (?!\1\)) negative lookahead restricts the more generic [^()]* pattern (which matches zero or more characters other than ( and ); you may replace it with .* if there can be parentheses inside), making the match fail if those characters are equal to the value captured into Group 1.
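A sketch that exercises the regex above with Python's re module, on the two five-line blocks from the question with the // annotations dropped (the pattern requires a newline right after );). NetBeans uses Java's regex engine; this assumes Java treats the lookahead, the backreference and . (which does not match a newline) the same way, which was not checked in NetBeans itself.

```python
import re

# Regex from the answer above, split over two raw strings. Group 1 captures the
# argument of c0.get(...); (?!\1\)) rejects the block when the last .get(...)
# contains exactly that same text.
PATTERN = re.compile(
    r'ps\.set.*\(3, c0\.get\((.*)\)\);(\n.*){3}'
    r'.*ps\.set.*\(7, c1\.get\(t\)\.get\((?!\1\))[^()]*\)\);\n.*ps\.addBatch\(\);'
)

consistent = """ps.setString(3, c0.get(fooPattern));
ps.setDouble(5, s.get(k));
ps.setDouble(6, 0d);
ps.setLong(7, c1.get(t).get(fooPattern));
ps.addBatch();"""

# Same block, but the last .get(...) uses a different argument.
inconsistent = consistent.replace("get(t).get(fooPattern)", "get(t).get(barPattern)")

print(PATTERN.search(consistent))             # None: the argument is repeated
print(PATTERN.search(inconsistent).group(1))  # fooPattern
```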

Trying to match a sequence if not preceded by one group, but yes if preceded by another

This is getting a little meta, but I'm trying to figure out a regex to match regexes for syntax highlighting purposes. There's a nice long backstory, but in the interest of brevity I'll skip it. Here's what I'm trying to do: I need to match a comment (preceded by # and terminated at the end of the line) only if it is not inside a character class ([...]), although it should be matched if there is a complete (closed) character class earlier in the line.
The complicating factor is escaped square brackets: while a plain [ earlier in the line that is not followed by a closing ] indicates that we're still inside a character class (so a comment is illegal there), an escaped bracket \[ could be present, with or without a closing escaped bracket \].
Maybe some examples will help. Here are some instances where a comment should be matched:
(\h{8}-\h{4}-\h{4}-\h{4}-\h{12}) # match UUID (no square brackets at all)
([A-Za-z_][A-Za-z0-9_]*) # valid Python identifier (paired unescaped square brackets)
(\||\[|\?) # match some stuff (escaped opening square bracket)
Here is an example of where an "attempted comment" should not be matched:
[A-Za-z # letters
0-9_-.] # numbers and other characters
(the first line should not be matched, the second one is fine)
I'm by no means a regex master (which is why I'm asking this question!), but I have tried fiddling around with positive and negative lookbehinds, and trying to nest them, but I've had zero luck except with
(?<!\[)((#+).*$)
which matches a comment only if not preceded by an opening square bracket. Once I started nesting the lookarounds, though, and trying to match if the opener was preceded by an escape, I got stumped. Any help would be ... helpful.
It is rather simple, but it works with the cases from your example. So try this:
(?<=[\][)]\s)(#(.*))$
It matches a comment only if it is preceded by a closing bracket and a space.
EDIT
As I suspected, your case is much more complicated, so maybe try this one:
^(?=(?:[-\w\d?*.+|{}\\\/\s<>\]]|(?:\\[\[\]()]))+(#+.*)$)|^(?=^[\[(].+?[\])]\s*(#+.*)$)
It matches only through groups: it does not consume any text at all, because it uses only positive lookaheads, but capturing groups inside lookarounds are allowed. If you want to match directly instead, match more text and then extract what you want with a group, with something like:
^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(#+.*)$
However, in both cases you would probably need to add more characters that occur in regular expressions to the first alternative, (?:[-\w\d?*.+|{}\\\/\s<>\]]). For example, if you also want it to match the comment in (\[ # works if escaped [ is in group, you need to add ( to that alternative. But I am not sure whether that is what you wanted.
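To check the direct-matching pattern (the one ending in \s*(#+.*)$), a Python sketch can run it over the example lines from the question, one line at a time. Both the per-line application and the use of Python's re instead of the highlighter's own engine are assumptions. Group 1 holds the comment, and the unclosed [A-Za-z # letters line is rejected:

```python
import re

# Direct-matching pattern from the answer; group 1 captures the comment.
COMMENT = re.compile(
    r'^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(#+.*)$'
)

# Example lines from the question, tested one at a time.
lines = [
    r'(\h{8}-\h{4}-\h{4}-\h{4}-\h{12}) # match UUID (no square brackets at all)',
    r'([A-Za-z_][A-Za-z0-9_]*) # valid Python identifier (paired unescaped square brackets)',
    r'(\||\[|\?) # match some stuff (escaped opening square bracket)',
    r'[A-Za-z # letters',
    r'0-9_-.] # numbers and other characters',
]

results = []
for line in lines:
    m = COMMENT.match(line)
    results.append(m.group(1) if m else None)  # None means no comment was matched

for line, comment in zip(lines, results):
    print(repr(comment), '<-', line)
```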
EDIT "invalid scope"
Try this:
^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]\(])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(?<valid>(?:#+).*)$|^[-\[\w\d?*.+|{}\\\/\s<>\(]+(?<invalid>(?:#+).*)$
I think you mean this:
^\[[^\]]*\].*#.*$|#(.*)$
OR
^\[[^\]]*\].*#.*$(*SKIP)(*F)|#.*$

Capturing two groups out of a string with a regex

I don't know anything about regular expressions and I don't really have the time to study them at the moment.
I have a string like this:
test (22/22/22)
I need to capture the test and the date 22/22/22 in an array.
The test string could also contain multiple words:
test test(1) tes-t (22/22/22)
It should capture test test(1) tes-t and 22/22/22.
I have no idea how to get started on this. I managed to capture the date string with the parentheses by doing:
(\(.*)
but that really doesn't get me anywhere.
Could someone help me out here and provide an explanation of how I should go about capturing this? I'm kinda lost.
Thanks
To explain the given regular expression, (.*)\(([^)]+)\):
(.*) will match anything and capture it (the parentheses capture what their inner expression matches).
\( is an escaped parenthesis. That's what you'll write when you want to match a literal parenthesis.
[^)]+ means one or more characters other than a closing parenthesis (special characters such as ) don't need to be escaped inside square brackets).
([^)]+) captures what's explained above
\) matches a closing parenthesis
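Note that (.*) is greedy, so it runs up to the last ( that is followed by a closing parenthesis. That is why a parenthesis in your first words, like in:
test test(1) tes-t (22/22/22)
is still handled correctly, while a parenthesized part after the date (for example test (22/22/22) foo(1)) would make the regex capture the wrong strings. I'd recommend thinking about what information you want to capture and how you separate it from the rest of your string. Once that is done, it will be much easier to build an effective regular expression.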
Try this
^(.*)\(([^)]*)\)
Explanation
^ BeginOfLine
(.*) CapturingGroup 1 AnyCharacterExcept\n, zero or more times
\( literal (
([^)]*) CapturingGroup 2: AnyCharNotIn[ )], zero or more times
\) literal )
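A small Python sketch of how the two groups end up in an array (a list here). The helper name split_label_and_date is hypothetical, and the .strip() call that drops the space before "(" is an addition, not part of the regex:

```python
import re

# Pattern from the answer: group 1 = everything before the last parenthesized
# part (.* is greedy), group 2 = what is inside those parentheses.
PATTERN = re.compile(r'^(.*)\(([^)]*)\)')

def split_label_and_date(text):
    """Hypothetical helper: return [label, date], or None if there is no match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # Strip the trailing space that group 1 keeps before the "(".
    return [m.group(1).strip(), m.group(2)]

print(split_label_and_date('test (22/22/22)'))                # ['test', '22/22/22']
print(split_label_and_date('test test(1) tes-t (22/22/22)'))  # ['test test(1) tes-t', '22/22/22']
```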
This needle works on your example input:
(.*)\(([^)]+)\)

How is this regex wrong?

I have a regex that I'm using to match user functions inside an IDE (Sublime). It matches what I want (the function name itself), but it also matches the opening parenthesis. So the match looks like this:
this._myFunction('content');
Notice the opening paren.
Here is my expression:
(?:[^\._])?([\w-]+)(?:[\(]){1}
How can I exclude the opening paren from the match?
As a bonus question: how can I avoid matching the string function? As you would expect, function( matches too (not fun in JS).
Thank you to anyone who can assist.
You can use (?=pattern):
A zero-width positive look-ahead assertion. For example,
"/\w+(?=\t)/" matches a word followed by a tab, without
including the tab in $&.
So wherever you match your open paren, wrap it in (?=) instead of (?:).
Unfortunately, you cannot really use a regex to parse a context-free grammar in general, but hopefully this does better. It uses a positive lookahead to look for the opening paren without including it in the match:
(?:[^\._])?([\w-]+)(?=[\(])
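A quick Python check of the original pattern and the lookahead version on the example line from the question: with (?:...) the ( is part of the match, with (?=...) it is only checked. The leading _ is still included in both, because [\w-]+ covers underscores and the optional (?:[^\._])? group can simply be skipped.

```python
import re

line = "this._myFunction('content');"

# Original pattern: the non-capturing group still consumes the "(".
original = re.search(r'(?:[^\._])?([\w-]+)(?:[\(]){1}', line)
# Lookahead version: the "(" must follow, but is not part of the match.
lookahead = re.search(r'(?:[^\._])?([\w-]+)(?=[\(])', line)

print(original.group(0))   # _myFunction(
print(lookahead.group(0))  # _myFunction
```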
If your IDE's regex engine supports negative lookbehind (the subexpression must not be found immediately before the match), you can avoid matching the string 'function' or "function":
(?<!['"])(?:[^\._])?([\w-]+)(?=[\(])