Regex, continue matching after lookaround - regex

I'm having trouble with lookaround in regex.
Here is the problem: I have a big file to edit, and I want to replace one function with another, keeping the first parameter but removing the second one.
Let's say we have:
func1(paramIWantToKeep, paramIDontWant)
or
func1(func3(paramIWantToKeep), paramIDontWant)
I want to change both to:
func2(paramIWantToKeep)
So I tried using a positive lookahead:
func1\((?=.+), paramIDontWant\)
For now, I'm just trying not to select the first parameter (I'll then handle the parentheses the same way).
But it doesn't work. It appears that after the positive lookahead (?=.+) succeeds, my regex looks for , paramIDontWant\) at the same position it was at before the lookahead (right after the opening parenthesis).
So my question is: how do I continue a regex after a lookaround group, here after (?=.+)?
Thanks.
PS: Sorry for my English and/or the poor structure of my question.
Edit: I use Sublime Text

The first thing you need to understand is that a regex will always match a consecutive string. There will never be gaps.
Therefore, if you want to replace 123abc456 with abc, you can't simply match 123456 and remove it.
Instead, you can use a capturing group. This will allow you to remember a section of the regex for later.
For example, to replace 123abc456 with abc, you could replace this regex:
\d+([a-z]+)\d+
with this string:
$1
This replaces the whole match with the contents of the first capturing group. In this case, the capturing group is ([a-z]+), which matches abc. Thus, the entire match is replaced with just abc.
An example you may find more useful:
Given:
func1(foo, bar)
replacing this regex:
\w+\((\w+),\s*\w+\)
with this string:
func2($1)
results in:
func2(foo)
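As a rough illustration, the same two replacements can be reproduced in Python. This is only a sketch: Python's re module writes the backreference as \1 rather than the $1 used in Sublime Text, and the helper names keep_letters and to_func2 are made up for the example.

```python
import re

def keep_letters(text):
    # Replace the whole match with only the first capturing group.
    return re.sub(r"\d+([a-z]+)\d+", r"\1", text)

def to_func2(text):
    # Keep the first argument (group 1) and drop the second one.
    return re.sub(r"\w+\((\w+),\s*\w+\)", r"func2(\1)", text)

print(keep_letters("123abc456"))
print(to_func2("func1(foo, bar)"))
```

Note that \w+ for the first argument does not match a nested call such as func3(foo); that case needs a broader pattern.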

import re
# Test inputs: the first argument is kept, everything after it is dropped.
t = "func1(paramKeep,paramLose)"
t1 = "func1(paramKeep,((paramLose(dog,cat))))"
t2 = "func1(func3(paramKeep),paramDont)"
t3 = "func1(func3(paramKeep),paramDont,((i)),don't,want,these)"
# Group 1: function name, "(" and everything up to the first comma.
# Group 2: the first comma and the arguments to discard.
# Group 3: the final closing parenthesis.
reg = r'(\w+\(.*?(?=,))(,.*)(\))'
for s in (t, t1, t2, t3):
    keep, lose, end = re.match(reg, s).groups()
    print(keep + end)
Produces:
func1(paramKeep)
func1(paramKeep)
func1(func3(paramKeep))
func1(func3(paramKeep))

Apply these two regexes in this order:
s/(func1)([^,]*)(, )?(paramIDontWant)(.)/func2$2$5/;
s/(func2\()(func3\()(paramIWantToKeep).*/$1$3)/;
These cope with the two examples you gave. I guess the real-world code you are editing is slightly more complicated, but the general idea of applying a series of regexes might be helpful.
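A minimal Python sketch of the same two-pass idea, assuming the literal parameter names from the question. The function name rewrite is hypothetical, and the s/.../.../ substitutions are translated to re.sub with \N backreferences.

```python
import re

def rewrite(line):
    # Pass 1: rename func1 to func2 and drop the second argument.
    line = re.sub(r"(func1)([^,]*)(, )?(paramIDontWant)(.)", r"func2\2\5", line)
    # Pass 2: unwrap the inner func3(...) call, if there is one.
    line = re.sub(r"(func2\()(func3\()(paramIWantToKeep).*", r"\1\3)", line)
    return line

print(rewrite("func1(paramIWantToKeep, paramIDontWant)"))
print(rewrite("func1(func3(paramIWantToKeep), paramIDontWant)"))
```

The order matters: the second pass only looks for func2(func3(, which exists only after the first pass has run.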

Related

Case analysis with REGEX

I have some data like
small_animal/Mouse
BigAnimal:Elephant
Not an animal.
What I want to get is:
Mouse
Elephant
Not an animal.
Thus, I need a regular expression that searches for / or : as follows: if one of these is found, take the text after that character. If neither / nor : exists, take the whole string.
I tried a lot. For example, this works for Mouse and Elephant, but not for the third line:
(?<=:)[^:]*|(?<=/)[^/]*
And this will always give the full string ...
(?<=:)[^:]*|(?<=/)[^/]*|^.*$
My head is burning^^ Maybe, somebody can help? :) Thanks a lot!
EDIT:
@The fourth bird offered a nice solution for single characters. But what if I want to search for strings like
animal::Dog
Another123Cat
Not an animal.
How can I split on :: or 123?
You might use
^(?:[^:/]*[:/])?\K.+
^ Start of string
(?:[^:/]*[:/])? Optionally match any char except : or / till matching either : or /
\K Forget what is matched so far
.+ Match 1+ times any char
If you don't want to cross a newline, you can extend the character class with [^:/\r\n]*
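Python's built-in re module does not support \K, so the following sketch gets the same result with a capturing group instead. The name after_separator is made up for the example, and the character class is the newline-safe variant mentioned above.

```python
import re

# Optional "prefix plus separator", then capture the rest of the string.
PATTERN = re.compile(r"^(?:[^:/\r\n]*[:/])?(.+)")

def after_separator(text):
    m = PATTERN.match(text)
    # Fall back to the input if nothing matches (e.g. an empty string).
    return m.group(1) if m else text

for line in ["small_animal/Mouse", "BigAnimal:Elephant", "Not an animal."]:
    print(after_separator(line))
```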
Another option could be using an alternation
^[^:/]*[:/]\K.+|.+
Or perhaps making use of a SKIP FAIL approach by matching what you want to omit
^[^:/]*[:/](*SKIP)(*F)|.+
If you want to use multiple characters, you might also use
^(?:(?:(?!123|::|[:/]).)*+(?:123|::|[:/]))?\K.+
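For the multi-character separators, a Python approximation could look like the sketch below. It makes two assumptions: Python's re module has no \K, so a capturing group is used instead, and the possessive *+ is replaced by a plain * because possessive quantifiers are only accepted from Python 3.11 on. The name after_any_separator is hypothetical.

```python
import re

# Consume characters as long as no separator starts here, then one separator,
# then capture whatever follows. The whole prefix is optional.
MULTI = re.compile(r"^(?:(?:(?!123|::|[:/]).)*(?:123|::|[:/]))?(.+)")

def after_any_separator(text):
    m = MULTI.match(text)
    return m.group(1) if m else text

for line in ["animal::Dog", "Another123Cat", "Not an animal."]:
    print(after_any_separator(line))
```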

Regex match last substring among same substrings in the string

For example we have a string:
asd/asd/asd/asd/1#s_
I need to match this part: /asd/1#s_ or asd/1#s_
How can this be done with plain regex?
I've tried a negative lookahead like this, but it didn't work:
\/(?:.(?!\/))?(asd)(\/(([\W\d\w]){1,})|)$
It matches '/asd/asd/asd/asd/asd/asd/1#s_'
in 'prefix/asd/asd/asd/asd/asd/asd/1#s_',
but I need to match '/asd/1#s_' without all the preceding /asd/ segments.
The match should work with plain regex, without any helper functions of any programming language.
I use https://regexr.com/ to check whether the regex matches.
Here are the possible strings:
prefix/asd/asd/asd/1#s
prefix/asd/asd/asd/1s#
prefix/asd/asd/asd/s1#
prefix/asd/asd/asd/s#1
prefix/asd/asd/asd/#1s
prefix/asd/asd/asd/#s1
and the asd part could be any word, like
prefix/a1sd/a1sd/a1sd/1#s
prefix/a1sd/a1sd/a1sd/1s#
...
So I need to match the last repeated part together with everything to its right.
Everything to the right can be word characters, non-word characters, and digits, in any order.
A more complicated string example:
prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd
This should match the following part:
/a1sd/22$$#!/123/321/asd
Try this one. It works in Python.
import re
reg = re.compile(r"\/[a-z]{1,}\/\d+[#a-z_]{1,}")
s = "asd/asd/asd/asd/1#s_"
print(reg.findall(s))
# ['/asd/1#s_']
Update:
Since the question lacks clarity, this only works with the given order; I suppose any other combination simply fails.
Edit:
New regex:
reg = r"\/\w+(\/\w*\d+\W*)*(\/\d+\w*\W*)*(\/\d+\W*\w*)*(\/\w*\W*\d+)*(\/\W*\d+\w*)*(\/\W*\w*\d+)*$"

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can describe as positioners. These are shown as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work with is the linked image. The creators can't even confirm which flavour of regex it is; they believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the suggestions below have worked so far. I tested each one live (rather than in a regex tester, thinking it may work differently), but it still didn't work.
There is no replace feature, and as mentioned before, I'm not sure if it is Python. I appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
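A minimal sketch of the two operations, assuming the tool does turn out to be Python-compatible and that a replace step is available. The function name find_ids is made up for the example.

```python
import re

def find_ids(text):
    # Operation 1: remove every positioner (a question mark plus two digits).
    cleaned = re.sub(r"\?[0-9]{2}", "", text)
    # Operation 2: match the identifiers in the cleaned text.
    return re.findall(r"T[0-9]{7}", cleaned)

print(find_ids("T123?214567\nT?211234567"))
```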
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
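The concatenation step might look like this in Python. It is only a sketch under the same single-positioner assumption, and join_around_positioner is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_around_positioner(line):
    m = PATTERN.match(line)
    if m is None:
        return line  # no positioner on this line, keep it unchanged
    # Glue the text before and after the positioner back together.
    return m.group(1) + m.group(2)

for line in ["T123?214567", "T?211234567", "T1234567"]:
    print(join_around_positioner(line))
```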
First, match the ?21 and replace it with a distinctive character, such as #:
\?21
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
Here the target string is captured in group 1 (or \1).
Finally, replace # with ?21 or something you like.
A Python script might look like this:
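import re
ss="""T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# Step 1 pattern: the positioner that gets replaced with '#'.
rexpre = re.compile(r'\?21')
# Step 2 pattern: T plus 7 digits, or T plus 8 characters that are digits or '#'.
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Restore the positioner in each match.
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))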
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using replace functionality is an option for you, then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1: (T\d*)
Make the question mark followed by 2 digits optional: (?:\?\d{2})?
Capture one or more digits after it in group 2: (\d+)
Then in the replacement you could use group 1 followed by group 2: \1\2
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
Example Python
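A short Python sketch of this replacement, assuming re.sub is available in the target environment. The name strip_positioner is made up for the example.

```python
import re

PATTERN = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

def strip_positioner(text):
    # Rebuild each match from group 1 and group 2, dropping the optional ?NN part.
    return PATTERN.sub(r"\1\2", text)

for line in ["T0000001", "T123?214567", "T?211234567"]:
    print(strip_positioner(line))
```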
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use which are in a particular pattern
For example
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example I have a simple query
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches (I get only the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
\$\$\w+/g
See Demo
Some clarification on why your regex behaves the way it does:
\$\$[^\W+]\w+$
An unescaped $ means end of string, so your pattern only matches something at the end of the string; that's why it gets only the last match.
The character class [^\W+] doesn't really make sense. A class starting with [^ negates the characters inside it, \W is the negation of word characters, and + inside a class is a literal + character. So you are saying: match anything that is not a non-word character and not a + sign. I guess that was not what you wanted.
To match the following word, \w+ is enough. The global modifier /g ensures that matching does not stop at the first match.
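For comparison, here is how the two patterns behave in Python. This is a sketch only: Python has no /g flag, and re.findall plays that role by returning every non-overlapping match. The variable names are assumptions for the example.

```python
import re

query = ("select * from asdf where asdf.starttime = $$DATA_START_TIME "
         "and asdf.endtime = $$DATA_END_TIME")

# Without the trailing $ anchor, every parameter is found.
all_params = re.findall(r"\$\$\w+", query)

# The original pattern is anchored to the end of the string.
anchored = re.findall(r"\$\$[^\W+]\w+$", query)

print(all_params)
print(anchored)
```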
This should work, based on what you said you wanted to match. Note that it won't match $$lower_case_strings, if that's what you wanted; if not, add the "i" flag as well.
\${2}[A-Z_]+/g

how to group in regex matching correctly?

Consider the following scenario:
input string = "WIPR.NS"
I have to replace this with "WIPR2.NS".
I am using the following logic:
match pattern = "(.*)\.NS$" \\ any string that ends with .NS
replace pattern = "$12.NS"
In the above case, since there is no group with index 12, I get the result $12.NS.
But what I want is "WIPR2.NS".
If the replacement doesn't contain the digit 2, it works; it's only this case that fails.
How can I resolve this?
Thanks in advance,
Alok
That depends entirely on your regex engine (I'm not familiar with those that use $1 to represent a capture group; I'm more used to \1, but you'd have the same problem with that).
Some will provide a delimiter that you can use, like:
replace pattern = "${1}2.NS"
which clearly indicates that you want capture group 1 followed by the literal 2.NS.
In fact, by looking at this page, it appears that's exactly the way to do it (assuming .NET):
To replace with the first backreference immediately followed by the digit 9, use ${1}9. If you type $19, and there are less than 19 backreferences, the $19 will be interpreted as literal text, and appear in the result string as such.
Also keep in mind that Jay provides an excellent answer for this specific use case that doesn't require capture groups at all (by just replacing .NS with 2.NS).
You may want to look into that as a possibility - I'll leave this answer here since:
it's the accepted answer; and
it's probably better for more complex cases, like replacing X([A-Z])4([A-Z]) with X${1}5${2}, where you have variable text on either side of the bit you wish to modify.
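The same ambiguity exists in Python, where the delimiter form is \g<1> rather than ${1}. The sketch below is an illustration in a different engine than the one discussed above; it also shows the capture-free alternative of matching only the suffix. Variable names are made up for the example.

```python
import re

ticker = "WIPR.NS"

# \g<1> marks the end of the group number, so the 2 that follows is literal.
with_group = re.sub(r"(.*)\.NS$", r"\g<1>2.NS", ticker)

# Matching only the suffix avoids the capture group entirely.
suffix_only = re.sub(r"\.NS$", "2.NS", ticker)

print(with_group)
print(suffix_only)
```

Writing the replacement as r"\12.NS" would be read as a reference to group 12, which does not exist in this pattern.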
You don't need to do anything with what precedes the .NS, since only what is being matched is subject to replacement.
match pattern = "\.NS$" (any string that ends with .NS -- don't forget to escape the .)
replace pattern = "2.NS"
You can further refine this with lookaround zero-width assertions, but that depends on your regex engine, and you have not specified the environment/programming language in which you are working.