Balanced-parentheses problem: match function parameters [duplicate] - regex

This question already has an answer here:
Unexpected behavior around recursive regex
(1 answer)
Closed 5 days ago.
I need a PCRE regex to match a particular balanced-parentheses situation, and while I've looked at a bunch of discussions of the problem (e.g. here), I can't get anything to work. I'm trying to use regex101.com to debug.
I'm trying to match a string like foo inside MyFunction(foo), where foo can be anything, including other function calls that can themselves contain parentheses. That is, I want to be able to match
MyFunction(23)
MyFunction('Sekret')
MyFunction( 23, 'Sekret', Date() )
MyFunction( Encrypt('Sekret'), Date() )
But MyFunction can be part of a larger string which itself can have parentheses, e.g.
logger.info("MyFunction( Encrypt('Sekret'), Date() )")
in which case I want only what is passed to MyFunction to be matched, not what is passed to logger.info.
The purpose of this is to replace whatever is passed to MyFunction with something else, e.g. "XXXXXX".
Using a regex from the linked question (e.g. \((?:[^)(]+|(?R))*+\)) matches the outermost set of parens, but if I try to prefix it with MyFunction, i.e. if I try the regex MyFunction\((?:[^)(]+|(?R))*+\), it fails.

Regex (.NET balancing-group syntax; (?<-c>) is not supported by PCRE): MyFunction(\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))
demo
Updated regex that works in PCRE (it handles only one level of nested parentheses, such as Date() inside the call):
MyFunction\((?:[^()]*(?:\([^()]*\))*[^()]*)*\)
demo2
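Python's built-in re module has no (?R)/(?1) recursion, but a one-level pattern in the same spirit as the PCRE one above works there. The sketch below is a simplified variant, not that exact pattern: each argument character is either a non-parenthesis character or a complete (...) group with no further parentheses inside. The names ONE_LEVEL and mask_one_level are illustrative only.

```python
import re

# One level of nesting only: the argument list is a sequence of
# non-parenthesis characters or complete "(...)" groups whose contents
# contain no further parentheses.
ONE_LEVEL = re.compile(r"MyFunction\((?:[^()]|\([^()]*\))*\)")


def mask_one_level(text: str, mask: str = "XXXXXX") -> str:
    """Replace the arguments of each MyFunction(...) call with mask."""
    return ONE_LEVEL.sub(f"MyFunction({mask})", text)


print(mask_one_level("""logger.info("MyFunction( Encrypt('Sekret'), Date() )")"""))
# logger.info("MyFunction(XXXXXX)")
print(mask_one_level("MyFunction(f(g(1)))"))
# MyFunction(f(g(1)))  (two levels deep, so it is left unchanged)
```

The second call shows the limitation: anything nested deeper than one level needs recursion or a hand-written scanner.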
Updated again to support arbitrarily nested calls using PCRE recursion; (?1) re-enters capture group 1, which holds the argument list:
MyFunction\(((?:[^()]++|\((?1)\))*)\)
demo3

Would you please try:
(?<=MyFunction\()([^()]+|\((?1)*\))+(?=\))
Demo
(?<=MyFunction\() is a positive lookbehind assertion that requires the string
MyFunction( before the match without including it in the match.
([^()]+|\((?1)*\)) is the alternation of [^()]+ or \((?1)*\). The former
matches a sequence of characters other than ( and ). The latter recursively
matches a parenthesized string, where (?1) is a recursive call to capture
group 1.
The final (?=\)) is a positive lookahead assertion that requires the right paren )
pairing with MyFunction( to follow, without consuming it.
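Where regex recursion isn't available (Python's built-in re, for instance), the same balancing can be done by counting depth by hand. The sketch below is a plain scanner, not a regex. It finds each MyFunction( and walks forward, incrementing a counter on ( and decrementing on ), until the matching ) is reached. The function name mask_call_args is hypothetical. The sketch assumes that parentheses inside string literals count like any others, and it does not check for a word boundary before MyFunction.

```python
def mask_call_args(text: str, func: str = "MyFunction", mask: str = "XXXXXX") -> str:
    """Replace the argument list of every func(...) call with mask.

    Parentheses are balanced by counting depth, so nested calls are handled
    at any depth. Parentheses inside string literals are counted too.
    """
    opener = func + "("
    out = []
    pos = 0  # start of the not-yet-copied part of text
    while True:
        start = text.find(opener, pos)
        if start == -1:
            break
        args_start = start + len(opener)
        depth = 1
        i = args_start
        # Walk forward until the ")" that closes "func(" has been consumed.
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced: leave the rest of the text as it is
        out.append(text[pos:args_start])  # text up to and including "func("
        out.append(mask + ")")
        pos = i  # continue just after the matching ")"
    out.append(text[pos:])
    return "".join(out)


for s in ["MyFunction(23)",
          "MyFunction( Encrypt('Sekret'), Date() )",
          """logger.info("MyFunction( Encrypt('Sekret'), Date() )")"""]:
    print(mask_call_args(s))
```

Because only the span between MyFunction( and its own closing ) is replaced, the surrounding logger.info(...) call and its quotes are left intact.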

Related

enclosing regular expression in parentheses in Notepad++ [duplicate]

This question already has an answer here:
Notepad++: add parentheses to timestamps
(1 answer)
Closed 1 year ago.
So I have a big list of items in Excel. I copied them to Notepad++ because it has regex support built in.
It could be AuAC21-XTS02L or BgUX20-C02S, etc. Basically, I want to replace these two with Au(AC21-XTS02)L and Bg(UX20-C02)S.
With the regular expression \D\D\d\d-(\D){1,3}\d\d I can find exactly the part of the text that I want to enclose in parentheses, but I don't know how to add them.
I tried using (\D\D\d\d-(\D){1,3}\d\d) as the replacement, but then I just get something like Au(DDdd-D{1,3}dd)L.
Any help would be appreciated.
You can store the whole match in a capture group and then replace it with ($1). Note that, depending on your Notepad++ version, you may need to use \ instead of $ to refer to a capture group (i.e. the replacement string would be (\1)).
Take a look at this Regex101 snippet: https://regex101.com/r/0e1Wcc/1
It converts sample input such as:
it could be AuAC21-XTS02L or BgUX20-C02S etc.
it could be AuAC21-XTS02L or BgUX20-C02S etc.
it could be AuAC21-XTS02L or BgUX20-C02S etc.
into
it could be Au(AC21-XTS02)L or Bg(UX20-C02)S etc.
it could be Au(AC21-XTS02)L or Bg(UX20-C02)S etc.
it could be Au(AC21-XTS02)L or Bg(UX20-C02)S etc.
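For comparison, here is the same capture-group idea as a Python sketch outside Notepad++, using re.sub on the sample sentence from the question. In Python the group is referenced as \1 in the replacement, and parentheses in the replacement string are literal.

```python
import re

text = "it could be AuAC21-XTS02L or BgUX20-C02S etc."

# The whole code goes into capture group 1; the replacement wraps it in
# literal parentheses via the back-reference \1.
result = re.sub(r"(\D\D\d\d-\D{1,3}\d\d)", r"(\1)", text)
print(result)  # it could be Au(AC21-XTS02)L or Bg(UX20-C02)S etc.
```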
You can use the full match $0 of the pattern \D\D\d\d-\D{1,3}\d\d (no capture group is needed) and wrap it in escaped parentheses in the replacement: \($0\)
The output will be:
Au(AC21-XTS02)L or Bg(UX20-C02)S
Note that \D matches any character except a digit, so it could also match a space or a newline.
Looking at the example strings, a slightly more precise match (using the same replacement \($0\)) could be:
[A-Z][a-z]\K[A-Z]{2}\d\d-[A-Z0-9]{1,3}\d\d(?=[A-Z])
Regex demo
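Python's re module does not support \K. A Python sketch of the more precise pattern can instead use a fixed-width lookbehind for the two-letter prefix (a swapped-in technique that has the same effect here), plus \g<0> for the full match in the replacement:

```python
import re

text = "it could be AuAC21-XTS02L or BgUX20-C02S etc."

# (?<=[A-Z][a-z]) requires a prefix such as "Au" before the match without
# consuming it, which is what \K achieves in the pattern above.
pattern = r"(?<=[A-Z][a-z])[A-Z]{2}\d\d-[A-Z0-9]{1,3}\d\d(?=[A-Z])"
result = re.sub(pattern, r"(\g<0>)", text)
print(result)  # it could be Au(AC21-XTS02)L or Bg(UX20-C02)S etc.
```

Unlike \D, the character classes here cannot match spaces or newlines, so text such as "ab 12-xy34" is left alone.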

Regex question- string must appear in a specific way or not at all

I need help with a regex (for analytics).
I'm not sure how to handle the requirements.
Here's an example of a URL:
/a.html?ref=aa&project=11&utm=bb
This URL would have &project=XX in the middle, but it is possible that &project won't be there at all.
Requirements:
I want the regex to be positive only for specific project=XX (for example only when XX equals 11 or 12 or 13) but negative for all other values (project=22).
The parameter before it (?ref in the example below) is mandatory
Any parameter afterwards (&utm) is optional
For example:
fine: /a.html?ref=aa&project=11&utm=bb
fine: /a.html?ref=aa&utm=bb
not fine: /a.html?ref=aa&project=22&utm=bb
How do I approach this?
I tried this; it kind of works (but only without additional utm params):
\/a.html\?ref\=aa(\&project\=(11|12|13))?$
I tried this, but it doesn't work when using the utm parameter:
\/a.html\?ref\=aa(\&project\=(11|12|13))?(\&utm\=.*)?$
Thanks
Itay
You don't say which platform you're using. The ? must be escaped to match a literal question mark, but / only needs escaping where it is the pattern delimiter (e.g. in a JavaScript regex literal), and = and & need no escaping at all:
\/a.html\?ref=aa(&project=(11|12|13))?(&utm=.*)?$
You might also want to make the utm part lazy (.*?). Note, though, that with the $ anchor at the end it still has to run to the end of the string, so it matches the same text as .*:
\/a.html\?ref=aa(&project=(11|12|13))?(&utm=.*?)?$
You could use the character class [123] to match 1, 2 or 3, make both the project part and the utm part optional, and escape the dot to match it literally.
\/a\.html\?ref=aa(&project=1[123])?(&utm=.*)?$
The pattern matches:
\/a\.html Match /a.html
\?ref=aa Match ?ref=aa
( Capture group
&project=1[123] Match &project=1 and then either 1, 2 or 3
)? Close the group and make it optional
(&utm=.*)?$ Optionally match &utm= followed by the rest of the line, then assert the end of the line
Regex demo
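A quick way to check a candidate pattern against the requirements is to run it over the example URLs. This Python sketch assumes a variant where the utm part is wrapped in its own optional group, (&utm=.*)?, so that URLs with and without extra parameters are both accepted. The / needs no escaping in Python. The names URL_RE and is_wanted are illustrative.

```python
import re

# ref=aa is required; project=11|12|13 and a trailing &utm=... are optional.
URL_RE = re.compile(r"/a\.html\?ref=aa(&project=1[123])?(&utm=.*)?$")


def is_wanted(url: str) -> bool:
    return URL_RE.search(url) is not None


for url in ["/a.html?ref=aa&project=11&utm=bb",   # fine
            "/a.html?ref=aa&utm=bb",              # fine
            "/a.html?ref=aa&project=22&utm=bb",   # not fine
            "/a.html?ref=aa&project=13"]:         # fine, no utm part
    print(url, is_wanted(url))
```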

Regex to find after particular word inside a string

I am using a regex to find the values that follow a few keywords and a colon (:).
Sample test case:
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find a keyword's value inside the () brackets. It works for 'aaa', but for 'test' it fails and matches too much of the string.
My regex so far:
\btest(.*\w") (failing case): expected "aff", but it returned "aff", aaa: "aa1"
\baaa(.*\w") (passing case): returned "aa1"
Please let me know if more information is needed.
You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by optionally any number of spaces
" followed by quote
( ) capturing group, containing...
.*? any number of characters, matching as few as possible
" followed by quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
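The two patterns can be tried in Python with re.findall. When a pattern has one group, findall returns that group's text; when it has two, it returns (keyword, value) tuples. The \b before the keyword list is an addition, so that a longer name such as "mytest:" is not picked up as "test:". The variable names are illustrative.

```python
import re

sample = 'sadffd(test: "aff", aaa: "aa1") {}'

# Any value after a colon: group 1 holds the text between the quotes.
values = re.findall(r':\s*"(.*?)"', sample)

# Only specific keywords: each result is a (keyword, value) tuple.
pairs = re.findall(r'\b(test|aaa)\s*:\s*"(.*?)"', sample)
by_key = dict(pairs)

print(values)  # ['aff', 'aa1']
print(by_key)  # {'test': 'aff', 'aaa': 'aa1'}
```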
One more approach (executed in Python)
import re
items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    print(re.findall(r'"(\w+)"', item))
    print(re.findall(r'(?<=: )"(\w+)"', item))
Output
['aff', 'aa1']
['aff', 'aa1']
I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).

Remove string between two patterns using gawk regex only

Input:
secNm:ATA,_class:com.dddao.domaffin.summaggrfy.GddenericMohsg},{ttlRec:0,ttlVal:{:0}secNm:B2B,_class:com.xyz.dakjdain.sfffummary.GenericMo73hs}extra
secNm:ATA,_class:com.dddao.domaffin.summaggrfy.GddenericMohsg},{ttlRec:0,ttlVal:{:0}secNm:B2B,_class:com.xyz.dakjdain.sfffummary.GenericMo73hs
In both strings above I want to remove:
For string 1: the parts that start with ",_class" and end at the first following "}".
For string 2: the part that starts with ",_class" and runs to the end of the string, if the first condition fails.
Output:
secNm:ATA,{ttlRec:0,ttlVal:{:0}secNm:B2Bextra
secNm:ATA,{ttlRec:0,ttlVal:{:0}secNm:B2B
This pattern can occur any number of times in the string.
I simply want to remove those parts.
I have written gsub(/,_class(.*?)\}/,"",$0)
I want an answer that uses only the gawk regex functions, no other method.
My function above has an issue: it removes a big part of the string.
Please help me correct my regex.
Thanks in advance.
You may use a [^}] negated bracket expression to match any char but }, since lazy quantifiers are not supported in gawk regexes.
Besides, you do not even need a grouping construct, as you are not referring to the captured value. You may safely remove ( and ).
Use
/,_class[^}]*}/
Basically, this should be understood as:
,_class - match ,_class substring
[^}]* - 0 or more chars other than }
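} - up to and including }.
If the last ,_class part has no closing } (as in String 2), make the brace optional with /,_class[^}]*}?/: since [^}]* stops only at a } or at the end of the string, }? then consumes the } whenever there is one.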
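The regex can be checked outside awk. This Python sketch applies the same negated bracket expression with re.sub to both sample lines. It compares the strict form, which requires a closing }, with a variant that makes the } optional (}?), an addition meant to cover the second case. Python's re and gawk's regexes agree on these particular constructs.

```python
import re

lines = [
    "secNm:ATA,_class:com.dddao.domaffin.summaggrfy.GddenericMohsg},{ttlRec:0,ttlVal:{:0}secNm:B2B,_class:com.xyz.dakjdain.sfffummary.GenericMo73hs}extra",
    "secNm:ATA,_class:com.dddao.domaffin.summaggrfy.GddenericMohsg},{ttlRec:0,ttlVal:{:0}secNm:B2B,_class:com.xyz.dakjdain.sfffummary.GenericMo73hs",
]

# [^}]* stops at the first "}", doing the job a lazy .*? would do.
STRICT = re.compile(r",_class[^}]*}")
# "}?" makes the closing brace optional, so a trailing ",_class..." with no
# "}" after it is removed up to the end of the line.
OPTIONAL_BRACE = re.compile(r",_class[^}]*}?")

for line in lines:
    print(STRICT.sub("", line))
    print(OPTIONAL_BRACE.sub("", line))
```

For the second line, the strict pattern leaves the trailing ",_class:com.xyz..." part in place, while the optional-brace variant produces the expected output for both lines.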

Regex (ICU) for matching between parentheses

Looking for some regex which will create a capture group for words occurring within parentheses, ignoring the parentheses themselves. The regex must be either PCRE or ICU.
Input: ( lakshd asd___ asa1123 Name : _____)
Desired Output: Name
What I've tried:
\\((Name|name|NAME)\\)
(?<=\\()name|Name|NAME(?=\\))
\\(name|Name|NAME\\)
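All these patterns look for name, Name or NAME with a ( immediately before it and a ) right after it, the difference being what is captured or returned as a match. (In the last two, the alternatives are not grouped, so | splits the whole pattern and the parentheses only attach to the first and last alternatives.) To match a word inside parentheses, you need to use \([^()]* before the value you want to get, and [^()]*\) after it.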
Also, there is no point in extracting something you already know.
So, if you plan to extract the last word from the parentheses, you may use
> library(stringr)
> s = "( lakshd asd___ asa1123 Name : _____)"
> res <- str_match(s, "(?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\)")
> res[,2]
[1] "Name"
Note that str_match allows accessing captured values.
The (?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\) pattern matches parentheses and the last whole word from it.
If nested parentheses are unlikely, it is enough to check that the current position is followed by a closing parenthesis with no other parenthesis in between, on the assumption that the opening one comes earlier (works in both ICU and PCRE):
(Name|name|NAME)(?=[^()]*\))
PCRE live demo
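The lookahead-only approach can be tried in Python's re, which supports the same syntax for this pattern. The sketch below also makes its assumption visible: the opening ( is never checked, so a word followed by ) is matched even when no ( precedes it.

```python
import re

# Match the word only if a ")" follows with no "(" or ")" in between.
# The opening "(" is merely assumed to come earlier; it is not checked.
LOOKAHEAD = re.compile(r"(Name|name|NAME)(?=[^()]*\))")

print(LOOKAHEAD.findall("( lakshd asd___ asa1123 Name : _____)"))  # ['Name']
print(LOOKAHEAD.findall("Name (x)"))  # []  (a "(" comes before any ")")
print(LOOKAHEAD.findall("Name x)"))   # ['Name']  (no "(" is required)
```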