Problem with a regular expression with 2 capture groups, one of which is optional - regex

I'm struggling to write the correct regex to match the data below. I want to capture the "Focus+Terminal" and its optional parameter "NYET". How can I re-write my incorrect regex?
user:\/\/(.*)(?:=(.*+))?
I also tried and failed:
user:\/\/(.*)=?(?:(.*+))?
Sample Data
* user://Focus+Terminal=NYET
* user://Focus+Terminal

You can use
user:\/\/(.*?)(?:=(.*))?$
See the regex demo.
Details:
user:\/\/ - a user:// string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
(?:=(.*))? - an optional non-capturing group that matches a = and then captures into Group 2 any zero or more chars other than line break chars as many as possible
$ - end of string.
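A minimal sketch of why the greedy attempt fails and the lazy, anchored pattern works. It assumes Python's re module as the engine, since the question names no language. The possessive .*+ from the question is replaced by a plain .* here because older Python versions do not support possessive quantifiers. All variable names are illustrative.

```python
import re

samples = ["user://Focus+Terminal=NYET", "user://Focus+Terminal"]

# Greedy group 1 swallows "=NYET", so the optional group matches nothing.
greedy = re.compile(r"user://(.*)(?:=(.*))?")
# Lazy group 1 plus the $ anchor forces the engine to try the "=..." part.
lazy = re.compile(r"user://(.*?)(?:=(.*))?$")

greedy_groups = [greedy.match(s).groups() for s in samples]
lazy_groups = [lazy.match(s).groups() for s in samples]

print(greedy_groups)
print(lazy_groups)
```

Without the $ anchor the lazy group would be free to match an empty string, so the anchor is what makes the lazy quantifier expand up to the = sign.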

As an alternative, you might use a negated character class for the first capture group that excludes both a newline and the equals sign.
user:\/\/([^=\n]*)(?:=(.*))?
Explanation
user:\/\/ Match user://
([^=\n]*) Capture group 1, match zero or more chars other than = or a newline
(?:=(.*))? Optionally match = and capture the rest of the line in group 2
Regex demo
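Because the negated character class needs no end anchor, it can scan several lines in one pass. A short sketch, again assuming Python's re module; text is made-up input built from the two samples in the question.

```python
import re

# Both sample lines from the question, joined into one multi-line string.
text = "user://Focus+Terminal=NYET\nuser://Focus+Terminal"

pattern = re.compile(r"user://([^=\n]*)(?:=(.*))?")

# Group 1 stops at "=" or a newline; group 2 is None when "=" is absent.
pairs = [m.groups() for m in pattern.finditer(text)]
print(pairs)
```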

replaceAll regex to remove last - from the output

I was able to get close to the output I want, but not exactly. I am using replaceAll with a regex; the sample code is below.
final String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
System.out.println(label.replaceAll(
"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", "$3"));
I want this output:
abc-nyd-request-xyxpt
but I am getting:
abc-nyd-request-xyxpt-
Here is the code: https://ideone.com/UKnepg
You may use this .replaceFirst solution:
String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
label.replaceFirst("(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", "$1");
//=> "abc-nyd-request-xyxpt"
RegEx Demo
RegEx Details:
(?:[^-]*-){2}: Match 2 repetitions of zero or more non-hyphen characters followed by a hyphen
(.+?): Match 1+ of any characters, as few as possible, and capture in group #1
(?:--1)?: Match optional --1
-: Match a -
[^-]+: Match a non-hyphenated string
$: End
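For readers who want to check the pattern outside Java, here is a rough Python equivalent. This is a sketch under the assumption that Python's re engine treats this pattern the same way as java.util.regex; count=1 stands in for replaceFirst and \1 is Python's spelling of $1.

```python
import re

label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"

# count=1 mirrors Java's replaceFirst.
result = re.sub(r"(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", r"\1", label, count=1)
print(result)
```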
The following works for your example case
([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)
https://regex101.com/r/VNtryN/1
Group 3 now has to end with a non-hyphen character, so it cannot capture a trailing -, and -+ allows one or more separating hyphens, which is what lets it match the double --.
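The difference between the original pattern and this one can be seen side by side. The sketch below uses Python's re module as an assumed stand-in for Java's replaceAll, with \3 instead of $3; the variable names are illustrative.

```python
import re

label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"

# Original pattern: greedy (.+) keeps one of the two hyphens before "1".
original = re.sub(r"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", r"\3", label)

# Adjusted pattern: group 3 must end in a non-hyphen, and -+ eats both hyphens.
adjusted = re.sub(r"([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)", r"\3", label)

print(original)
print(adjusted)
```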
With your shown samples and attempts, please try the following regex. It creates 1 capturing group that can be used in the replacement. Use $1 as the replacement in your function.
^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*
Here is the Online demo for above regex.
Explanation: Adding detailed explanation for above regex.
^(?:.*?-){2} ##Match from the start of the value: a non-capturing group that lazily matches up to the nearest - and is repeated 2 times.
([^-]*(?:-[^-]*){3}) ##The first and only capturing group: zero or more non-hyphen chars, followed by 3 repetitions of a - and zero or more non-hyphen chars, which yields the required output.
--.* ##Match -- and everything after it to the end of the value.
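Since this pattern matches the whole value, the capturing group can also be read directly instead of going through a replacement. A small sketch in Python (an assumed engine; the question itself uses Java):

```python
import re

label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"

pattern = re.compile(r"^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*")
match = pattern.match(label)

# Group 1 holds exactly four hyphen-separated parts, as {3} hard-codes.
captured = match.group(1) if match else None
print(captured)
```

Note that the {3} quantifier ties the pattern to exactly four parts in the captured section, which fits the shown sample but is an assumption about other inputs.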

golang regex get the string including the search character

I am extracting a piece of string from a string (link):
https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8
The desired output should be 100000/100000/100095-000-A_
I am using the regex ^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$ in the Golang flavor and can only get group 4, with the following output: 100000/100000/100095-000-A
However I want the underscore after A.
Bit stuck on this, any help on this is appreciated.
You can use
(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)
See the regex demo.
Details:
(/(i|na|fm|d)(/am/ptweb/|.+=.+,)) - Group 1:
/ - a / char
(i|na|fm|d) - Group 2: i, na, fm or d
(/am/ptweb/|.+=.+,) - Group 3: /am/ptweb/ or one or more chars as many as possible (other than line break chars), =, one or more chars as many as possible (other than line break chars) and a , char
([^_]*_?) - Group 4: zero or more chars other than _ and then an optional _.
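The group numbering above can be checked with a quick script. This sketch uses Python's re module rather than Go's regexp package, on the assumption that both treat this pattern identically since it uses only basic syntax; url is the link from the question.

```python
import re

url = ("https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/"
       "100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8")

pattern = re.compile(r"(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)")
match = pattern.search(url)

prefix = match.group(2)      # which of i, na, fm, d matched
extracted = match.group(4)   # the wanted value, including the underscore
print(prefix, extracted)
```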
You can match the underscore after the A like:
^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$
See a regex demo
A few notes about the pattern that you tried:
This notation is a character class [i,na,fm,d] which should be a grouping (?:[id]|na|fm)
In this group ([,/]?) you optionally capture either , or / so in theory it could match a string that has /i//am/ptweb/
The last part .*?$ does not have to be non greedy as it is the last part of the pattern
This part [^_]* can also match spaces and newlines
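The first note is easy to verify: a character class always matches exactly one character, so the commas and letters inside it are individual members, not alternatives. A small illustration in Python (an assumed engine; the candidate strings are made up):

```python
import re

char_class = re.compile(r"[i,na,fm,d]")      # one char out of: i , n a f m d
alternation = re.compile(r"(?:[id]|na|fm)")  # i, d, na or fm

candidates = ["i", ",", "a", "na", "fm"]

class_hits = [s for s in candidates if char_class.fullmatch(s)]
alt_hits = [s for s in candidates if alternation.fullmatch(s)]

print(class_hits)
print(alt_hits)
```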

Comma separated prefix list with commas inside

I'm trying to match a comma-separated list of prefixed values, where the values can themselves contain commas.
I finally managed to match all occurrences that don't contain a ,.
Sample String (With NL for visualization - original string doesn't have NL):
field01=Value 1,
field02=Value 2,
field03=<xml value>,
field04=127.0.0.1,
field05=User-Agent: curl/7.28.0\r\nHost: example.org\r\nAccept: */*,
field06=Location, Resource,
field07={Item 1},{Item 2}
My current, unoptimized regex looks like this:
(?'fields'(field[0-9]{2,3})=?([\s\w\d_<>.:="*?\-\/\\(){}<>'#]+))([^,](?&fields))*
Does anyone have a clue how to solve this?
EDIT:
The first pattern is near to my expected result.
This is an anonymized full example of the string:
asm01=Predictable Resource Location,Information Leakage,asm02=N/A,asm04=Uncategorized,asm08=2021-02-15 09:18:16,asm09=127.0.0.1,asm10=443,asm11=N/A,asm15=,asm16=DE,asm17=User-Agent: curl/7.29.0\r\nHost: dev.example.com\r\nAccept: */*\r\nX-Forwarded-For: 127.0.0.1\r\n\r\n,asm18=/Common/_www.example.com_live_v1,asm20=127.0.0.1,asm22=,asm27=HEAD,asm34=/Common/_www.example.com_live_v1,asm35=HTTPS,asm39=blocked,asm41=0,asm42=3,asm43=0,asm44=Error,asm46=200000028,200100015,asm47=Unix hidden (dot-file) access,.htaccess access,asm48={Unix/Linux Signatures},{Apache/NCSA HTTP Server Signatures},asm50=40622,asm52=200000028,asm53=Unix hidden (dot-file) access,asm54={Unix/Linux Signatures},asm55=,asm61=,asm62=,asm63=8985143867830069446,asm64=example-waf.example.com,asm65=/.htaccess,asm67=Attack signature detected,asm68=<?xml version='1.0' encoding='UTF-8'?><BAD_MSG><violation_masks><block>13020008202d8a-f803000000000000</block><alarm>417020008202f8a-f803000000000000</alarm><learn>13000008202f8a-f800000000000000</learn><staging>200000-0</staging></violation_masks><request-violations><violation><viol_index>42</viol_index><viol_name>VIOL_ATTACK_SIGNATURE</viol_name><context>request</context><sig_data><sig_id>200000028</sig_id><blocking_mask>7</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>0</offset><length>2</length></kw_data></sig_data><sig_data><sig_id>200000028</sig_id><blocking_mask>4</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>0</offset><length>3</length></kw_data></sig_data><sig_data><sig_id>200100015</sig_id><blocking_mask>7</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>1</offset><length>9</length></kw_data></sig_data></violation></request-violations></BAD_MSG>,asm69=5,asm71=/Common/_dev.example.com_SSL,asm75=127.0.0.1,asm100=,asm101=HEAD /.htaccess HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\nHost: dev.example.com\r\nAccept: */*\r\nX-Forwarded-For: 127.0.0.1\r\n\r\n#015
The pattern does not work because the named group fields requires the literal string field, which the example strings do not contain (their keys start with asm).
Note that [^,] matches any single char except a comma. You can also omit the capture group inside the named group fields, as it already is a group, and \d is redundant in the character class because \w also matches digits.
With 2 capture groups:
\b(asm[0-9]+)=(.*?)(?=,asm[0-9]+=|$)
\b A word boundary
(asm[0-9]+) Capture group 1, match asm and 1+ digits
= Match literally
(.*?) Capture group 2, match any chars, as few as possible
(?= Positive lookahead, assert what is at the right is
,asm[0-9]+= Match ,asm followed by 1+ digits and =
| Or
$ Assert the end of the string
) Close lookahead
Regex demo
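A minimal sketch of the two-group pattern turning the log line into key/value pairs. It assumes Python's re module (the question does not name an engine); line is a shortened excerpt of the anonymized string above, and fields is an illustrative name.

```python
import re

# Shortened excerpt of the anonymized log line from the question.
line = ("asm01=Predictable Resource Location,Information Leakage,"
        "asm02=N/A,asm15=,asm46=200000028,200100015,"
        "asm48={Unix/Linux Signatures},{Apache/NCSA HTTP Server Signatures}")

pattern = re.compile(r"\b(asm[0-9]+)=(.*?)(?=,asm[0-9]+=|$)")

# findall returns (key, value) tuples, which dict() accepts directly.
fields = dict(pattern.findall(line))
for key, value in fields.items():
    print(key, "->", repr(value))
```

Commas inside a value survive because the lazy group only stops when the lookahead sees the next ,asmNN= key or the end of the string.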
A simple solution would be (see regexr.com/5mg1b):
/((asm\d{2,3})=(.*?))(?=,asm|$)/g
Match groupings will be:
group #1 - asm01=Predictable Resource Location,Information Leakage
group #2 - asm01
group #3 - Predictable Resource Location,Information Leakage
Conditions:
This will match everything including empty values
The key here is to make sure that each match is delimited by either a comma and your field descriptor, or an end of string. A look ahead will be handy here: (?=,asm|$).
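The three groups listed above can be reproduced as follows. This is a sketch in Python, where finditer plays the role of the /g flag; line is a short made-up excerpt built from fields in the question.

```python
import re

line = "asm01=Predictable Resource Location,Information Leakage,asm100=,asm101=HEAD"

pattern = re.compile(r"((asm\d{2,3})=(.*?))(?=,asm|$)")

# Each match yields (whole field, key, value); empty values are kept.
groups = [m.groups() for m in pattern.finditer(line)]
for whole, key, value in groups:
    print(key, "->", repr(value))
```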

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (or the many others I tried) seem to work.
To turn N_BBP_c_46137_n into BBP (according to the comment, you "just want that entire long name such as N_BBP_ to be replaced by only BBP"), you might use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N_ preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
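The same find/replace can be tried outside the editor. The sketch below uses Python's re module as an assumed stand-in for Sublime's regex engine, with \1 in place of $1; text is made-up surrounding content.

```python
import re

text = "see N_BBP_c_46137_n here"

# Keep only the captured BBP and drop the rest of the long name.
result = re.sub(r"\bN_(BBP)_\S*", r"\1", text)
print(result)
```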
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.
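The ${1} form matters when the new value starts with a digit. A short sketch of the same idea in Python, an assumed stand-in for Sublime's engine, where \g<1> plays the role of ${1}; the replacement value 2024 is made up.

```python
import re

name = "N_BBP_c_46137_n"

# \g<1> keeps group 1 unambiguous when the next character is a digit;
# a bare \1 followed by digits would not be read as group 1.
result = re.sub(r"(N_)[^_]*(_c_\d+_n)", r"\g<1>2024\2", name)
print(result)
```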

Regexp Substring From URL

I need to retrieve some words from a URL:
WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit
The result I want:
search/iphone_5s
but I'm stuck and don't really understand how to use regexp_substr to get that data.
I'm trying to use this query
regexp_substr(web_url, '\google.com/([^}]+)\/', 1,1,null,1)
which only returns the word 'search', and when I try
regexp_substr(web_url, '\google.com/([^}]+)\&', 1,1,null,1)
I get everything up to the last '&'.
You may use a REGEXP_REPLACE to match the whole string but capture two substrings and replace with two backreferences to the capture group values:
REGEXP_REPLACE(
'WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit',
'.*//google\.com/([^/]+/).*[?&]term=([^&]+).*',
'\1\2')
See the regex demo and the online Oracle demo.
Pattern details
.* - any zero or more chars other than line break chars as many as possible
//google\.com/ - a //google.com/ substring
([^/]+/) - Capturing group 1: one or more chars other than / and then a /
.* - any zero or more chars other than line break chars as many as possible
[?&]term= - ? or & and a term= substring
([^&]+) - Capturing group 2: one or more chars other than &
.* - any zero or more chars other than line break chars as many as possible
NOTE: To use this approach and get an empty result if the match is not found, append |.+ at the end of the regex pattern.
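The pattern and the |.+ fallback from the note can be tried without a database. This sketch uses Python's re.sub as an assumed stand-in for Oracle's REGEXP_REPLACE; the second URL is a made-up non-matching input.

```python
import re

web_url = ("WebViewActivity - https://google.com/search/?term=iphone_5s"
           "&utm_source=google&utm_campaign=search_bar&utm_content=search_submit")

pattern = r".*//google\.com/([^/]+/).*[?&]term=([^&]+).*"

# The whole string is matched and replaced by the two captured pieces.
result = re.sub(pattern, r"\1\2", web_url)

# With the |.+ fallback a non-matching input is consumed by .+ and, since
# unmatched groups expand to empty strings in Python 3.5+, becomes "".
fallback = re.sub(pattern + r"|.+", r"\1\2",
                  "WebViewActivity - https://example.org/")

print(result)
print(repr(fallback))
```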