golang regex get the string including the search character - regex

I am extracting a substring from a string (a link):
https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8
The desired output should be 100000/100000/100095-000-A_
I am using the regex ^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$ in the Golang flavor, and I can only get group 4 with the following output: 100000/100000/100095-000-A
However I want the underscore after A.
I'm a bit stuck on this; any help is appreciated.

You can use
(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)
See the regex demo.
Details:
(/(i|na|fm|d)(/am/ptweb/|.+=.+,)) - Group 1:
/ - a / char
(i|na|fm|d) - Group 2: i, na, fm or d
(/am/ptweb/|.+=.+,) - Group 3: /am/ptweb/ or one or more chars as many as possible (other than line break chars), =, one or more chars as many as possible (other than line break chars) and a , char
([^_]*_?) - Group 4: zero or more chars other than _ and then an optional _.
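For reference, here is a minimal Go sketch of how the pattern above could be applied to the link from the question. The names pathRe and extractPath are made up for this illustration; the only library calls used are regexp.MustCompile and FindStringSubmatch from the standard regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// pathRe is the pattern from the answer above. Group 4 holds the part after
// /am/ptweb/ up to and including the first underscore, if there is one.
var pathRe = regexp.MustCompile(`(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)`)

// extractPath (hypothetical helper) returns group 4 of the first match,
// or an empty string when the pattern does not match at all.
func extractPath(link string) string {
	m := pathRe.FindStringSubmatch(link)
	if m == nil {
		return ""
	}
	return m[4]
}

func main() {
	link := "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
	fmt.Println(extractPath(link)) // 100000/100000/100095-000-A_
}
```

Because _? is optional, a link without any underscore after the prefix would still match and simply return the text without a trailing underscore.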

You can match the underscore after the A like:
^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$
See a regex demo
A few notes about the pattern that you tried:
The notation [i,na,fm,d] is a character class, so it matches a single character (i, a comma, n, a, f, m or d); it should be a grouping such as (?:[id]|na|fm)
In this group ([,/]?) you optionally capture either , or / so in theory it could match a string that has /i//am/ptweb/
The last part .*?$ does not have to be non greedy as it is the last part of the pattern
This part [^_]* can also match spaces and newlines
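The difference between the character class and the grouping from the first note can be checked with a small Go sketch. The variable and function names (classRe, groupRe, fullRe, pathWithUnderscore) are invented for this example, and the ^...$ anchors around the two short patterns are added here only to test whole strings.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Character class from the question: matches exactly ONE character out of
	// i , n a f m d (the commas are literal members of the class).
	classRe = regexp.MustCompile(`^[i,na,fm,d]$`)
	// Grouping suggested above: matches i, d, na or fm as whole alternatives.
	groupRe = regexp.MustCompile(`^(?:[id]|na|fm)$`)
	// Full pattern from the answer above; group 4 must end with an underscore.
	fullRe = regexp.MustCompile(`^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$`)
)

// pathWithUnderscore (hypothetical helper) returns group 4 or "" if no match.
func pathWithUnderscore(link string) string {
	m := fullRe.FindStringSubmatch(link)
	if m == nil {
		return ""
	}
	return m[4]
}

func main() {
	for _, s := range []string{"i", "na", ",", "m"} {
		fmt.Printf("%q class=%v group=%v\n", s, classRe.MatchString(s), groupRe.MatchString(s))
	}
	link := "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
	fmt.Println(pathWithUnderscore(link)) // 100000/100000/100095-000-A_
}
```

With these inputs the class accepts "," and "m" but rejects "na", while the grouping does the opposite, which is why the original pattern only worked by accident for /i/.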

Related

RegEx: how to don't match a repetition

I have the following strings:
test_abc123_firstrow
test_abc1564_secondrow
test_abc123_abc234_thirdrow
test_abc1663_fourthrow
test_abc193_abc123_fifthrow
I want to get the abc + following number of each row.
But just the first one if it has more than one.
My current pattern looks like this: ([aA][bB][cC]\w\d+[a-z]*)
But this doesn't match only the first one.
If somebody could help how I can implement that, that would be great.
You can use
^.*?([aA][bB][cC]\d+[a-z]*)
Note the removed \w, it matches letters, digits and underscores, so it looks redundant in your pattern.
The ^.*? added at the start matches the text before the first abc occurrence. Details:
^ - start of string
.*? - any zero or more chars other than line break chars as few as possible
([aA][bB][cC]\d+[a-z]*) - Capturing group 1: a or A, b or B, c or C, then one or more digits and then zero or more lowercase ASCII letters.
Use the following regex:
^.*?([aA][bB][cC]\d+)
Use ^ to begin at the start of the input
.*? matches zero or more characters (except line breaks) as few times as possible (lazy approach)
The rest is then captured in the capturing group as expected.
Demo
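The question does not say which regex flavor is used, so the following is only a sketch in Go of how the second pattern behaves on the sample rows. firstABC and firstToken are names made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstABC anchors at the start of the row and lazily skips ahead, so the
// capture group always holds the FIRST abc+digits token of that row.
var firstABC = regexp.MustCompile(`^.*?([aA][bB][cC]\d+)`)

// firstToken (hypothetical helper) returns group 1, or "" if the row has no token.
func firstToken(row string) string {
	m := firstABC.FindStringSubmatch(row)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	rows := []string{
		"test_abc123_firstrow",
		"test_abc1564_secondrow",
		"test_abc123_abc234_thirdrow",
		"test_abc1663_fourthrow",
		"test_abc193_abc123_fifthrow",
	}
	for _, row := range rows {
		fmt.Println(firstToken(row))
	}
}
```

Each row is matched separately here; matching a multi-line string in one go would additionally need the (?m) flag so that ^ applies at every line start.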

Ignore Until "Spacebar+I or V or X" - Regex Expression

So... I had a regex which worked just fine (wasn't pretty but worked), until the Roman Numerals reached more than X.
Currently my Regex looks like this:
(.*?)(^(X{1,3})(I[XV]|V?I{0,3})$|^(I[XV]|V?I{1,3})$|^V$)*(.)( EP\. )(\d*)(.*)
The problem I have right now is that if the Roman numeral has a value of 10 or more, it ends up in the 1st group, which drives me nuts.
I need it to work in a way that everything before the Roman numeral is ignored.
Test Text:
PEPA THE PIG XVI EP. 169 - BAD ENDING
Could you please help me fix the regex so it actually does what it's supposed to do?
You should re-consider using anchors in the middle of a regex: ^ requires start of string and $ requires the end of string.
Besides, the (.) before ( EP\. ) consumes the space, so the leading space in ( EP\. ) has nothing left to match.
Consider using
^(.*?)\b(X{1,3}(?:I[XV]|V?I{0,3})|I[XV]|V?I{1,3}|V)\b(.)\b(EP\.)\s*(\d+)(.*)
See the regex demo. You might still need to check what exactly you want to match with (.).
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\b - a word boundary
(X{1,3}(?:I[XV]|V?I{0,3})|I[XV]|V?I{1,3}|V) - Group 2: one to three Xs followed with IX or IV, or with an optional V and then zero to three Is, or IX, IV, or an optional V followed with one to three Is or V
\b - a word boundary
(.) - Group 3: any one char (other than a newline)
\b - a word boundary
(EP\.) - Group 4: EP.
\s* - zero or more whitespaces
(\d+) - Group 5: one or more digits
(.*) - Group 6: any zero or more chars other than line break chars, as many as possible
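The flavor is not stated in the question, so take this as a rough Go sketch of how the groups come out for the test text. Go's regexp package supports \b (as an ASCII word boundary) and the {1,3} quantifiers used here; episodeRe and parseTitle are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// episodeRe is the pattern suggested above, unchanged.
var episodeRe = regexp.MustCompile(
	`^(.*?)\b(X{1,3}(?:I[XV]|V?I{0,3})|I[XV]|V?I{1,3}|V)\b(.)\b(EP\.)\s*(\d+)(.*)`)

// parseTitle (hypothetical helper) returns the text before the numeral, the
// Roman numeral itself and the episode number; ok is false when nothing matches.
func parseTitle(title string) (prefix, numeral, episode string, ok bool) {
	m := episodeRe.FindStringSubmatch(title)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[5], true
}

func main() {
	prefix, numeral, episode, ok := parseTitle("PEPA THE PIG XVI EP. 169 - BAD ENDING")
	fmt.Printf("%q %q %q %v\n", prefix, numeral, episode, ok)
	// "PEPA THE PIG " "XVI" "169" true
}
```

The word boundaries are what keep the I inside PIG from being picked up as a numeral, since there is no boundary between P and I.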

How do I make this regular expression not match anything after forward slash /

I have this regular expression:
/^www\.example\.(com|co(\.(in|uk))?|net|us|me)\/?(.*)?[^\/]$/g
It matches:
www.example.com/example1/something
But doesn't match
www.example.com/example1/something/
But the problem is that it also matches the following, which I do not want it to match:
www.example.com/example1/something/otherstuff
I just want it to stop when a slash is encountered after "something". If there is no slash after "something", it should continue matching any character, except line breaks.
I am new to regex, so I get confused easily by those characters.
You may use this regex:
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)(?:\/[^\/]+){2}$
RegEx Demo
This will match the following URL:
www.example.co.uk/example1/something
You can use
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)\/([^\/]+)\/([^\/]+)$
See the regex demo
The (.*)? part in your pattern matches any zero or more chars, so it won't stop even after encountering two slashes. The \/([^\/]+)\/([^\/]+) part in the new pattern will match two parts after slash, and capture each part into a separate group (in case you need to access those values).
Details:
^ - start of string
www\.example\. - www.example. string
(?:com|co(?:\.(?:in|uk))?|net|us|me) - com, co.in, co.uk, co, net, us, me strings
\/ - a / char
([^\/]+) - Group 1: one or more chars other than /
\/ - a / char
([^\/]+) - Group 2: one or more chars other than /
$ - end of string.
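The question uses JavaScript regex literals, but the same pattern can be tried out in Go as a quick sketch. In a Go raw string there are no /.../ delimiters, so the slashes are written without backslashes here; urlRe and splitPath are names invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlRe is the second pattern above with the \/ escapes dropped: exactly two
// non-empty path segments after the host, and nothing after them.
var urlRe = regexp.MustCompile(
	`^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)/([^/]+)/([^/]+)$`)

// splitPath (hypothetical helper) returns both captured segments, or ok=false.
func splitPath(url string) (first, second string, ok bool) {
	m := urlRe.FindStringSubmatch(url)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	for _, u := range []string{
		"www.example.com/example1/something",
		"www.example.com/example1/something/",
		"www.example.com/example1/something/otherstuff",
		"www.example.co.uk/example1/something",
	} {
		first, second, ok := splitPath(u)
		fmt.Printf("%-48s ok=%v %q %q\n", u, ok, first, second)
	}
}
```

Only the first and last URLs are accepted; the trailing slash and the extra /otherstuff segment both fail because [^/]+ cannot cross a slash and $ must follow the second segment.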

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (and many others) seem to work.
To turn N_BBP_c_46137_n into BBP and, according to the comment, "just want that entire long name such as N_BBP_ to be replaced by only BBP", you might also use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N_ preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
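Outside Sublime, the same find/replace can be sketched in Go to check the result. This is only an illustration, not Sublime's own engine; nameRe and shortName are hypothetical names, and ${1} is Go's replacement-template syntax for the first capture group.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe is the pattern above: N_ at a word boundary, BBP captured in group 1,
// then an underscore and the rest of the non-whitespace run.
var nameRe = regexp.MustCompile(`\bN_(BBP)_\S*`)

// shortName (hypothetical helper) replaces every match with its first group.
func shortName(s string) string {
	return nameRe.ReplaceAllString(s, "${1}")
}

func main() {
	fmt.Println(shortName("N_BBP_c_46137_n")) // BBP
}
```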
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.
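A similar replacement can be sketched in Go, whose template syntax has the same pitfall: $1some would be read as a group named "1some", so the braces in ${1} are needed when text follows directly. The names middleRe and replaceMiddle are assumptions for this example, and "some new value" is the placeholder text from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// middleRe keeps the N_ prefix (group 1) and the _c_<digits>_n suffix
// (group 2) and matches whatever non-underscore text sits between them.
var middleRe = regexp.MustCompile(`(N_)[^_]*(_c_\d+_n)`)

// replaceMiddle (hypothetical helper) swaps the middle part for newValue.
// Note: a literal $ inside newValue would need to be written as $$.
func replaceMiddle(s, newValue string) string {
	return middleRe.ReplaceAllString(s, "${1}"+newValue+"${2}")
}

func main() {
	fmt.Println(replaceMiddle("N_BBP_c_46137_n", "some new value"))
	// N_some new value_c_46137_n
}
```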

Regexp Substring From URL

I need to retrieve some words from a URL:
WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit
The result I want:
search/iphone_5s
but I'm stuck and don't really understand how to use regexp_substr to get that data.
I'm trying to use this query:
regexp_substr(web_url, '\google.com/([^}]+)\/', 1,1,null,1)
which only returns the word 'search', and when I try
regexp_substr(web_url, '\google.com/([^}]+)\&', 1,1,null,1)
it turns out I get everything up to the last '&'
You may use a REGEXP_REPLACE to match the whole string but capture two substrings and replace with two backreferences to the capture group values:
REGEXP_REPLACE(
'WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit',
'.*//google\.com/([^/]+/).*[?&]term=([^&]+).*',
'\1\2')
See the regex demo and the online Oracle demo.
Pattern details
.* - any zero or more chars other than line break chars as many as possible
//google\.com/ - a //google.com/ substring
([^/]+/) - Capturing group 1: one or more chars other than / and then a /
.* - any zero or more chars other than line break chars as many as possible
[?&]term= - ? or & and a term= substring
([^&]+) - Capturing group 2: one or more chars other than &
.* - any zero or more chars other than line break chars as many as possible
NOTE: To use this approach and get an empty result if the match is not found, append |.+ at the end of the regex pattern.
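To see the replace-with-backreferences idea and the |.+ fallback from the NOTE in action outside Oracle, here is a small Go sketch. It is not Oracle SQL: Go writes the backreferences as ${1}${2} instead of \1\2, and termRe and searchTerm are names made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// termRe is the pattern above with |.+ appended, so that a string without
// //google.com/ or term= is matched by .+ and replaced with nothing.
var termRe = regexp.MustCompile(
	`.*//google\.com/([^/]+/).*[?&]term=([^&]+).*|.+`)

// searchTerm (hypothetical helper) returns "<first path segment>/<term value>",
// or "" when the input does not have the expected shape.
func searchTerm(s string) string {
	return termRe.ReplaceAllString(s, "${1}${2}")
}

func main() {
	in := "WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit"
	fmt.Println(searchTerm(in)) // search/iphone_5s
	fmt.Printf("%q\n", searchTerm("WebViewActivity - no url")) // ""
}
```

When the second alternative matches, both groups are empty, which is what turns a non-matching input into an empty result.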