For some reason I fail to write a correct regular expression to match the following strings:
a/b/c/d:d0/d1/d2/d3/:e;f'
a/b/c/d:d0/:e;f'
a/b/c/d:d0/:e'
a/b/c/d:d0'
In each string the c and e should be extracted. As you see, e is optional and the last string doesn't contain it. In that case the regular expression should still match and return the c.
This is the expression that I came up with, but it does not support an optional e:
a\/b\/(?<the_c>\w*)\/.*?\/:(?<the_e>\w*)
I thought to make the last part optional, but then it just doesn't find the e at all:
a\/b\/(?<the_c>\w*)\/.*?(?:\/:(?<the_e>\w*))?
                        ^^^                ^^
Here is a link to test out this example: https://regex101.com/r/C2Jkhq/1
What's wrong with my regex here?
You can use
a\/b\/(?<the_c>\w*)\/(?:.*\/:(?<the_e>\w*))?
Details:
a\/b\/ - a/b/ string
(?<the_c>\w*) - zero or more word chars captured into "the_c" group
\/ - a / char
(?:.*\/:(?<the_e>\w*))? - an optional sequence (that is tried at least once) matching:
.* - any zero or more chars other than line break chars as many as possible
\/: - /: string
(?<the_e>\w*) - zero or more word chars captured into "the_e" group.
See this regex demo.
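To make the difference between the two patterns concrete, here is a minimal sketch in Python. Note the assumptions: the question does not name a regex flavor, Python's re module spells named groups as (?P<name>...) rather than (?<name>...), and the extract helper and the constant names are made up for this illustration. The trailing quote characters from the sample strings are left out.

```python
import re

# Asker's second attempt: the lazy .*? is happy to match nothing, and the
# optional group is then allowed to match nothing as well.
ASKER_ATTEMPT = re.compile(r"a/b/(?P<the_c>\w*)/.*?(?:/:(?P<the_e>\w*))?")

# Suggested pattern: the greedy .* sits inside the optional group, so the
# engine has to try to reach "/:" before it may skip the group.
SUGGESTED = re.compile(r"a/b/(?P<the_c>\w*)/(?:.*/:(?P<the_e>\w*))?")


def extract(pattern, text):
    """Return (the_c, the_e); the_e is None when the optional part is skipped."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.group("the_c"), m.group("the_e")


samples = [
    "a/b/c/d:d0/d1/d2/d3/:e;f",
    "a/b/c/d:d0/:e;f",
    "a/b/c/d:d0/:e",
    "a/b/c/d:d0",
]
for s in samples:
    print(s, extract(SUGGESTED, s), extract(ASKER_ATTEMPT, s))
```

With the suggested pattern the first three samples yield both c and e, and the last one yields c with the_e set to None. The asker's attempt returns None for the_e on every sample.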
I have the following strings:
test_abc123_firstrow
test_abc1564_secondrow
test_abc123_abc234_thirdrow
test_abc1663_fourthrow
test_abc193_abc123_fifthrow
I want to get the abc + following number of each row.
But just the first one if it has more than one.
My current pattern looks like this: ([aA][bB][cC]\w\d+[a-z]*)
But this doesn't restrict the match to the first one only.
If somebody could help how I can implement that, that would be great.
You can use
^.*?([aA][bB][cC]\d+[a-z]*)
Note the removed \w: it matches letters, digits and underscores, so it looks redundant in your pattern.
The ^.*? added at the start anchors the match so that only the first occurrence is captured. Details:
^ - start of string
.*? - any zero or more chars other than line break chars as few as possible
([aA][bB][cC]\d+[a-z]*) - Capturing group 1: a or A, b or B, c or C, then one or more digits and then zero or more lowercase ASCII letters.
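As a quick check of the pattern above, here is a small Python sketch run against the rows from the question. The flavor is an assumption (the question does not name one), and first_abc is a hypothetical helper name.

```python
import re

# ^ anchors at the start of the row; the lazy .*? stops at the first "abc<digits>".
FIRST_ABC = re.compile(r"^.*?([aA][bB][cC]\d+[a-z]*)")


def first_abc(row):
    """Return the first abc+number token in the row, or None if there is none."""
    m = FIRST_ABC.match(row)
    return m.group(1) if m else None


rows = [
    "test_abc123_firstrow",
    "test_abc1564_secondrow",
    "test_abc123_abc234_thirdrow",
    "test_abc1663_fourthrow",
    "test_abc193_abc123_fifthrow",
]
for row in rows:
    print(row, "->", first_abc(row))
```

Because the pattern is anchored, each row produces at most one match, so rows with two tokens only return the first one.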
Use the following regex:
^.*?([aA][bB][cC]\d+)
Use ^ to begin at the start of the input
.*? matches zero or more characters (except line breaks) as few times as possible (lazy approach)
The rest is then captured in the capturing group as expected.
Demo
I am extracting a piece of string from a string (link):
https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8
The desired output should be 100000/100000/100095-000-A_
I am using the regex ^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$ in the Golang flavor, and I can only get group 4 with the following output: 100000/100000/100095-000-A
However I want the underscore after A.
Bit stuck on this, any help on this is appreciated.
You can use
(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)
See the regex demo.
Details:
(/(i|na|fm|d)(/am/ptweb/|.+=.+,)) - Group 1:
/ - a / char
(i|na|fm|d) - Group 2: i, na, fm or d
(/am/ptweb/|.+=.+,) - Group 3: /am/ptweb/ or one or more chars as many as possible (other than line break chars), =, one or more chars as many as possible (other than line break chars) and a , char
([^_]*_?) - Group 4: zero or more chars other than _ and then an optional _.
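The group numbering can be checked with a short sketch. The question targets the Golang flavor; Python is used here only for illustration, on the assumption that this pattern behaves the same in both since it uses no lookarounds or backreferences. The function name path_prefix is hypothetical.

```python
import re

# Same pattern as above; group 4 now includes the optional trailing underscore.
PATTERN = re.compile(r"(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)")

URL = (
    "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/"
    "100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
)


def path_prefix(url):
    """Return group 4 of the first match, or None when nothing matches."""
    m = PATTERN.search(url)
    return m.group(4) if m else None


print(path_prefix(URL))  # 100000/100000/100095-000-A_
```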
You can match the underscore after the A like:
^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$
See a regex demo
A few notes about the pattern that you tried:
This notation is a character class [i,na,fm,d] which should be a grouping (?:[id]|na|fm)
In this group ([,/]?) you optionally capture either , or / so in theory it could match a string that has /i//am/ptweb/
The last part .*?$ does not have to be non greedy as it is the last part of the pattern
This part [^_]* can also match spaces and newlines
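The first note, about the character class versus a grouping, is easy to verify. The sketch below uses Python for illustration (an assumption, since the question is about the Golang flavor); bracket expressions behave the same way in both for this case.

```python
import re

# A character class matches exactly ONE character from the listed set,
# so [i,na,fm,d] is just the set {i , n a f m d}.
CHAR_CLASS = re.compile(r"[i,na,fm,d]")

# A non-capturing group with alternation matches one of the whole alternatives.
GROUPING = re.compile(r"(?:[id]|na|fm)")

for candidate in ["i", "d", "na", "fm", ",", "m"]:
    print(
        repr(candidate),
        "class:", bool(CHAR_CLASS.fullmatch(candidate)),
        "group:", bool(GROUPING.fullmatch(candidate)),
    )
```

The class accepts stray single characters such as "," or "m" and rejects "na" and "fm" as whole tokens, which is the opposite of what was intended.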
I am trying to extract stock symbols from a body of text. These matches usually come in the following forms:
(<symbol>) => (VOO)
(<market>:<symbol>) => (NASDAQ:C)
In the sample cases shown above, I'd like to match VOO and C, skipping everything else. This regex gets me halfway there:
(?<=\()(.*?)(?=\))
With this, I match what's included within the parentheses, but the logic that ignores "noise" like NASDAQ: eludes me. I'd love to learn how to conditionally specify this pattern/logic.
Any ideas? Thanks!
You can use
[A-Z]+(?=\))
See the regex demo.
Details:
[A-Z]+ - one or more uppercase ASCII letters
(?=\)) - a positive lookahead that matches a location that is immediately followed with a ) char.
Alternatively, you can use the following to capture the values into Group 1:
\((?:[^():]*:)?([A-Z]+)\)
See this regex demo. Details:
\( - a ( char
(?:[^():]*:)? - an optional sequence of any zero or more chars other than (, ) and : and then a : char
([A-Z]+) - Group 1: one or more uppercase ASCII letters
\) - a ) char.
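Both variants can be compared side by side. This is a sketch in Python (the question does not state a flavor, so that is an assumption), and the sample sentences are invented for the illustration.

```python
import re

# Variant 1: uppercase letters directly followed by ")".
LOOKAHEAD = re.compile(r"[A-Z]+(?=\))")

# Variant 2: require the opening "(", skip an optional "MARKET:" prefix,
# and capture the symbol into group 1.
CAPTURE = re.compile(r"\((?:[^():]*:)?([A-Z]+)\)")

text = "I hold Vanguard (VOO) and Citigroup (NASDAQ:C)."
print(LOOKAHEAD.findall(text))  # ['VOO', 'C']
print(CAPTURE.findall(text))    # ['VOO', 'C']

# The variants differ when a ")" appears without a matching "(":
stray = "see item ABC) for details"
print(LOOKAHEAD.findall(stray))  # ['ABC']
print(CAPTURE.findall(stray))    # []
```

The capturing variant is stricter because it also checks the opening parenthesis, which may matter in free-form text.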
I'm trying to find a regex to check for the validity of options that are supplied with a command.
Say that -a, -b and -c are valid options. They may be combined, for example as -ac or -abc. Order doesn't matter, so -ba is also valid.
I thought this regex would do the trick:
^-[abc]{1,3}$
But it has a downside. This regex also accepts duplicates, i.e. -abb.
How do I modify this regex to disallow duplicates?
You may use this regex with a capture group and a negative lookahead:
^-((?!.*\1)[abc]){1,3}$
RegEx Demo
RegEx Details:
^: Start
-: Match a -
(: Start capture group #1
(?!.*\1): Negative lookahead to make sure we don't have repeat of what we have in capture group #1 anywhere in the input
[abc]: Match a or b or c
){1,3}: End capture group #1. Repeat this group 1 to 3 times
$: End
You could list all the alternatives, but if it is a long character class, you can check that on the right side there is no char that is already captured using a capture group and a backreference.
^-(?![abc]*?([abc])[abc]*?\1)[abc]{1,3}$
^ Start of string
- Match a hyphen
(?! Negative lookahead, assert that at the right is not
[abc]*?([abc])[abc]*?\1 Match optional chars a, b or c and then capture 1 char. Then check whether the captured char occurs again at the right side
) Close lookahead
[abc]{1,3} Match 1-3 times a b or c
$ End of string
Regex demo
Or a shorter version that uses non-whitespace chars (\S) inside the lookahead, which is safe here because the character class [abc]{1,3} can only match 3 chars anyway.
^-(?!\S*(\S)\S*\1)[abc]{1,3}$
Regex demo
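The two lookahead-based patterns above can be exercised with a short Python sketch. The flavor is an assumption (the question does not name one), and is_valid is a hypothetical helper. In both patterns the capture group is opened and closed inside the lookahead before \1 refers to it.

```python
import re

# Reject the string if any option letter occurs twice, then allow 1-3 letters.
NO_DUPES = re.compile(r"^-(?![abc]*?([abc])[abc]*?\1)[abc]{1,3}$")

# Shorter variant: the lookahead scans any non-whitespace chars.
NO_DUPES_SHORT = re.compile(r"^-(?!\S*(\S)\S*\1)[abc]{1,3}$")


def is_valid(option, pattern=NO_DUPES):
    """Return True when option is '-' followed by 1-3 distinct letters from abc."""
    return pattern.match(option) is not None


for opt in ["-a", "-ac", "-abc", "-ba", "-abb", "-aa", "-abcd", "-d", "-"]:
    print(opt, is_valid(opt), is_valid(opt, NO_DUPES_SHORT))
```

The first four options are accepted by both patterns; the remaining five are rejected, either because a letter repeats or because the string falls outside [abc]{1,3}.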
I have this regular expression:
/^www\.example\.(com|co(\.(in|uk))?|net|us|me)\/?(.*)?[^\/]$/g
It matches:
www.example.com/example1/something
But doesn't match
www.example.com/example1/something/
But the problem is that it also matches the following, which I do not want it to match:
www.example.com/example1/something/otherstuff
I just want it to stop when a slash is encountered after "something". If there is no slash after "something", it should continue matching any character except line breaks.
I am new to regex, so I get confused easily by those characters.
You may use this regex:
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)(?:\/[^\/]+){2}$
RegEx Demo
This will match the following URL:
www.example.co.uk/example1/something
You can use
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)\/([^\/]+)\/([^\/]+)$
See the regex demo
The (.*)? part in your pattern matches any zero or more chars, so it won't stop even after encountering two slashes. The \/([^\/]+)\/([^\/]+) part in the new pattern will match two parts after slash, and capture each part into a separate group (in case you need to access those values).
Details:
^ - start of string
www\.example\. - www.example. string
(?:com|co(?:\.(?:in|uk))?|net|us|me) - com, co.in, co.uk, co, net, us, me strings
\/ - a / char
([^\/]+) - Group 1: one or more chars other than /
\/ - a / char
([^\/]+) - Group 2: one or more chars other than /
$ - end of string.
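A brief sketch of the capturing pattern, run against the URLs from the question. The original regex is written as a JavaScript-style literal; Python is used here only for illustration, and split_url is a hypothetical helper name. The slashes need no escaping outside a /.../ literal.

```python
import re

URL_RE = re.compile(
    r"^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)/([^/]+)/([^/]+)$"
)


def split_url(url):
    """Return the two path segments, or None when the URL does not match."""
    m = URL_RE.match(url)
    return m.groups() if m else None


for url in [
    "www.example.com/example1/something",
    "www.example.co.uk/example1/something",
    "www.example.com/example1/something/",
    "www.example.com/example1/something/otherstuff",
]:
    print(url, "->", split_url(url))
```

Only the first two URLs match; a trailing slash or a third path segment makes the match fail because [^/]+ cannot cross a slash and $ requires the end of the string.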