How to build a regexp to match optional patterns

I have the following sample strings:
MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750
CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750
I need to separate them into 5 parts (where I put the | in the following samples):
MAREMMA TOSCANA BIANCO DOC |2020| CALASOLE MONTEMASSI|0,750
CHIANTI CLASSICO DOCG |2012| RISERVA ALBOLA |LT.|0,750
As you can see, the fourth part is optional.
I tried some variations of this regexp on https://regex101.com/r/NX3DE3/1, but the LT. part is absorbed into the preceding one:
([A-Za-z ]+)((20\d\d)|(19\d\d))([A-Za-z ]*)((LT))\.?[0-9,]*
The ((LT)) group is optional, but if I add a ? it works on the first example and not on the second, and vice versa.
I would also like to trim the different parts, but I really don't know how!

You can use
^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\s* - zero or more whitespaces
((?:20|19)\d\d) - Group 2: 20 or 19 and then two digits
\s* - zero or more whitespaces
(.*?) - Group 3: any zero or more chars other than line break chars as few as possible
(?:\s+(LT)[. ])? - an optional non-capturing group matching one or more whitespaces and then capturing into Group 4 LT and then a space or .
(\d[\d,]*) - Group 5: a digit and then zero or more digits or commas.
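As a minimal sketch, the pattern above can be tried in Python (the language choice and the helper name split_wine_label are assumptions for illustration, not part of the original answer). Since the question also asks about trimming, each captured part is stripped of surrounding whitespace:

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)")

def split_wine_label(line):
    """Hypothetical helper: return the five parts, or None if there is no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 4 (LT) is optional, so it may be None; strip the other groups.
    return tuple(g.strip() if g is not None else None for g in m.groups())

for sample in (
    "MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750",
    "CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750",
):
    print(split_wine_label(sample))
```

When the LT part is absent, the fourth element is None rather than an empty string, so the caller can tell "missing" apart from "empty".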

Related

Regex: match after capture group 1 match, using its result

I'm having trouble understanding what I did wrong in the regex below:
/^([^#]+)#(\1.*?(?=\||$))/gi
It correctly matches foo#foo,xyz,asd123|bar,asd123,xyz, with the desired result being foo,xyz,asd123.
But it does not match bar#foo,xyz,asd123|bar,asd123,xyz. Expected output would be bar,asd123,xyz.
Basically, I need to use the result of the capture group 1 to search further in the string after the # character. However, it's only working for the match immediately after # and nothing else. I feel like I'm missing a very basic thing here.
regexr.com/6ussf

You can use
^([^#]+)#.*?(\1.*?)(?=\||$)
Details
^ - start of string
([^#]+) - Group 1 (\1): one or more chars other than #
# - a # char
.*? - zero or more chars other than line break chars as few as possible
(\1.*?) - Group 2: same value as captured into Group 1 and then any zero or more chars other than line break chars as few as possible
(?=\||$) - a positive lookahead that requires | or end of string immediately to the right of the current location.
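A small Python sketch of the pattern above, using the two sample strings from the question (the function name find_entry is a hypothetical helper). The lazy .*? after # is what lets the backreference be found further along in the string instead of only right after the #:

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^([^#]+)#.*?(\1.*?)(?=\||$)")

def find_entry(text):
    """Hypothetical helper: return the entry that starts with the key before '#'."""
    m = PATTERN.search(text)
    return m.group(2) if m else None

print(find_entry("foo#foo,xyz,asd123|bar,asd123,xyz"))  # foo,xyz,asd123
print(find_entry("bar#foo,xyz,asd123|bar,asd123,xyz"))  # bar,asd123,xyz
```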

include searched regex text also in output

I'm using the regex re.findall(r"[0-9]+(.*?)\.\s(.*?)[0-9]+", text) to extract the text below
8 EXT./INT. MONORAIL - MORNING 8
9 EXT. CITY SCAPE/MONORAIL - CONTINUOUS 9
But my current output doesn't have the prefix and suffix numbers. I'm trying to have the prefix digits also in the output as follows.
9 EXT. CITY SCAPE/MONORAIL - CONTINUOUS
Any help greatly appreciated! Thanks in advance.
(The current output is given below)

You can use
(?m)^([0-9]+)\s*(.*?)\.\s(.*?)(?:\s*([0-9]+))?$
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
([0-9]+) - Group 1: one or more digits
\s* - zero or more whitespaces
(.*?) - Group 2: zero or more chars other than line break chars as few as possible
\.\s - a dot and a whitespace
(.*?) - Group 3: zero or more chars other than line break chars as few as possible
(?:\s*([0-9]+))? - an optional occurrence of zero or more whitespaces and then Group 4 capturing one or more digits
$ - end of line.
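Here is a sketch of how the pattern above might be used with re.findall, as in the question. The two-line text value is assembled from the sample lines, and the variable names are assumptions. Because the leading number is now inside a capture group, it appears in each result tuple:

```python
import re

# Sample lines from the question, joined into one multi-line string.
text = (
    "8 EXT./INT. MONORAIL - MORNING 8\n"
    "9 EXT. CITY SCAPE/MONORAIL - CONTINUOUS 9"
)

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"(?m)^([0-9]+)\s*(.*?)\.\s(.*?)(?:\s*([0-9]+))?$")

# Each row is (leading number, text before ". ", heading, trailing number).
rows = PATTERN.findall(text)

for number, prefix, heading, trailing in rows:
    # Rebuild the line with the leading number and without the trailing one.
    print(f"{number} {prefix}. {heading}")
```

With re.findall, an optional group that did not take part in the match comes back as an empty string, so a line without the trailing number still yields a four-item tuple.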

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (or the many others I tried) seem to work.

To turn N_BBP_c_46137_n into BBP (according to the comment, you just want the entire long name such as N_BBP_* to be replaced by only BBP), you might use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
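To check this pattern outside Sublime, a rough Python equivalent can be used. This is only an approximation: Sublime writes the group reference as $1 in the replacement field, while Python's re.sub writes it as \1. The variable names are assumptions.

```python
import re

# Pattern from the answer above; \1 plays the role of Sublime's $1.
PATTERN = re.compile(r"\bN_(BBP)_\S*")

result = PATTERN.sub(r"\1", "N_BBP_c_46137_n")
print(result)  # BBP
```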

You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.
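The same replacement can be sketched in Python as a quick check, again as an approximation of what Sublime does. Python's \g<1> is the unambiguous group reference that corresponds to ${1} here, and \2 corresponds to $2. The replacement text "some new value" is the placeholder from the answer.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"(N_)[^_]*(_c_\d+_n)")

# Keep the N_ prefix and the _c_<digits>_n suffix, swap the middle part.
result = PATTERN.sub(r"\g<1>some new value\2", "N_BBP_c_46137_n")
print(result)  # N_some new value_c_46137_n

# \g<1> also stays unambiguous when the new value starts with a digit.
print(PATTERN.sub(r"\g<1>2024\2", "N_BBP_c_46137_n"))  # N_2024_c_46137_n
```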

REGEXP_REPLACE for exact regex pattern, not working

I'm trying to match an exact pattern to do some data cleanup for ISSNs using the code below:
select case when REGEXP_REPLACE('1234-5678 ÿþT(zlsd?k+j''fh{l}x[a]j).,~!##$%^&*()_+{}|:<>?`"\;''/-', '([0-9]{4}[\-]?[Xx0-9]{4})(.*)', '$1') not similar to '[0-9]{4}[\-]?[Xx0-9]{4}' then 'NOT' else 'YES' end
The pattern I want should match any 8-digit group with an optional dash in the middle and an optional X at the end.
The code above works for most cases, but if capture group 1 is the following example: 123456789 then it also returns positive because it matches the first 8 digits, and I don't want it to.
I tried surrounding capture group 1 with ^...$ but that doesn't work either.
So I would like to match exactly these examples and similar ones:
1234-5678
1234-567X
12345678
1234567X
BUT NOT THESE (and similar):
1234567899
1234567899x
What am I missing?

You may use
^([0-9]{4}-?[Xx0-9]{4})([^0-9].*)?$
See the regex demo
Details
^ - start of string
([0-9]{4}-?[Xx0-9]{4}) - Capturing group 1 ($1): four digits, an optional -, and then four x / X or digits
([^0-9].*)? - an optional Capturing group 2: any char other than a digit and then any 0+ chars as many as possible
$ - end of string.
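The behaviour of this pattern can be checked with a short Python sketch before wiring it into REGEXP_REPLACE. The helper name extract_issn is hypothetical, and Python's re module stands in for the database's regex engine here, which is an assumption:

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^([0-9]{4}-?[Xx0-9]{4})([^0-9].*)?$")

def extract_issn(value):
    """Hypothetical helper: return the ISSN-like prefix, or None if rejected."""
    m = PATTERN.match(value)
    return m.group(1) if m else None

for value in ("1234-5678", "1234-567X", "12345678", "1234567X",
              "1234567899", "1234567899x", "1234-5678 trailing junk"):
    print(value, "->", extract_issn(value))
```

Because the optional second group must start with a non-digit, a ninth digit directly after the first eight characters makes the whole match fail instead of being silently dropped.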

need an if else for regex

I have this regex to extract the name of a chatter in my IRC channel, along with date and message capture groups:
^\[(?:\d+)\-(?:\d+)(?:\-\d+) # (\d+):\d+(?::\d+).\d+ (?:GMT|BST)\] (([^:]+)|\[[^\]]): ((?!\!).*)
It works on this chat line and gives me 'bearwolf3', which is what I want as the 2nd capture group:
[04-04-2017 # 12:45:39.204 BST] bearwolf3: Break Fast
But if this line shows up, I want to extract the name 'bladey2k14' from a relayed IRC message from my bot, if it contains [ and ]:
[04-04-2017 # 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)
so the 2nd capture would be 'bladey2k14'
I've seen if/then/else examples, but I can't get them to work and they make my brain hurt!
Can anyone modify my regex at the top to do this?
You can see it here. I want match 2 to have group 2 as bladey2k14 and group 3 as the message 'tyt romani'.

You may try using the following expression:
^\[\d+-\d+-\d+ # (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]): ([\w\s]*)
See the regex demo
The branch reset group (?|...|...) in a PCRE regex lets the capturing groups in each alternative share the same numbers. So, in (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]), both ([^:]+) and ([^\]]*) capture their values into Group 2.
I also removed unnecessary non-capturing groups (like in (?:\d+) - the groups are neither quantified, nor do they contain any alternation operators).
The parts I changed are (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]) and [\w\s]*:
(?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]) matches 1 of 2 alternatives:
([^:]+)(?!:\s*\[[^\]]*]): 1 or more chars other than : captured into Group 2 (with ([^:]+)) not followed with :, 0+ whitespaces, [, 0+ chars other than ] and ] (with the negative lookahead (?!:\s*\[[^\]]*]))
| - or
[^:]+:\s*\[([^\]]*)] - 1+ chars other than :, followed with :, 0+ whitespaces, [, 0+ chars other than ] captured into (again) Group 2, and then ].
The [\w\s]* matches 0+ chars that are letters/digits/_/whitespace.
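Python's built-in re module has no branch reset group, so the sketch below swaps in a different technique: two ordinary capture groups, with the code picking whichever one took part in the match. It is not the PCRE expression above, and the helper name parse_line and the trimming of the message are assumptions. The relayed form is tried first, so no negative lookahead is needed:

```python
import re

# Group 1: hour, group 2: relayed name in [...], group 3: direct name,
# group 4: message. Only one of groups 2 and 3 is set for any given line.
LINE = re.compile(
    r"^\[\d+-\d+-\d+ # (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] "
    r"(?:[^:]+:\s*\[([^\]]*)\]|([^:]+)): ([\w\s]*)"
)

def parse_line(line):
    """Hypothetical helper: return (hour, name, message) or None."""
    m = LINE.match(line)
    if m is None:
        return None
    hour, relayed, direct, message = m.groups()
    # Emulate the branch reset: take the name from whichever group matched.
    name = relayed if relayed is not None else direct
    return hour, name, message.strip()

print(parse_line("[04-04-2017 # 12:45:39.204 BST] bearwolf3: Break Fast"))
print(parse_line("[04-04-2017 # 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)"))
```

As in the answer, the message group is [\w\s]*, so it stops at the first character that is not a letter, digit, underscore or whitespace (the ":)" in the second line is left out).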
