Regex: how to optionally capture a group

I'm trying to make a substring optional.
Here is the source :
Movie TOTO S09 E22 2022 Copyright
I want to optionally capture the substring : S09 E22
What I have tried so far :
/(Movie)(.*)(S\d\d\s*E\d\d)?/gmi
The problem is that it ends up matching S09 E22 2022 Copyright instead of just S09 E22 :
Match 1 : 0-33 Movie TOTO S09 E22 2022 Copyright
Group 1 : 0-5 Movie
Group 2: 5-33 TOTO S09 E22 2022 Copyright
Is there any way to fix this issue ?
Regards

You get that match because .* is greedy and first matches until the end of the string.
Because (S\d\d\s*E\d\d)? is optional, it can match nothing at that position, so the overall match already succeeds and the engine never needs to backtrack.
If you don't want partial matches for S09 or E22, the 4 digits for the year are not mandatory, and movie titles can be longer than 1 word, with PCRE you could use:
\b(Movie)\b\h+((?:(?!\h+[SE]\d+\b).)*)(?:\h(S\d+\h+E\d+))?
\b(Movie)\b Capture the word Movie
( Capture group
(?: Non capture group to repeat as a whole part
(?!\h+[SE]\d+\b). Match any character if either the S01 or E22 part is not directly to the right (where [SE] matches either a S or E char, and \h matches a horizontal whitespace char)
)* Close the non capture group and optionally repeat it
) Close capture group
(?:\h(S\d+\h+E\d+))? Optionally capture the S01 E22 part (where \d+ matches 1 or more digits)
Regex demo
Another option with a capture group for the S01 E22 part, or else match the rest of the line
\b(Movie)\h+([^S\n]*(?:S(?!\d+\h+E\d+\b)[^S\n]*)*+)(S\d+\h+E\d+)?
Regex demo
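Note that \h and possessive quantifiers such as *+ are PCRE features that JavaScript's RegExp does not support. As a rough sketch only, the first pattern could be adapted for JavaScript by replacing \h with [ \t], which assumes that only spaces and tabs separate the words. The names titleRe and parseTitle are hypothetical and are not part of the answer above.

```javascript
// JavaScript adaptation of the first pattern: \h is replaced by [ \t]
// (an assumption about the whitespace that occurs in the input).
const titleRe = /\b(Movie)\b[ \t]+((?:(?![ \t]+[SE]\d+\b).)*)(?:[ \t](S\d+[ \t]+E\d+))?/i;

// Hypothetical helper: returns the title and the optional episode part,
// or null when the line does not start a match at all.
function parseTitle(line) {
  const m = titleRe.exec(line);
  if (!m) return null;
  return { title: m[2], episode: m[3] ?? null };
}

console.log(parseTitle("Movie TOTO S09 E22 2022 Copyright"));
// { title: 'TOTO', episode: 'S09 E22' }
console.log(parseTitle("Movie TOTO 2022 Copyright"));
// { title: 'TOTO 2022 Copyright', episode: null }
```

When the S09 E22 part is missing, the third group is simply unset, which is what makes the capture optional.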

With your shown samples and attempts please try following regex.
^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))
Here is the Online Demo for used regex.
Explanation: Adding detailed explanation for used regex above.
^Movie\s+\S+\s+ ##Matching string Movie from starting of value followed by spaces non-spaces and spaces.
(S\d{2}\s+E\d{2} ##Creating one and only capturing group where matching:
##S followed by 2 digits followed by spaces followed by E and 2 digits.
(?=\s+\d{4}) ##Making sure by positive lookahead that previous regex is followed by spaces and 4 digits.
) ##Closing capturing group here.

One idea is to make the dot lazy (.*?) and force it to match up to the end ($) if the other part doesn't exist.
Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)
See this demo at regex101 (further I added some \s* spaces around captures)
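As a small illustration, this pattern runs unchanged in JavaScript. The sketch below uses the i flag from the question and a hypothetical variable name lazyRe; the second sample line, without an episode part, is made up for the comparison.

```javascript
// Lazy title group; the second group is either the episode tag or the
// empty match at the end of the string.
const lazyRe = /Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)/i;

const samples = [
  "Movie TOTO S09 E22 2022 Copyright",
  "Movie TOTO 2022 Copyright", // made-up line without an episode part
];

for (const line of samples) {
  const [, title, episode] = lazyRe.exec(line);
  console.log({ title, episode });
}
// { title: 'TOTO', episode: 'S09 E22' }
// { title: 'TOTO 2022 Copyright', episode: '' }
```

In the second case the episode group is an empty string rather than unset, because the $ alternative is inside the capture group.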

There are several errors in your regex:
Blank space after Movie is not considered.
(.*) matches everything after Movie.
Try online at https://regex101.com/
(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)

Related

moving characters by using regex

I'm trying to move matched characters to the end of sentence.
from
300p apple in house
orange 200p in school
to
apple in house 300p
orange in school 200p
So I matched (.+)([\d]+p)(.+)$ and substituted with \1 \3 \2.
But the result is like
30 apple in house 0p
orange 20 in school 0p
I also checked greedy concept, but I don't know what is problem. How can I fix this?

You can use
^(.*?)(\d+p) *(.+)
Replace with \1\3 \2.
See the regex demo. Details:
^ - start of string (or line if you use a multiline mode)
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
(\d+p) - Group 2: one or more digits, and then a p char
* - zero or more spaces
(.+) - Group 3: any one or more chars other than line break chars as many as possible (since it is a greedy subpattern, no $ anchor is required, the match will go up to the end of string (or line if you use a multiline mode)).
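To see why the original attempt loses digits, it may help to compare both patterns side by side. This is a minimal JavaScript sketch with hypothetical variable names; JavaScript writes the backreferences in the replacement as $1, $2, $3 instead of \1, \2, \3.

```javascript
const lines = ["300p apple in house", "orange 200p in school"];

// Original attempt: the greedy (.+) takes as much as it can and leaves
// only a single digit for [\d]+.
const greedyRe = /(.+)([\d]+p)(.+)$/;
console.log(lines.map((s) => s.match(greedyRe)[1]));
// [ '30', 'orange 20' ]

// Suggested fix: a lazy first group lets (\d+p) take all the digits.
const moveRe = /^(.*?)(\d+p) *(.+)/;
const moved = lines.map((s) => s.replace(moveRe, "$1$3 $2"));
console.log(moved);
// [ 'apple in house 300p', 'orange in school 200p' ]
```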

With your shown samples only, please try following regex.
^(\D+)?(\d+p)\s*(.+)$
Online demo for above regex
Explanation:
^(\D+)? ##Matching from starting and creating 1st capturing group which has all non-digits in it and keeping it as optional.
(\d+p) ##Creating 2nd capturing group which matches 1 or more digits followed by p here.
\s* ##Matching 0 or more occurrences of spaces here.
(.+)$ ##Creating 3rd capturing group here which has everything in it.

Select comma by comma keywords with REGEX

Hello folks I have a line like that in my file.
> **Keywords** : test, test2, test3
And I need to select keyword by keyword and all array with regex.
NOTE: That test elements can be more than 3
Group 1 : test, test2, test3
Group 2 : test
Group 3 : test2
Group 4 : test3
I try to write that regex but it's not repeated for all commas :(
/^(> \*\*Keywords\*\* : ),?([\w]+)/gmi
This is the test env : https://regex101.com/r/UHLrX1/2
How can I handle that regex?

In Javascript, you may use this regex with a lookbehind assertion:
(?<=(^> \*\*Keywords\*\* : )(?:\w+, )*)(\w+)
RegEx Demo
RegEx Details:
(?<=: Start positive lookbehind
(^> \*\*Keywords\*\* : ): Match > \*\*Keywords\*\* : and capture it in group #1
(?:\w+, )*: Followed by 0 or more comma separated words
): End positive lookbehind
(\w+): Match 1+ character word in capture group #2
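A short sketch of how this could be used in JavaScript, assuming an engine with lookbehind support (ES2018 or later). The variable names are hypothetical; String.prototype.matchAll requires the g flag.

```javascript
const keywordLine = "> **Keywords** : test, test2, test3";

// Each match is one keyword; the lookbehind checks that it is preceded by
// the "> **Keywords** : " prefix and zero or more "word, " items.
const keywordRe = /(?<=(^> \*\*Keywords\*\* : )(?:\w+, )*)(\w+)/gm;

const keywords = [...keywordLine.matchAll(keywordRe)].map((m) => m[2]);
console.log(keywords);
// [ 'test', 'test2', 'test3' ]
```

Because every keyword is a separate match, this works for any number of comma separated items.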

EDIT: In case you want to capture more than 3 elements as per shown samples then one could try following regex:
^\*\*Keywords\*\*.*?:\s+((?:(?:[^,]*),\s+){1,}(?:.*))$
Online demo for above regex
With your shown samples, please try following regex.
^\*\*Keywords\*\*.*?:\s+(([^,]*),\s+([^,]*),\s+(.*))$
Online demo for above regex
Explanation: Adding detailed explanation for above.
^\*\*Keywords\*\*.*?:\s+ ##From starting of value matching till colon followed by spaces(1 or more occurrences)
( ##Starting 1st capturing group here.
([^,]*) ##In 2nd capturing group matching everything till comma comes.
,\s+ ##Matching comma followed by spaces 1 or more occurrences.
([^,]*) ##In 3rd capturing group matching everything till comma comes.
,\s+ ##Matching comma followed by spaces 1 or more occurrences.
(.*) ##In 4th capturing group matching everything till the end of the value.
)$ ##Closing 1st capturing group till end of value.

Regex - add a zero after second period

I have the following example of numbers, and I need to add a zero after the second period (.).
1.01.1
1.01.2
1.01.3
1.02.1
I would like them to be:
1.01.01
1.01.02
1.01.03
1.02.01
I have the following so far:
Search:
^([^.])(?:[^.]*\.){2}([^.].*)
Substitution:
0\1
but this returns:
01 only.
I need the 1.01. to be captured in a group as well, but now I'm getting confuddled.
Does anyone know what I am missing?
Thanks!!

You may try this regex replacement with 2 capture groups:
Search:
^(\d+\.\d+)\.([1-9])
Replacement:
\1.0\2
RegEx Demo
RegEx Details:
^: Start
(\d+\.\d+): Match 1+ digits + dot followed by 1+ digits in capture group #1
\.: Match a dot
([1-9]): Match digits 1-9 in capture group #2 (this is to avoid putting 0 before already existing 0)
Replacement: \1.0\2 inserts 0 just before capture group #2
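As a quick check, the same search and replacement could be tried in JavaScript, where the replacement is written as $1.0$2. The variable names are hypothetical, and the last input is an extra sample that already has a leading zero.

```javascript
// Pad the third number with a zero unless it already starts with 0.
const padRe = /^(\d+\.\d+)\.([1-9])/;

const inputs = ["1.01.1", "1.02.1", "1.01.01"];
const padded = inputs.map((s) => s.replace(padRe, "$1.0$2"));
console.log(padded);
// [ '1.01.01', '1.02.01', '1.01.01' ]
```

Note that a value whose last part has two digits and starts with 1-9, such as 1.01.12, would also get a zero inserted by this pattern.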

You could try:
^([^.]*\.){2}\K
Replace with 0. See an online demo
^ - Start line anchor.
([^.]*\.){2} - Negated character 0+ times (greedy) followed by a literal dot, matched twice.
\K - Reset starting point of reported match.
EDIT:
Or/And if the \K meta escape isn't supported, then see if the following does work:
^((?:[^.]*\.){2})
Replace with ${1}0. See the online demo
^ - Start line anchor.
( - Open 1st capture group;
(?: - Open non-capture group;
[^.]*\. - Negated character 0+ times (greedy) followed by a literal dot.
){2} - Close non-capture group and match twice.
) - Close capture group.
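JavaScript is one engine where \K is not available, so the capture group variant applies there. The sketch below uses a replacer function instead of a replacement string, which avoids any ambiguity between group 1 followed by a literal 0 and a group numbered 10. The names twoDotsRe and insertZero are hypothetical.

```javascript
// Capture everything up to and including the second dot.
const twoDotsRe = /^((?:[^.]*\.){2})/;

// The replacer receives the whole match and then group 1; it returns
// group 1 followed by the zero to insert.
const insertZero = (s) => s.replace(twoDotsRe, (_, head) => head + "0");

console.log(["1.01.1", "1.01.2", "1.02.1"].map(insertZero));
// [ '1.01.01', '1.01.02', '1.02.01' ]
```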

Using your pattern, you can use 2 capture groups and put a zero between the two groups in the replacement, for example \g<1>0\g<2>, ${1}0${2} or $10$2, depending on the language.
^((?:[^.]*\.){2})([^.])
^ Start of string
((?:[^.]*\.){2}) Capture group 1, match 2 times any char except a dot, then match the dot
([^.]) Capture group 2, match any char except a dot
Regex demo
A more specific pattern could be matching the digits
^(\d+\.\d+\.)(\d)
^ Start of string
(\d+\.\d+\.) Capture group 1, match 2 times 1+ digits and a dot
(\d) Capture group 2, match a digit
Regex demo
For example in JavaScript
const regex = /^(\d+\.\d+\.)(\d)/;
[
"1.01.1",
"1.01.2",
"1.01.3",
"1.02.1",
].forEach(s => console.log(s.replace(regex, "$10$2")));

Obviously, there will be tons of solutions for this, but if this pattern holds (i.e. always the trailing group that is a single digit)... \.(\d)$ => \.0\1 would suffice - to merely insert a 0, you don't need to match the whole thing, only just enough context to uniquely identify the places targeted. In this case, finding all lines ending in a . followed by a single digit is enough.
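A minimal JavaScript sketch of this shorter approach, assuming each value is processed as its own string (for multi-line text the m flag would be needed so that $ matches at each line end). The names lastDigitRe and padLast are hypothetical, and 1.01.12 is an extra sample.

```javascript
// Only the trailing ".<single digit>" is matched; the rest is untouched.
const lastDigitRe = /\.(\d)$/;
const padLast = (s) => s.replace(lastDigitRe, ".0$1");

console.log(["1.01.1", "1.02.1", "1.01.12"].map(padLast));
// [ '1.01.01', '1.02.01', '1.01.12' ]
```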

Regex Capture one word OR two words in quotes

I'm trying to implement gmail style filters in my search and I'm stuck at this regex problem. I need to capture ONE word OR two words in quotes (but without the quotation marks themselves) This is PCRE (PHP)
ie.
name:mark
desired result: 1st capture group should be mark
name:"mark"
desired result: 1st capture group should be mark
name:"mark wilson"
desired result: 1st capture group should be mark, second capture group should be wilson
name:mark wilson
desired result: 1st capture group should be mark, wilson is ignored
The closest I've gotten is name:(\w+|\"\w+(?>\"|\s([a-z.'-]+\"))) it captures example 1 perfectly, but example 2 still includes the quotes, and example 3 ends up as:
group 1: "mark wilson" (quotes included)
group 2: wilson" (quote included)
I've tried lookahead and lookbehinds but I'm not getting anywhere with those either
any help would be very appreciated. tia

1 option could be using an if/else clause which will give mark in group 2 and wilson in group 3. The first group will capture the " which can be used for the if else checking for the existence for group 1.
\w+:(")?(\w+(?:\h+(\w+))?)(?(1)")
Regex demo
If the space after the first name should not be there, you could also group that and have the values in group 3 and 4
\w+:(")?((\w+)(?:\h+(\w+))?)(?(1)")
Regex demo
You could also get either the single value between quotes or not, or capture the first or second name in a capturing group using a branch reset group
\w+:(?|"(\w+)(?:\h+(\w+))?"|(\w+))
Explanation
\w+: Match 1+ word chars
(?| Branch reset group
"(\w+) Capture group 1, match 1+ word chars
(?: Non capture group
\h+ match 1+ horizontal whitespace chars
(\w+) Capture group 2, match 1+ word chars
)? Close group and make optional
" Match "
| Or
(\w+) Capture group 1, match 1+ word chars
) Close branch reset group
Regex demo
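Conditionals, branch reset groups and \h are PCRE features. For comparison only, here is a rough sketch of how a similar result might be obtained in an engine without them, such as JavaScript: the two alternatives get separate groups and the first name is taken from whichever one participated. The names nameRe and parseName are hypothetical, and [ \t] stands in for \h.

```javascript
// Alternative 1: quoted, one or two words (groups 1 and 2).
// Alternative 2: unquoted, one word (group 3).
const nameRe = /\w+:(?:"(\w+)(?:[ \t]+(\w+))?"|(\w+))/;

// Hypothetical helper that merges the groups of both alternatives.
function parseName(input) {
  const m = nameRe.exec(input);
  if (!m) return null;
  return { first: m[1] ?? m[3], second: m[2] ?? null };
}

console.log(parseName('name:mark'));          // { first: 'mark', second: null }
console.log(parseName('name:"mark"'));        // { first: 'mark', second: null }
console.log(parseName('name:"mark wilson"')); // { first: 'mark', second: 'wilson' }
console.log(parseName('name:mark wilson'));   // { first: 'mark', second: null }
```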

The main point is that you cannot do that for arbitrary amount of groups, you must specify them all in the pattern at design time.
You may use a pattern like this with a branch reset group:
\w+:(?|(\w+)|"(\w+)(?:\h+(\w+))?(?:\h+(\w+))?")
See the regex demo. Add more (?:\h+(\w+))? patterns at the end to support up to N amount of optional words.
Details
\w+: - 1+ word chars and then a :
(?|(\w+)|"(\w+)(?:\h+(\w+))?(?:\h+(\w+))?") - a branch reset group where groups share the same IDs:
(\w+) - Group 1: one or more word chars
| - or
"(\w+)(?:\h+(\w+))?(?:\h+(\w+))?" -
" - a " char
(\w+) - Group 1: one or more word chars
(?:\h+(\w+))? - an optional occurrence of a sequences:
\h+ - 1 or more horizontal whitespaces
(\w+) - Group 2: one or more word chars
(?:\h+(\w+))?" - ibid, but with Group 3, etc.

How can I match everything between 2 commas?

I want to match basically any text that has a comma separated list of weekdays.
(?i)(every (mon|tue|wed|thu|fri|sat|sun)[A-Za-z]{3,5}, .*+,
(mon|tue|wed|thu|fri|sat|sun)[A-Za-z]{3,5})
Above is what I have and I want to make it match the following strings. I don't need help in the case that only 2 weekdays are supplied.
Every mon, tue, wednesday
Every wed, Saturday, Friday, sun.

Try pattern: (?<=,|^)[^,\n]+
Explanation
(?<=,|^) - positive lookbehind: assert what precedes is comma , or beginning of the string ^
[^,\n]+ - match one or more characters other than comma , or newline \n
Demo
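As an illustration, this is what the pattern returns in JavaScript (assuming an engine with lookbehind support). The matches keep the space that follows each comma, so the sketch trims them; the variable names are hypothetical.

```javascript
// Every run of non-comma characters that starts the string or follows a comma.
const itemRe = /(?<=,|^)[^,\n]+/g;

const rawItems = "Every mon, tue, wednesday".match(itemRe);
console.log(rawItems);
// [ 'Every mon', ' tue', ' wednesday' ]

const items = rawItems.map((s) => s.trim());
console.log(items);
// [ 'Every mon', 'tue', 'wednesday' ]
```

Note that this splits any comma separated text and does not check that the items are weekdays.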

You might list the abbreviations and optionally match the full name by listing them using an alternation followed by a comma and a space.
Add that to a group and repeat that 0+ times. After that add the group without a comma to make sure you match at least a single day.
(?i)\bevery (?:(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)?), )*(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)?)\b
Explanation
(?i)\bevery Case insensitive modifier, then a word boundary and the word every followed by a space
(?: No capturing group
(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)?), Match any of the listed followed by a comma and space
)* Close non capturing group and repeat 0+ times
(?: Non capturing group
mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)? Match any of the listed
)\b Close non capturing group and add a word boundary to prevent being part of a larger word
Regex demo
To require more than one day, you could change the * quantifier of the first non-capturing group to, for example, + (at least two days) or {2,} (at least three days).
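Since the day alternation appears twice, the pattern can also be assembled from a single string. Below is a minimal JavaScript sketch with hypothetical names DAY and everyRe; it uses the i flag rather than the inline (?i) modifier, which older JavaScript engines reject.

```javascript
// One weekday, abbreviated or written out in full.
const DAY =
  "(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)?)";

// "every", then zero or more "day, " items, then a final day and a word boundary.
const everyRe = new RegExp(`\\bevery (?:${DAY}, )*${DAY}\\b`, "i");

console.log("Every mon, tue, wednesday".match(everyRe)[0]);
// Every mon, tue, wednesday
console.log("Every wed, Saturday, Friday, sun.".match(everyRe)[0]);
// Every wed, Saturday, Friday, sun
console.log(everyRe.test("Every month"));
// false (the trailing \b rejects "mon" inside "month")
```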

We Keep Coding
