Regex Capture one word OR two words in quotes


I'm trying to implement Gmail-style filters in my search and I'm stuck on this regex problem. I need to capture ONE word, OR two words in quotes (without the quotation marks themselves). This is PCRE (PHP).
For example:
name:mark
desired result: 1st capture group should be mark
name:"mark"
desired result: 1st capture group should be mark
name:"mark wilson"
desired result: 1st capture group should be mark, second capture group should be wilson
name:mark wilson
desired result: 1st capture group should be mark, wilson is ignored
The closest I've gotten is name:(\w+|\"\w+(?>\"|\s([a-z.'-]+\"))) it captures example 1 perfectly, but example 2 still includes the quotes, and example 3 ends up as:
group 1: "mark wilson" (quotes included)
group 2: wilson" (quote included)
I've tried lookaheads and lookbehinds, but I'm not getting anywhere with those either.
Any help would be much appreciated. TIA

One option is a conditional (if/else) construct. Group 1 optionally captures the opening ", and the conditional (?(1)") requires a closing " only when group 1 took part in the match. Group 2 then holds the whole value (mark or mark wilson) and group 3 the second word (wilson).
\w+:(")?(\w+(?:\h+(\w+))?)(?(1)")
If the first word should be captured on its own, without the space and the second word, give it its own group as well; the two words are then in groups 3 and 4:
\w+:(")?((\w+)(?:\h+(\w+))?)(?(1)")
You could also use a branch reset group, so that the single word (quoted or not) is always in group 1 and an optional second quoted word is in group 2:
\w+:(?|"(\w+)(?:\h+(\w+))?"|(\w+))
Explanation
\w+: Match 1+ word chars and a :
(?| Branch reset group
"(\w+) Match ", then capture group 1, match 1+ word chars
(?: Non capture group
\h+ match 1+ horizontal whitespace chars
(\w+) Capture group 2, match 1+ word chars
)? Close group and make optional
" Match "
| Or
(\w+) Capture group 1, match 1+ word chars
) Close branch reset group
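Conditionals, branch reset groups and \h are PCRE features. As a rough sketch for an engine without them (JavaScript is assumed here only to keep the example runnable), plain alternation gives the same words if the groups are merged afterwards. The name parseFilter is hypothetical, and [ \t] stands in for \h.

```javascript
// Assumption: JavaScript has no (?|...) and no \h, so the two alternatives
// get separate group numbers and [ \t] approximates \h.
const filterRe = /\w+:(?:"(\w+)(?:[ \t]+(\w+))?"|(\w+))/;

// Hypothetical helper: returns the captured words, or null when nothing matches.
function parseFilter(input) {
  const m = filterRe.exec(input);
  if (m === null) return null;
  // The quoted branch fills groups 1 and 2; the unquoted branch fills group 3.
  return [m[1] ?? m[3], m[2]].filter((w) => w !== undefined);
}

console.log(parseFilter('name:mark'));
console.log(parseFilter('name:"mark"'));
console.log(parseFilter('name:"mark wilson"'));
console.log(parseFilter('name:mark wilson'));
```

For the four inputs from the question this yields mark, mark, mark plus wilson, and mark.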

The main point is that you cannot do this for an arbitrary number of groups; you must specify them all in the pattern at design time.
You may use a pattern like this with a branch reset group:
\w+:(?|(\w+)|"(\w+)(?:\h+(\w+))?(?:\h+(\w+))?")
See the regex demo. Add more (?:\h+(\w+))? patterns at the end to support up to N optional words.
Details
\w+: - 1+ word chars and then a :
(?|(\w+)|"(\w+)(?:\h+(\w+))?(?:\h+(\w+))?") - a branch reset group where groups share the same IDs:
(\w+) - Group 1: one or more word chars
| - or
"(\w+)(?:\h+(\w+))?(?:\h+(\w+))?" -
" - a " char
(\w+) - Group 1: one or more word chars
(?:\h+(\w+))? - an optional sequence of:
\h+ - 1 or more horizontal whitespaces
(\w+) - Group 2: one or more word chars
(?:\h+(\w+))? - the same again, but with Group 3 (and so on for any further copies)
" - a closing " char
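Since every optional word has to be written into the pattern, the pattern can be generated for a chosen maximum. Below is a minimal sketch in JavaScript (assumed only for runnability; it has no branch reset group, so non-participating groups are filtered out instead). The helper names buildFilterRegex and captureWords are hypothetical.

```javascript
// Hypothetical helper: builds a pattern accepting one bare word or up to
// maxWords quoted words. Assumption: JavaScript, so no (?|...) and no \h;
// [ \t] approximates \h.
function buildFilterRegex(maxWords) {
  const extra = '(?:[ \\t]+(\\w+))?'.repeat(maxWords - 1);
  return new RegExp('\\w+:(?:(\\w+)|"(\\w+)' + extra + '")');
}

function captureWords(input, maxWords) {
  const m = buildFilterRegex(maxWords).exec(input);
  // Drop the full match (index 0) and any group that did not participate.
  return m === null ? null : m.slice(1).filter((w) => w !== undefined);
}

console.log(captureWords('name:"mark j wilson"', 3));
console.log(captureWords('name:mark', 3));
console.log(captureWords('name:"a b c d"', 3)); // more words than the pattern allows
```

A quoted value with more words than maxWords does not match at all, which reflects the design-time limit described above.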

Related

Problem with optional non-capturing group in Regex

I am trying to do regex parsing and matching and optionally discard the rest of the string.
My strings are of type:
[GROUP 1][delimiter 1][GROUP 2][delimiter 2][GROUP 3][delimiter 3 - optional][REST OF THE STRING - optional]
For example:
07. Neospace - Into The Night (Chris Van Buren)
13. Atomic Space Orchestra - Starfleet
I am trying to capture GROUP 1, GROUP 2 and GROUP 3 while ignoring REST OF THE STRING
The following regex works well if [delimiter 3] is present:
(\d+)\. (.*) - (.*)(?: \()
I am getting "07", "Neospace" and "Into The Night".
But for the second string, there is no match, because my last non-capturing group is mandatory.
When I'm trying to make last group optional like this:
(\d+)\. (.*) - (.*)(?: \()?
the non-capturing group stops working and I am getting "Into The Night (Chris Van Buren)" for GROUP 3, which is NOT what I want.

If the 3rd group has ( as a delimiter, you can use a negated character class to exclude matching a ( char.
Note that using * as a quantifier can also match an empty string between the delimiters.
If the match should be at the start of the string, you can prepend the pattern with ^
(\d+)\. (.*?) - ([^(\n]*)
Explanation
(\d+) Capture group 1, match 1+ digits
\. Match .
(.*?) Capture group 2, match 0+ times any character, as few as possible
- Match the hyphen literally, with a literal space on each side
([^(\n]*) Capture group 3, match 0+ times any character except ( or a newline
See a regex demo.
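A small sketch of that pattern applied to the two sample strings, in JavaScript. The field names number, artist and title are assumptions made for illustration. Group 3 keeps the space that precedes the ( delimiter, so it is trimmed here.

```javascript
const trackRe = /^(\d+)\. (.*?) - ([^(\n]*)/;

// Hypothetical helper: returns the three groups, or null when the line does not match.
function parseTrack(line) {
  const m = trackRe.exec(line);
  if (m === null) return null;
  // [^(\n]* stops at "(" but still includes the space before it, hence trim().
  return { number: m[1], artist: m[2], title: m[3].trim() };
}

console.log(parseTrack('07. Neospace - Into The Night (Chris Van Buren)'));
console.log(parseTrack('13. Atomic Space Orchestra - Starfleet'));
```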

Regex lookaround to find anything up to an already searched group

I'm trying to analyze search queries of a particular pattern.
The pattern is:
How many/much _____ is/are _____.
Given this pattern, the blanks are unknown to me but I want to extract any statement that follows this pattern above.
My challenge is finding a way to do a lookaround on is/are up to but not including many/much and anything after but not including is/are.
Here's my regex so far:
(([hH]ow many?)|([hH]ow much?))|(?<=is)|(are)|(i|s|n|a|o|f){1,2}|((\")|(\“)|(\/)|(\'))

If you use this regex with the i flag to match case insensitive
^how\s+(?:much|many)\s+(.*?)\s(?:is|are)\s+(.*?)[.?]?$
Then it'll match these strings
How much bla is blabla.
How many bla are blablabla?
And the bla's will be in capture group 1 and 2.
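As a quick check, here is that pattern in JavaScript with the i flag; parseQuery is a hypothetical helper name.

```javascript
const howRe = /^how\s+(?:much|many)\s+(.*?)\s(?:is|are)\s+(.*?)[.?]?$/i;

// Hypothetical helper: returns the two blanks, or null if the query does not fit the pattern.
function parseQuery(q) {
  const m = howRe.exec(q);
  return m === null ? null : [m[1], m[2]];
}

console.log(parseQuery('How much bla is blabla.'));
console.log(parseQuery('How many bla are blablabla?'));
console.log(parseQuery('What is this'));
```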

Try this:
/(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s)(.+)(?=\sis|\sare)|(?<=is\s|are\s)(.+)/g
Review it at regex101
Lookarounds are placed behind and/or ahead of your capture group:
1st Capture Group
(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s) /* "(H|h)ow"\space"many"\space OR
"(H|h)ow"\space"much"\space
must be before capture group */
(.+) /* capture group one or more of anything */
(?=\sis|\sare) /* \space"is" OR \space"are" must be after capture group */
| // OR
2nd Capture Group
(?<=is\s|are\s) /* "is"\space OR "are"\space must be before capture group */
(.+) /* capture group one or more of anything */
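Because each blank is a separate match with the g flag, the matches have to be collected. This is a sketch in JavaScript, assuming an engine with lookbehind support (ES2018 or later); extractBlanks is a hypothetical helper and the sample sentence is made up for illustration.

```javascript
const blanksRe = /(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s)(.+)(?=\sis|\sare)|(?<=is\s|are\s)(.+)/g;

// Hypothetical helper: each match fills either group 1 (first blank)
// or group 2 (second blank), so take whichever one participated.
function extractBlanks(q) {
  return Array.from(q.matchAll(blanksRe), (m) => m[1] ?? m[2]);
}

console.log(extractBlanks('How many apples are in the basket'));
```

Note that (.+) is greedy, so a query containing is or are more than once may split at a later occurrence than intended.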

If both parts for the _____ are mandatory you could use 2 capture groups:
\b[Hh]ow\s+(?:much|many)\s+(\S.*?)\s+(?:is|are)\s+([^.]+)\.
The pattern matches:
\b A word boundary to prevent a partial word match
[Hh]ow\s+ Match How or how and 1+ whitespace chars
(?:much|many)\s+ Match either much or many and 1+ whitespace chars
(\S.*?)\s+ Capture in group 1 at least a single non whitespace char, then as few chars as possible, and match 1+ whitespace chars
(?:is|are)\s+ Match either is or are and 1+ whitespace chars
([^.]+) Capture in group 2 1+ chars other than a dot
\. Match a dot
See a regex demo.

Regex - add a zero after second period

I have the following example of numbers, and I need to add a zero after the second period (.).
1.01.1
1.01.2
1.01.3
1.02.1
I would like them to be:
1.01.01
1.01.02
1.01.03
1.02.01
I have the following so far:
Search:
^([^.])(?:[^.]*\.){2}([^.].*)
Substitution:
0\1
but this returns:
01 only.
I need the 1.01. to be captured in a group as well, but now I'm getting confuddled.
Does anyone know what I am missing?
Thanks!!

You may try this regex replacement with 2 capture groups:
Search:
^(\d+\.\d+)\.([1-9])
Replacement:
\1.0\2
RegEx Details:
^: Start
(\d+\.\d+): Match 1+ digits + dot followed by 1+ digits in capture group #1
\.: Match a dot
([1-9]): Match digits 1-9 in capture group #2 (this is to avoid putting 0 before already existing 0)
Replacement: \1.0\2 inserts 0 just before capture group #2

You could try:
^([^.]*\.){2}\K
Replace with 0. See an online demo
^ - Start line anchor.
([^.]*\.){2} - Any character other than a dot, 0+ times (greedy), followed by a literal dot; the whole group matched twice.
\K - Reset starting point of reported match.
EDIT:
Or, if the \K meta escape isn't supported, then see if the following works:
^((?:[^.]*\.){2})
Replace with ${1}0. See the online demo
^ - Start line anchor.
( - Open 1st capture group;
(?: - Open non-capture group;
[^.]*\. - Any character other than a dot, 0+ times (greedy), followed by a literal dot.
){2} - Close non-capture group and match twice.
) - Close capture group.
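JavaScript is one engine without \K. The sketch below shows the capture-group fallback and, as an assumption about what is available (ES2018 or later), a lookbehind that plays a similar role to \K by leaving the match itself empty.

```javascript
const versions = ['1.01.1', '1.01.2', '1.01.3', '1.02.1'];

// Capture-group variant: a replacer function avoids any ambiguity in how
// "$1" followed by a literal 0 would be read.
const withGroup = versions.map((s) =>
  s.replace(/^((?:[^.]*\.){2})/, (_, head) => head + '0')
);

// Lookbehind variant: the match is the empty string right after the second
// dot, so the replacement is just the inserted character.
const withLookbehind = versions.map((s) =>
  s.replace(/(?<=^(?:[^.]*\.){2})/, '0')
);

console.log(withGroup);
console.log(withLookbehind);
```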

Using your pattern, you can use 2 capture groups and insert a zero between them in the replacement, for example \g<1>0\g<2> or ${1}0${2} or $10$2 depending on the language.
^((?:[^.]*\.){2})([^.])
^ Start of string
((?:[^.]*\.){2}) Capture group 1, match 2 times: 0+ chars other than a dot, then a dot
([^.]) Capture group 2, match a single char other than a dot
A more specific pattern could be matching the digits
^(\d+\.\d+\.)(\d)
^ Start of string
(\d+\.\d+\.) Capture group 1, match 2 times 1+ digits and a dot
(\d) Capture group 2, match a digit
For example in JavaScript
const regex = /^(\d+\.\d+\.)(\d)/;
[
"1.01.1",
"1.01.2",
"1.01.3",
"1.02.1",
].forEach(s => console.log(s.replace(regex, "$10$2")));

Obviously, there will be tons of solutions for this, but if this pattern holds (i.e. always the trailing group that is a single digit)... \.(\d)$ => \.0\1 would suffice - to merely insert a 0, you don't need to match the whole thing, only just enough context to uniquely identify the places targeted. In this case, finding all lines ending in a . followed by a single digit is enough.
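A minimal sketch of that end-anchored approach in JavaScript; padLast is a hypothetical name. In a JavaScript replacement string the dot needs no escaping, so the replacement is written as .0$1 here.

```javascript
// Only a trailing ".<single digit>" is matched; the rest of the string is left alone.
const padLast = (s) => s.replace(/\.(\d)$/, '.0$1');

console.log(padLast('1.01.1'));
console.log(padLast('1.02.1'));
console.log(padLast('1.01.12')); // two trailing digits: no match, unchanged
```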

How to regex string that can end with a number and group each part

I have the following test strings:
Battery Bank 1
Dummy 32 Segment 12
System
Modbus 192.168.0.1 Group
I need a regex that can match and group these as follows:
Group 1: Battery Bank
Group 2: 1
Group 1: Dummy 32 Segment
Group 2: 12
Group 1: System
Group 2: null
Group 1: Modbus 192.168.0.1 Group
Group 2: null
Basically, capture everything (including numbers) into group 1 unless the string ends with a whitespace followed by 1 or more digits. If it does, capture this number into group 2.
This regex is not doing what I need as everything is captured into the first group.
([\w ]+)( \d+)?
https://regex101.com/r/GEtb5G/1/

Basically, capture everything (including numbers) into group 1 unless the string ends with a whitespace followed by 1 or more digits. If it does, capture this number into group 2.
You may use this regex, which allows an empty match in the 2nd capture group:
^(.+?) *(\d+|)$
RegEx Details:
^: Start
(.+?): Match 1+ of any character (lazy) in capture group #1
 *: Match 0 or more spaces (a literal space followed by *)
(\d+|): Match 1+ digits or nothing in 2nd capture group
$: End
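Applied to the four test strings, this could look as follows in JavaScript. The helper name splitTrailingNumber and the field names are assumptions; the empty string in group 2 is mapped to null to match the output the question asks for.

```javascript
const tailNumberRe = /^(.+?) *(\d+|)$/;

// Hypothetical helper: splits off a trailing number if there is one.
function splitTrailingNumber(s) {
  const m = tailNumberRe.exec(s);
  if (m === null) return null;
  // Group 2 is the empty string when there is no trailing number; map it to null.
  return { name: m[1], number: m[2] === '' ? null : m[2] };
}

['Battery Bank 1', 'Dummy 32 Segment 12', 'System', 'Modbus 192.168.0.1 Group']
  .forEach((s) => console.log(splitTrailingNumber(s)));
```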

You can use
^\s*(.*[^\d\s])(?:\s*(\d+))?\s*$
See the regex demo (note \s are replaced with spaces since the test string in the demo is a single multiline string).
If the regex is to be used with a multiline flag to match lines in a longer multiline text, you can use
^[^\S\r\n]*(.*[^\d\s])(?:[^\S\r\n]*(\d+))?[^\S\r\n]*$
See the regex demo.
Details:
^ - start of a string
\s* - zero or more whitespaces
(.*[^\d\s]) - Group 1: any zero or more chars other than line break chars as many as possible and then a char other than a digit and whitespace
(?:\s*(\d+))? - an optional sequence of
\s* - zero or more whitespaces
(\d+) - Group 2: one or more digits
\s* - zero or more whitespaces
$ - end of string.
In the second regex, [^\S\r\n]* matches any zero or more whitespaces other than LF and CR chars.

Regex to Capture rest of the line

I have a regex that captures the following expression
XPT 123A
Now I need to add "something" to my regex to capture the remaining string as a group
XPT 123A I AM VERY HAPPY
So XPT would be group 1, 123A group 2, and I AM VERY HAPPY group 3.
Here is my regex (also here http://regexr.com/4mocf):
^([A-Z]{2,4}).((?=\d)[a-zA-Z\d]{0,4})
EDIT:
I don't want to name my groups (editing because some people thought this was a duplicate of another question)

Assuming Group 3 is optional, you may use
^([A-Z]{2,4}) (\d[a-zA-Z\d]{0,3})(?: (.*))?$
or, to allow any whitespace between the parts,
^([A-Z]{2,4})\s+(\d[a-zA-Z\d]{0,3})(?:\s+(.*))?$
The \s+ matches 1 or more whitespace chars.
See the regex demo.
Details
^ - start of string
([A-Z]{2,4}) - Group 1: two, three or four uppercase ASCII letters
\s+ - 1+ whitespaces
(\d[a-zA-Z\d]{0,3}) - Group 2: a digit followed by 0 to 3 alphanumeric chars
(?:\s+(.*))? - an optional non-capturing group matching 1 or 0 occurrences of:
\s+ - 1+ whitespaces
(.*) - Group 3: any 0+ chars other than line break chars as many as possible
$ - end of string
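A short sketch of the \s+ variant in JavaScript, using unnamed groups as the question requests. The helper name parseLine and the field names code, id and rest are assumptions for illustration.

```javascript
const lineRe = /^([A-Z]{2,4})\s+(\d[a-zA-Z\d]{0,3})(?:\s+(.*))?$/;

// Hypothetical helper: returns the three groups, with rest set to null
// when the optional third group did not participate in the match.
function parseLine(line) {
  const m = lineRe.exec(line);
  return m === null ? null : { code: m[1], id: m[2], rest: m[3] ?? null };
}

console.log(parseLine('XPT 123A I AM VERY HAPPY'));
console.log(parseLine('XPT 123A'));
```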

Just add the following suffix to your regex to capture the rest of the line:
(?<rest>.+)?$
