How to match a group of values to group 1 - regex

I was trying to solve a regex question posted on SO, but got stuck with this.
From this string
Ob=Web technology,OB=Product SPe,OB=Dev profile,OB=Computer Management,oB=Hardware Services,cd=sti,CD=com,cd=ws
The values have to be extracted as shown below.
Web technology,Product SPe,Dev profile,Computer Management,Hardware Services
I was trying the below regex.
(?=Ob)(?:(\w+=)([\w\s]+,?))+
My assumption was that group 1 would hold all the keys and group 2 all the values. But everything except the last key-value pair ends up only in group 0.
Is there a way of getting all the values into group 2?
And here is what I was working on.

The issue with your regex is that group 1 and group 2 are enclosed within a repeated non-capturing group. This causes the entire string to be captured in group 0, while groups 1 and 2 keep only the last pair. The other thing is that the positive lookahead prevents the regex from doing a global match.
The regex below will gather all keys into group 1 and all values into group 2.
(\w+)=([\w\s]+)(?=[,\s]+)
Check out how it works here.
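To make both points concrete, here is a minimal Python sketch. The choice of Python's `re` module and all variable names are my own assumptions, not part of the question. It shows that a repeated capturing group keeps only its last iteration, and that a global search with the pattern above returns one key/value tuple per match.

```python
import re

s = ("Ob=Web technology,OB=Product SPe,OB=Dev profile,"
     "OB=Computer Management,oB=Hardware Services,cd=sti,CD=com,cd=ws")

# Original attempt: groups 1 and 2 sit inside a repeated group, so after
# the match they hold only the LAST iteration.
m = re.match(r"(?=Ob)(?:(\w+=)([\w\s]+,?))+", s)
last_pair = m.groups()

# Global search: findall returns one (key, value) tuple per match.
pairs = re.findall(r"(\w+)=([\w\s]+)(?=[,\s]+)", s)

# Keep only the values whose key is "ob" in any letter case.
ob_values = [value for key, value in pairs if key.lower() == "ob"]

print(last_pair)
print(",".join(ob_values))
```

Note that the lookahead requires a comma or whitespace after the value, so the final cd=ws pair is not matched at all. That is harmless here because only the Ob values are wanted.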

,?cd=.*?(?:,|$)|ob=
Try this. Replace by empty string. See demo. Do not forget flag i.
http://regex101.com/r/lZ5mN8/59
or
cd=.*?(?:,|$)|[^=,]+=(.*?)(?=,|$)
Try this. Replace by $1. See demo.
http://regex101.com/r/lZ5mN8/57
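For readers who want to try the first replacement outside regex101, a small sketch in Python follows. Using Python's `re.sub` is my assumption (the answer only links a demo); the `i` flag becomes `re.IGNORECASE`.

```python
import re

s = ("Ob=Web technology,OB=Product SPe,OB=Dev profile,"
     "OB=Computer Management,oB=Hardware Services,cd=sti,CD=com,cd=ws")

# Delete every cd=... pair (with an optional leading comma) and every
# "ob=" prefix, ignoring letter case.
cleaned = re.sub(r",?cd=.*?(?:,|$)|ob=", "", s, flags=re.IGNORECASE)

print(cleaned)
```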

Regex:
(?i)Ob=([^,]+)|(?!.*\bob\b).+
Replacement string:
$1
DEMO
(?i) Will do a case insensitive match.
Ob=([^,]+) Group index 1 contains all the Ob values.
| OR
(?!.*\bob\b).+ Matches one or more characters, but only from a position after which the word ob (\bob\b) no longer occurs. This consumes the trailing cd pairs.
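The same replacement can be sketched in Python. This is an assumption on my part, since the answer does not name a language. A function is used as the replacement so that an unmatched group 1 becomes an empty string regardless of the Python version.

```python
import re

s = ("Ob=Web technology,OB=Product SPe,OB=Dev profile,"
     "OB=Computer Management,oB=Hardware Services,cd=sti,CD=com,cd=ws")

pattern = re.compile(r"(?i)Ob=([^,]+)|(?!.*\bob\b).+")

# First branch: keep the captured Ob value.
# Second branch: group 1 did not take part, so substitute nothing.
result = pattern.sub(lambda m: m.group(1) or "", s)

print(result)
```

The commas between the Ob values survive because neither branch can match at a comma while another Ob key still follows it.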

This regex should work for you:
^(?!Ob=).*(*SKIP)(*F)|(\w+)=(\w+(?=,|$))
You can see that you're getting all keys in group #1 and all values in group #2.
RegEx Demo

Related

Reusing branch reset group doesn't match all the alternatives

I am trying to validate an IPv4 address using the RegEx below
^((?|([0-9][0-9]?)|(1[0-9][0-9])|(2[0-5][0-5]))\.){3}(?2)$
The regex works fine until the 3rd octet of the IP address in most of the cases. But sometimes in the last octet, it only matches the first alternative in the Branch Reset Group and ignores the other alternating groups altogether. I know that all alternatives in a branch reset group refer to the same capturing group. I tried the suggestion to reuse the capture groups as described in this StackOverflow post. It worked partially.
There is an explanation about this behaviour on this page:
https://www.pcre.org/original/doc/html/pcrepattern.html#SEC15
The documentation states:
a subroutine call to a numbered subpattern always refers to the first
one in the pattern with the given number.
Using the example on that page:
(?|(abc)|(def))(?1)
Inside a (?| group, parentheses are numbered as usual, but the number
is reset at the start of each branch.
The numbers will look like this
(?|(abc)|(def))
   1     1
This will match
abcabc
defabc
But it does not match
defdef
It does not match defdef because the pattern will match the first def, but the following (?1) will only match the first numbered subpattern, which is (abc).
See a regex demo.
The reason is that the (?2) regex subroutine recurses the first capturing group pattern with the ID 2, ([0-9][0-9]?). If that fails to match (the $ requires the end of the string right after it), backtracking starts and the match eventually fails.
The correct approach to recurse a group of patterns is to avoid using a branch reset group and capture all alternatives into a single capturing group that will be recursed:
^(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?1)$
//  |____________ Group 1 _______________|      \_ Regex subroutine
See the regex demo.
Note that the octet pattern is a bit different; it is taken from How to Find or Validate an IP Address. Your octet pattern is wrong because 2[0-5][0-5] does not match numbers between 200 and 255 that end with 6, 7, 8 and 9.
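Python's built-in `re` module does not support subroutine calls such as (?1), so the following sketch swaps in plain string reuse: the octet pattern is written once and interpolated twice. The names `OCTET`, `BAD_OCTET` and `is_ipv4` are hypothetical helpers introduced only for illustration. The sketch also compares the corrected octet pattern against the one from the question.

```python
import re

# Corrected octet pattern, written once and reused (a stand-in for (?1)).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")

# Octet alternatives from the question, kept only for comparison.
BAD_OCTET = r"(?:[0-9][0-9]?|1[0-9][0-9]|2[0-5][0-5])"
BAD_IPV4 = re.compile(rf"^(?:{BAD_OCTET}\.){{3}}{BAD_OCTET}$")


def is_ipv4(text):
    """Return True if text looks like a dotted-quad IPv4 address."""
    return IPV4.match(text) is not None


for candidate in ["192.168.1.1", "10.0.0.209", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate), BAD_IPV4.match(candidate) is not None)
```

The address 10.0.0.209 is accepted by the corrected pattern but rejected by the question's octet pattern, because 2[0-5][0-5] cannot match a final digit of 9.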

Remove whitespace from within a regex capturing group

This should be a simple job, but this morning I just can't seem to find the answer I need
Value:
N.123456 7
Current regex
N.(\d{6}\s?\d)
Returns single matching group
123456 7
Want it to return single matching group
1234567
Thanks
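You can't get this back as a single matching group, because a group always captures one contiguous piece of the input. I think what you are looking for is a non-capturing group (?: ).
There is an explanation here.
Maybe this regex will help you. It keeps the space character out of the captures by putting it in a non-capturing group.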
N.(\d{6})(?:\s?)(\d)
It will capture 123456 in group 1 and 7 in group 2.
What you want is probably this. It will return 1234567
"N.123456 7".replaceAll("N.(\\d{6})(?:\\s?)(\\d)", "$1$2")
Try this:
(?<=\d)\s+(?=\d+)
This should work.
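The same ideas carry over to other regex flavours. Below is a rough Python equivalent of the suggestions above; Python itself and the variable names are my assumptions, since the answer uses Java's replaceAll. A third variant keeps the question's single group and strips the whitespace afterwards.

```python
import re

value = "N.123456 7"

# Two groups joined in the replacement (mirrors the Java replaceAll call;
# the (?:...) wrapper around \s? is not needed for this to work).
joined = re.sub(r"N.(\d{6})\s?(\d)", r"\1\2", value)

# Lookaround variant: delete whitespace that sits between two digits.
stripped = re.sub(r"(?<=\d)\s+(?=\d+)", "", value)

# Keep the original single group and clean it up after matching.
m = re.search(r"N.(\d{6}\s?\d)", value)
digits = re.sub(r"\s", "", m.group(1))

print(joined, stripped, digits)
```

The lookaround variant leaves the N. prefix in place, whereas the other two return only the digits.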

regex group not matching

I have the regex
(\d|(IV|I{0,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+
if I use the sentence
'1 has wound' - 1 is matched in group 1 as expected
'IV has wound' - IV is matched in group 1 as expected
but, the sentence
'one has wound' - the word one doesn't get matched in group 1
When I modify the regex as follows
(\bone\b|\btwo\b|\bthree\b|\bfour\b|\d|(IV|I{0,3}))[\w\s]+
the group matches as expected.
So, my question is: why does changing the order of the alternatives work?
I tried looking up ordering and precedence for regex but couldn't find anything relevant.
Thx
I think you made a mistake in your regex, it should be
(\d|(IV|I{1,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+
Notice it's I{1,3}, not I{0,3}.
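Alternatives are tried from left to right, and I{0,3} is allowed to match zero I characters. So for 'one has wound' the (IV|I{0,3}) branch succeeds with an empty match before \bone\b is ever tried, and capture group 1 ends up empty.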
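The effect is easy to reproduce. Here is a small sketch in Python (my choice of language; the question does not say which engine was used) that runs the original, the reordered and the I{1,3} patterns against the same sentence.

```python
import re

sentence = "one has wound"

original = r"(\d|(IV|I{0,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+"
reordered = r"(\bone\b|\btwo\b|\bthree\b|\bfour\b|\d|(IV|I{0,3}))[\w\s]+"
fixed = r"(\d|(IV|I{1,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+"

# I{0,3} matches the empty string, so group 1 is empty in the original.
g_original = re.match(original, sentence).group(1)
# With the words first, \bone\b wins before I{0,3} is reached.
g_reordered = re.match(reordered, sentence).group(1)
# With I{1,3} the Roman-numeral branch fails and \bone\b is tried next.
g_fixed = re.match(fixed, sentence).group(1)

print(repr(g_original), repr(g_reordered), repr(g_fixed))
```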

regex for specific fileName patterns

Hi, I had file names like frame02_0046.tiff, for which I figured out the following regex: ^(.*)(\d+)([^\d]*)$. However, I have another pattern of names, frame1.03.png to frame5.03.png.
How can I write a regex that covers both name patterns?
pattern = '^(.*)(\d+)([^\d]*)$'
patternExpr = re.compile(pattern)
As posted in the comment, your regex is "bad" for the purpose you have.
You can use this regex instead:
^([a-z]*?)(\d+)[_.](\d+)\.[a-z]+
Working demo
Note the flags I used in the above link.
Match information
Match 1
Full match 0-17 `frame02_0046.tiff`
Group 1. 0-5 `frame`
Group 2. 5-7 `02`
Group 3. 8-12 `0046`
Match 2
Full match 18-31 `frame1.03.png`
Group 1. 18-23 `frame`
Group 2. 23-24 `1`
Group 3. 25-27 `03`
Match 3
Full match 32-45 `frame5.03.png`
Group 1. 32-37 `frame`
Group 2. 37-38 `5`
Group 3. 39-41 `03`
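Since the question uses Python's re module, here is a short sketch of how the suggested pattern could be applied there. The flags used in the linked demo are not shown in the answer, so this sketch assumes none are needed and matches each name separately. It also shows what the original greedy pattern captures.

```python
import re

names = ["frame02_0046.tiff", "frame1.03.png", "frame5.03.png"]

suggested = re.compile(r"^([a-z]*?)(\d+)[_.](\d+)\.[a-z]+")
original = re.compile(r"^(.*)(\d+)([^\d]*)$")

# One (prefix, first number, second number) tuple per file name.
parsed = [suggested.match(name).groups() for name in names]

# The greedy (.*) swallows everything up to the very last digit.
greedy = original.match(names[0]).groups()

print(parsed)
print(greedy)
```

The second result illustrates why the original pattern is unsuitable: the leading (.*) leaves only the final digit for (\d+).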

Regex to capture words and numbers in separate groups

I need two groups - one to extract words, second - numbers. Example:
['| Sofia | 300']
need to extract:
Group 1 - Sofia; Group 2 - 300
My regex attempt:
([a-zA-Z]+[ ]*[a-zA-Z]+)([0-9]+)
I don't understand why this doesn't match. I've been reading for 30 minutes now and maybe I can't phrase my issue correctly, but I can't find a solution. My thinking here is that each set of parentheses holds a group. The regex inside each of them seems to work fine on its own, but when I try to capture 2 groups, it fails. Obviously I am missing something important about multiple group capturing.
It doesn't match because you're not matching the characters between "Sofia" and "300". This would match "Sofia300", but not "Sofia 300" or "Sofia | 300". Try this:
(\w+ *\w+).*?(\d+)
(I'm using \w instead of [a-zA-Z] and \d instead of [0-9] for brevity.)
The following will give you your groups:
/([a-z]+).*\|\s([0-9]+)/i
Example
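A quick way to check the patterns from this thread is a few lines of Python. The language is my assumption; the list syntax in the question suggests it but does not confirm it. The sketch runs the original attempt and both suggested patterns against the sample string.

```python
import re

text = "| Sofia | 300"

# Original attempt: nothing consumes " | " between the word and the number.
attempt = re.search(r"([a-zA-Z]+[ ]*[a-zA-Z]+)([0-9]+)", text)

# First suggestion: a lazy .*? bridges the gap between the two groups.
first = re.search(r"(\w+ *\w+).*?(\d+)", text).groups()

# Second suggestion: anchor the number on the pipe that precedes it.
second = re.search(r"([a-z]+).*\|\s([0-9]+)", text, re.IGNORECASE).groups()

print(attempt, first, second)
```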