Regular expressions, group without getting to matches [duplicate] - regex

This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 8 years ago.
I have a regexp like this:
/<span[^>]*class=\"link[^>]*params=\"(\d+),(\d+),[^>]*>[^<]*from.*?(\d{1,2})(.*?)(\d{4}).*?(year|Year)[^<]*<\/span>/
and a string like this:
<p id="p_195" class="s_16" style="text-indent:6pt;"><span class="link s_8" params="65537,21403229,0,195,0,0" onmouseover="this.style.textDecoration='underline';" onmouseout="this.style.textDecoration='none';" onclick="return onClickLink(event, this);">Sometext from 28 september 2013& nbsp;year</span></p>
The trouble is the separator: it can be either a space or & nbsp;. I changed the regexp to: bla-bla-blah... from.*?(\d{1,2})(& nbsp;|\s)(.*?)(\d{4}).*?(year|Year) ...bla-bla-blah
(read & nbsp; without the space)
So now the matches contain an extra group for (& nbsp;|\s), which I do not need. How can I group (& nbsp;|\s) without it showing up in the matches?

You want a non-capturing group, try this:
from.*?(\d{1,2})(?:& nbsp;|\s)(.*?)(\d{4}).*?(year|Year)
See Kobi's comment on the question for details: "What is a non-capturing group? What does a question mark followed by a colon (?:) mean?"
Be careful with non-capturing groups. They are not supported in all regex flavours, and they can break your post-processing code if you rely on group backreference indexes and later change a group to be non-capturing, because every group after it is renumbered. My advice is to always use named groups in .NET.
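To see the effect on the group list, here is a minimal Python sketch. The question does not say which engine is in use, so Python's `re` module is an assumption, and the sample text is shortened with `&nbsp;` written without the space. Python spells named groups `(?P<name>...)`, whereas .NET uses `(?<name>...)`; the names `day`, `month` and `year` are made up for illustration.

```python
import re

# Shortened sample text (assumption: only the part after "from" matters here).
text = "Sometext from 28 september 2013&nbsp;year"

# Capturing separator: it shows up as an extra group in the result.
capturing = re.compile(r"from.*?(\d{1,2})(&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)")
# Non-capturing separator: same match, but the separator is not in the groups.
non_capturing = re.compile(r"from.*?(\d{1,2})(?:&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)")
# Named groups (hypothetical names): lookups no longer depend on group indexes.
named = re.compile(
    r"from.*?(?P<day>\d{1,2})(?:&nbsp;|\s)(?P<month>.*?)(?P<year>\d{4})"
)

capturing_groups = capturing.search(text).groups()
non_capturing_groups = non_capturing.search(text).groups()
named_groups = named.search(text).groupdict()

print(capturing_groups)      # ('28', ' ', 'september ', '2013', 'year')
print(non_capturing_groups)  # ('28', 'september ', '2013', 'year')
print(named_groups)          # {'day': '28', 'month': 'september ', 'year': '2013'}
```

Note how the index of the month shifts from 3 to 2 between the first two patterns, which is exactly the renumbering hazard mentioned above; the named version is unaffected.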

Related

Optional group at the end includes excess group [duplicate]

This question already has answers here:
Can I use an OR in regex without capturing what's enclosed?
(4 answers)
Closed 8 months ago.
I'm trying to make the group at the end optional, but the result includes one extra group.
What should the right regex look like? I would appreciate any help.
Regex = <!([a-z]{0,25})\|([^\|>]*)\|([^\|<>]*)(\|([^\|<>]*))?>
Example = <!user|123|Kirill|{"color":"rgb(255, 184, 75)"}> published <!content|456|A cool content>
For the first match, this yields an unexpected extra group.
The pipe could be outside of the last capture group, then make that whole part optional using a non capture group.
Note that you don't have to escape the pipe in a character class.
<!([a-z]{0,25})\|([^|>]*)\|([^|<>]*)(?:\|([^|<>]*))?>
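A quick way to compare the two patterns is to run both over the example string. This is only a sketch in Python's `re` module, which is an assumption since the question does not name an engine; `findall` returns one tuple of groups per match, with empty strings for groups that did not take part.

```python
import re

example = (
    '<!user|123|Kirill|{"color":"rgb(255, 184, 75)"}> published '
    "<!content|456|A cool content>"
)

# Original pattern: the optional part is a capturing group that includes the pipe.
original = re.compile(r"<!([a-z]{0,25})\|([^\|>]*)\|([^\|<>]*)(\|([^\|<>]*))?>")
# Suggested pattern: the pipe sits in a non-capturing wrapper around the last group.
fixed = re.compile(r"<!([a-z]{0,25})\|([^|>]*)\|([^|<>]*)(?:\|([^|<>]*))?>")

original_matches = original.findall(example)
fixed_matches = fixed.findall(example)

for groups in original_matches:
    print(len(groups), groups)  # 5 groups per match, one of them starts with "|"
for groups in fixed_matches:
    print(len(groups), groups)  # 4 groups per match
```

With the original pattern the first match carries both `|{"color":...}` and `{"color":...}`; with the fixed one only the latter remains.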

Regexp to match multi-line string [duplicate]

This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 2 years ago.
I have this regexp:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?
Which I'm using to match text like:
BREAKING CHANGE: test
my multiline
string.
This is not matched
You can see the result here https://regex101.com/r/gGroPK/1
However, why is there a Group 4 at the end?
You will need to make the last group non-capturing:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?
Note this part:
(?:.|[\r\n](?![\r\n]))*)?
The (?: at the start makes this inner group non-capturing.
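It is group 4 because the fourth pair of parentheses you defined is:
(.|[\r\n](?![\r\n]))*
It translates to
"either any character except a line break, or a single line break that is not followed by another line break", repeated.
A repeated capturing group keeps only the text of its last iteration, and in your example the match ends on the last character of:
string.
So, since the quantifier is greedy, the regex captures that trailing dot as the fourth group.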
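The behaviour can be reproduced with a small Python sketch. Two things here are assumptions: Python's `re` module needs the `(?P<NAME>...)` spelling for named groups instead of `(?<NAME>...)`, and the sample text is assumed to have a blank line before "This is not matched", since the blank line is what stops the footer.

```python
import re

# Assumption: a blank line separates the footer from the unmatched last line.
text = "BREAKING CHANGE: test\nmy multiline\nstring.\n\nThis is not matched"

# Inner group is capturing, so it becomes group 4.
with_group_4 = re.compile(
    r"^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?"
    r"(?P<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?"
)
# Inner group is non-capturing, so only the three named groups remain.
without_group_4 = re.compile(
    r"^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?"
    r"(?P<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?"
)

before = with_group_4.match(text)
after = without_group_4.match(text)

print(before.groups())  # four groups; the last one is the final "." of "string."
print(after.groups())   # three groups
```

The named groups hold the same text in both cases; only the unnamed fourth group disappears.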

Repeating pattern for a regex -- validate the same [duplicate]

This question already has answers here:
Have trouble understanding capturing groups and back references
(2 answers)
Closed 3 years ago.
The url of my username is:
https://stackoverflow.com/users/12283851/user12283851
For this username it looks like the regular expression might be close to:
r'https?://stackoverflow.com/users/\d{1,9}/user\d{1,9}'
Is there a way in the regex to make sure that the first ID matches the second? In other words:
https://stackoverflow.com/users/12283851/user12283851 <== Valid
https://stackoverflow.com/users/11111111/user12283851 <== Invalid
This is accomplished by using backreferences.
The backreference \1 (backslash one) references the first capturing group: it matches the exact same text that was matched by that group.
In your example the following regex would work:
https?://stackoverflow\.com/users/(\d{1,9})/user\1
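Since the question uses a Python raw string, here is a short Python sketch of that check. The helper name `ids_match` is made up for illustration, and using `fullmatch` is a design choice of this sketch rather than part of the answer: it requires the whole URL to match, so a longer second ID that merely starts with the first one is rejected.

```python
import re

# \1 must repeat exactly what (\d{1,9}) captured.
PROFILE_URL = re.compile(r"https?://stackoverflow\.com/users/(\d{1,9})/user\1")


def ids_match(url: str) -> bool:
    """Return True when the numeric ID and the ID in the user name are identical."""
    return PROFILE_URL.fullmatch(url) is not None


print(ids_match("https://stackoverflow.com/users/12283851/user12283851"))   # True
print(ids_match("https://stackoverflow.com/users/11111111/user12283851"))   # False
print(ids_match("https://stackoverflow.com/users/12283851/user122838519"))  # False
```

With `search` instead of `fullmatch`, the third URL would be accepted, because the backreference only has to match a prefix of the second ID.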

REGEX - find a string that has the same match? [duplicate]

This question already has answers here:
Regex plus vs star difference? [duplicate]
(9 answers)
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 4 years ago.
I am trying to match the string "menu-item" only where it is followed by a number.
<li id="menu-item-578" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-578">
I can use this regex:
menu-item-[0-9]*
However, it matches every menu-item string. I want to match only the "menu-item-578" in the class attribute, not the one in id="menu-item-578".
How can I do that?
Thank you.
You should avoid menu-item-[0-9]*, not because it matches the expected substring more than once, but because it goes beyond that and also matches menu-item- in menu-item-one.
Besides replacing the quantifier with +, you have to check that the preceding character is not a non-whitespace character:
(?<!\S)menu-item-[0-9]+(?=["' ])
Or, if your regex flavor doesn't support lookarounds, you may want to use this, which may not be precise either:
[ ]menu-item-[0-9]+
You may also check the following character with a stricter pattern:
[ ]menu-item-[0-9]+["' ]
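The difference between the loose pattern and the lookaround version can be checked on the sample tag. This is a sketch using Python's `re` module, which is an assumption because the question does not name a flavor; Python does support both lookbehind and lookahead.

```python
import re

html = (
    '<li id="menu-item-578" class="menu-item menu-item-type-post_type '
    'menu-item-object-page menu-item-578">'
)

# Loose pattern from the question: [0-9]* also accepts zero digits.
loose = re.findall(r"menu-item-[0-9]*", html)

# Lookaround pattern: must be preceded by whitespace (or the start of the
# string) and followed by a quote or a space; neither is part of the match.
strict_pattern = re.compile(r"""(?<!\S)menu-item-[0-9]+(?=["' ])""")
strict = [m.group(0) for m in strict_pattern.finditer(html)]
strict_start = strict_pattern.search(html).start()

print(loose)   # ['menu-item-578', 'menu-item-', 'menu-item-', 'menu-item-578']
print(strict)  # ['menu-item-578']
```

The single strict match is the one inside the class attribute: the occurrence in `id="..."` is preceded by a quote, so the lookbehind rejects it.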
Try this, it works too:
(\s)(menu-item-)\d+
https://regex101.com/
\s matches any whitespace character.
Use a space before it, like this:
\ menu-item-[0-9]*
The first occurrence has a " right before it, while the second one has a space.
EDIT: use an online regex editor (like Regex tester) to try these things.

What's the difference between regex [-+]? and (-|+)? [duplicate]

This question already has answers here:
Using alternation or character class for single character matching?
(3 answers)
Closed 4 years ago.
What's the difference between regex
[-+]?
and
(-|+)?
Don't they mean the same?
Both match the same characters, but the second form produces a capturing group. You can use a backreference to access the group (\1 or $1, depending on your regular expression engine).
UPDATE
The second form is invalid in many regular expression engines (it is valid only in some older engines that treat a + with nothing before it as a literal).
That is because + has a special meaning, one or more repetitions of the preceding pattern, and here there is nothing to repeat.
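They are the same, but I would prefer the character class (1st form), since the 2nd form captures - or +, which you may not need.
This is also equivalent, without capturing the text in the group (the + is escaped here, because a bare + right after | has nothing to repeat in many engines):
(?:-|\+)?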
Most regexes can be put in the form of alternation groups and the star operator; for example, [ab]+ can be written as (a|b)(a|b)*. That is much more verbose, which is why the other operators exist. You included the question mark operator in your regex, but really [-+]? is equivalent to (\+|-|), where the empty last alternative matches nothing.
So there really is no difference (except for capturing, as others have mentioned), but that doesn't mean the other operators aren't useful in making a regex compact and intuitive to understand.
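The points made in these answers can be checked with a short Python sketch. Python's `re` module is an assumption (the question names no engine), and the trailing `\d+` plus the sample inputs are added only to have something to match against. In Python the unescaped `(-|+)?` does not compile at all, which matches the "invalid in many engines" remark.

```python
import re

# The unescaped form: "+" has nothing to repeat, so compilation fails in Python.
try:
    re.compile(r"(-|+)?")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

char_class = re.compile(r"[-+]?\d+")        # no group at all
capturing = re.compile(r"(-|\+)?\d+")       # sign is captured as group 1
non_capturing = re.compile(r"(?:-|\+)?\d+")  # grouped, but not captured

samples = ["-5", "+5", "5"]
# All three patterns accept exactly the same inputs.
all_accept = all(
    p.fullmatch(s) is not None
    for p in (char_class, capturing, non_capturing)
    for s in samples
)

print(unescaped_compiles)                   # False
print(all_accept)                           # True
print(char_class.fullmatch("-5").groups())  # ()
print(capturing.fullmatch("-5").groups())   # ('-',)
print(capturing.fullmatch("5").groups())    # (None,)
```

So the only observable difference is the extra group, which is `None` when the optional sign is absent.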