Optional group at the end includes excess group [duplicate] - regex

This question already has answers here:
Can I use an OR in regex without capturing what's enclosed?
(4 answers)
Closed 8 months ago.
I'm trying to make a group at the end optional, but the match then includes one excess group.
What could the right regex look like? I would appreciate any help here.
Regex = <!([a-z]{0,25})\|([^\|>]*)\|([^\|<>]*)(\|([^\|<>]*))?>
Example = <!user|123|Kirill|{"color":"rgb(255, 184, 75)"}> published <!content|456|A cool content>
This gives the following groups for the 1st match; the highlighted one is the unexpected excess result:

The pipe could be placed outside of the last capture group, and that whole part then made optional using a non-capturing group.
Note that you don't have to escape the pipe in a character class.
<!([a-z]{0,25})\|([^|>]*)\|([^|<>]*)(?:\|([^|<>]*))?>
Regex demo
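As a rough check of the answer above, here is a minimal Python sketch that runs the corrected pattern against the example string from the question. The variable names are only illustrative, and Python's `re` module is an assumption, since the question does not say which engine is used.

```python
import re

# Pattern from the answer: the pipe sits inside an optional non-capturing group,
# so only the text after the pipe is captured.
PATTERN = re.compile(r'<!([a-z]{0,25})\|([^|>]*)\|([^|<>]*)(?:\|([^|<>]*))?>')

example = ('<!user|123|Kirill|{"color":"rgb(255, 184, 75)"}> published '
           '<!content|456|A cool content>')

# Every match exposes exactly four groups; the optional one is None when absent.
matches = [m.groups() for m in PATTERN.finditer(example)]
for groups in matches:
    print(groups)
```

With the original pattern, the same loop would report five groups per match, the fourth being the pipe plus the optional text.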

Related

Regex if character matches then, else [duplicate]

This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 7 months ago.
I have two regular expressions that work fine to extract text between characters:
(?<=\$)(.*)(?=\*)
(?<=\$)(.*)(?=)
For my example text $66* the first expression extracts 66. When the asterisk is not present in the text (i.e. $66), the second expression extracts 66.
How can I combine the two to use the first one if an asterisk is present and the second one if no asterisk is present?
I tried with what I thought would be an if|then|else like below but am doing something wrong: (?(?=\*)(?<=\$)(.*)(?=\*)|(?<=\$)(.*)(?=))
You can use a negated character set to exclude asterisks in your match instead:
(?<=\$)[^*]+
Demo: https://regex101.com/r/vuGBiJ/2
As you are already using a capture group, you could also match the $ and capture 1+ characters except the asterisk.
\$([^*]+)
Regex demo
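Both suggestions can be tried side by side. The sketch below assumes Python's `re` module (the question does not name an engine) and uses the two sample inputs from the question.

```python
import re

samples = ["$66*", "$66"]

# Lookbehind variant: the whole match is the text after "$", stopping before any "*".
lookbehind_results = [re.search(r'(?<=\$)[^*]+', s).group() for s in samples]

# Capture-group variant: "$" is matched, and group 1 holds the text after it.
capture_results = [re.search(r'\$([^*]+)', s).group(1) for s in samples]

print(lookbehind_results)
print(capture_results)
```

Neither variant needs a conditional, because the negated character class simply stops at the asterisk if one is present.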

Regex: How to avoid capturing a group if it ends with a specific character [duplicate]

This question already has answers here:
Negative lookahead not working after character range with plus quantifier
(2 answers)
Closed 7 months ago.
I need to avoid capturing a match if it ends with a colon.
Example below:
item: something
item: another
item: things:
item: yetanother
My desired result is to return nothing from the 3rd line.
I feel like I'm close with this regex using negative lookahead:
item: (\w+)(?!:)
But it's just cutting off the last letter, not avoiding the whole word.
No need for lookarounds, just specify that the line should not end with a colon:
^.*[^:]$
https://regex101.com/r/oxBsMU/1
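To illustrate both the backtracking problem and the suggested fix, here is a small Python sketch. It assumes Python's `re` module and applies the answer's pattern to each line separately, which is one possible way to use it.

```python
import re

lines = [
    "item: something",
    "item: another",
    "item: things:",
    "item: yetanother",
]

# The attempt from the question: \w+ backtracks by one character so that the
# lookahead succeeds, which is why only the last letter is cut off.
backtracked = re.search(r'item: (\w+)(?!:)', "item: things:").group(1)
print(backtracked)

# The answer's pattern: keep only lines whose last character is not a colon.
kept = [line for line in lines if re.search(r'^.*[^:]$', line)]
print(kept)
```

If the word itself is needed rather than the whole line, a pattern such as `^item: (\w+)$` would be another option, though that goes beyond what the answer proposes.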

Regexp to match multi-line string [duplicate]

This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 2 years ago.
I have this regexp:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?
Which I'm using to match text like:
BREAKING CHANGE: test
my multiline
string.
This is not matched
You can see the result here https://regex101.com/r/gGroPK/1
However, why is there a Group 4 at the end?
You will need to make the last group non-capturing:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?
Make note of:
(?:.|[\r\n](?![\r\n]))*)?
(?: at the start makes this optional group non-capturing.
Updated Demo
It is group 4 because the fourth pair of parentheses you defined is:
(.|[\r\n](?![\r\n]))*
This translates to
"either any character (the dot), or a line break that is not followed by another line break"
and in your example the matched text ends with a literal dot:
string.
A repeated capturing group keeps only its last iteration, so that final dot is what ends up in the fourth group.
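The difference between the two patterns can be checked with a short Python sketch. Two assumptions are made here: the named groups are rewritten as `(?P<name>...)` because Python's `re` module does not accept `(?<name>...)`, and the sample text is assumed to have a blank line before "This is not matched", which is what stops the footer.

```python
import re

# Original pattern (Python syntax): the inner repeated group is capturing.
capturing = re.compile(
    r'^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?'
    r'(?P<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?'
)
# Fixed pattern: the same inner group, made non-capturing with (?: ... ).
non_capturing = re.compile(
    r'^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?'
    r'(?P<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?'
)

# Assumed sample: a blank line separates the footer from the trailing text.
text = "BREAKING CHANGE: test\nmy multiline\nstring.\n\nThis is not matched"

before = capturing.search(text)
after = non_capturing.search(text)

print(capturing.groups, before.groups())    # four groups, the last is the final char
print(non_capturing.groups, after.groups()) # three groups
```

The extra group holds only the last character matched by the repetition, which is why it shows up as a lone dot.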

Repeating pattern for a regex -- validate the same [duplicate]

This question already has answers here:
Have trouble understanding capturing groups and back references
(2 answers)
Closed 3 years ago.
The url of my username is:
https://stackoverflow.com/users/12283851/user12283851
For this username it looks like the regular expression might be close to:
r'https?://stackoverflow.com/users/\d{1,9}/user\d{1,9}'
Is there a way in the regex to make sure that the first ID matches the second? In other words:
https://stackoverflow.com/users/12283851/user12283851 <== Valid
https://stackoverflow.com/users/11111111/user12283851 <== Invalid
This is accomplished by using backreferences.
The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group.
In your example the following regex would work:
https?://stackoverflow\.com/users/(\d{1,9})/user\1
See this demo
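A minimal Python sketch of the backreference in action, using the two URLs from the question. The helper name `is_valid_profile_url` is hypothetical, and `re.fullmatch` is used here so that the whole URL has to match.

```python
import re

# (\d{1,9}) captures the numeric ID; \1 requires the same digits after "user".
PROFILE_URL = re.compile(r'https?://stackoverflow\.com/users/(\d{1,9})/user\1')


def is_valid_profile_url(url: str) -> bool:
    """Return True when the ID in the path and the ID in the username agree."""
    return PROFILE_URL.fullmatch(url) is not None


valid = is_valid_profile_url("https://stackoverflow.com/users/12283851/user12283851")
invalid = is_valid_profile_url("https://stackoverflow.com/users/11111111/user12283851")
print(valid, invalid)
```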

Regular expressions, group without getting to matches [duplicate]

This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 8 years ago.
I have a regexp like this:
/<span[^>]*class=\"link[^>]*params=\"(\d+),(\d+),[^>]*>[^<]*from.*?(\d{1,2})(.*?)(\d{4}).*?(year|Year)[^<]*<\/span>/
and a string like this:
<p id="p_195" class="s_16" style="text-indent:6pt;"><span class="link s_8" params="65537,21403229,0,195,0,0" onmouseover="this.style.textDecoration='underline';" onmouseout="this.style.textDecoration='none';" onclick="return onClickLink(event, this);">Sometext from 28 september 2013& nbsp;year</span></p>
The trouble is with "september": there can be a space or & nbsp; next to it. I changed the regexp to: bla-bla-blah... from.*?(\d{1,2})(& nbsp;|\s)(.*?)(\d{4}).*?(year|Year) ...bla-bla-blah
(& nbsp; without a space)
So now the matches contain (& nbsp;|\s), but I do not need it there! How do I group (& nbsp;|\s) without getting it into the matches?
You want a non-capturing group, try this:
.*?(\d{1,2})(?:& nbsp;|\s)(.*?)(\d{4}).*?(year|Year)
See Kobi's comment to the OP for details. What is a non-capturing group? What does a question mark followed by a colon (?:) mean?
Be careful with non-capturing groups. They are not supported in all regex flavours and can mess up your post-processing code if you rely on the group backreference indexes and suddenly decide to change a group to be non-capturing. My advice is to always use named groups in .Net.
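The index shift mentioned above can be seen in a short sketch. It uses Python's `re` module rather than .Net, a shortened sample string with the real `&nbsp;` entity, and hypothetical group names (`day`, `rest`, `year`), so treat it as an illustration only.

```python
import re

text = "Sometext from 28 september 2013&nbsp;year"

# Capturing alternation: the separator takes up group 2 and shifts the rest.
with_capture = re.search(
    r'from.*?(\d{1,2})(&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)', text)

# Non-capturing alternation: the separator is matched but not reported.
without_capture = re.search(
    r'from.*?(\d{1,2})(?:&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)', text)

# Named groups (hypothetical names) stay stable even if group indexes change.
named = re.search(
    r'from.*?(?P<day>\d{1,2})(?:&nbsp;|\s)(?P<rest>.*?)(?P<year>\d{4})', text)

print(with_capture.groups())
print(without_capture.groups())
print(named.group("day"), named.group("year"))
```

Code that reads groups by name keeps working when a numbered group is later made non-capturing, which is the point of the advice above.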