This question already has answers here:
Have trouble understanding capturing groups and back references
(2 answers)
Closed 3 years ago.
The url of my username is:
https://stackoverflow.com/users/12283851/user12283851
For this username it looks like the regular expression might be close to:
r'https?://stackoverflow.com/users/\d{1,9}/user\d{1,9}'
Is there a way in the regex to make sure that the first ID matches the second? In other words:
https://stackoverflow.com/users/12283851/user12283851 <== Valid
https://stackoverflow.com/users/11111111/user12283851 <== Invalid
This is accomplished by using backreferences.
The backreference \1 (backslash one) refers to the first capturing group: it matches the exact same text that the first capturing group matched.
In your example the following regex would work:
https?://stackoverflow\.com/users/(\d{1,9})/user\1
See this demo
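A minimal Python sketch of that backreference check. It assumes the whole string should be the profile URL (hence `fullmatch`), and the helper name `ids_match` is made up for illustration:

```python
import re

# (\d{1,9}) captures the numeric ID; \1 then requires the very same digits again.
PROFILE_URL = re.compile(r"https?://stackoverflow\.com/users/(\d{1,9})/user\1")


def ids_match(url: str) -> bool:
    """Return True when the ID in the path equals the ID in the user name."""
    return PROFILE_URL.fullmatch(url) is not None


print(ids_match("https://stackoverflow.com/users/12283851/user12283851"))  # True
print(ids_match("https://stackoverflow.com/users/11111111/user12283851"))  # False
```

With `search` instead of `fullmatch`, a URL with extra trailing digits after the second ID would still be accepted, because only a prefix has to match.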
This question already has answers here:
Can I use an OR in regex without capturing what's enclosed?
(4 answers)
Closed 8 months ago.
I'm trying to make a group at the end optional. But it includes one excess result.
What could the right regex look like? I would appreciate any help here.
Regex = <!([a-z]{0,25})\|([^\|>]*)\|([^\|<>]*)(\|([^\|<>]*))?>
Example = <!user|123|Kirill|{"color":"rgb(255, 184, 75)"}> published <!content|456|A cool content>
It gives the following groups for the 1st match; the highlighted one is the unexpected extra result:
The pipe could be outside of the last capture group, then make that whole part optional using a non capture group.
Note that you don't have to escape the pipe in a character class.
<!([a-z]{0,25})\|([^|>]*)\|([^|<>]*)(?:\|([^|<>]*))?>
Regex demo
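As a rough check in Python, using the example string from the question (the names `TAG` and `EXAMPLE` are only for this sketch), the optional part now contributes a single group without the leading pipe:

```python
import re

# (?:\|(...))? makes "pipe + fourth field" optional, but only the field is captured.
TAG = re.compile(r"<!([a-z]{0,25})\|([^|>]*)\|([^|<>]*)(?:\|([^|<>]*))?>")

EXAMPLE = (
    '<!user|123|Kirill|{"color":"rgb(255, 184, 75)"}> published '
    "<!content|456|A cool content>"
)

for match in TAG.finditer(EXAMPLE):
    print(match.groups())
# ('user', '123', 'Kirill', '{"color":"rgb(255, 184, 75)"}')
# ('content', '456', 'A cool content', None)
```

When the optional part is absent, Python reports the fourth group as `None`.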
This question already has answers here:
Regex that matches anything except for all whitespace
(5 answers)
Closed 2 years ago.
I'm using a regular expression to match a Facebook URL.
((http|https)://)?(www[.])?facebook.com/.+
It accepts:
facebook.com/xxxxx
www.facebook.com/xxxxx
http://facebook.com/xxxxx
https://facebook.com/xxxxx
But it still accepts whitespaces after /:
facebook.com/(spaces there)
How can I prevent it?
You can shorten the pattern by making the s in https optional with a question mark, and use \S+ to match 1 or more non-whitespace characters instead of .+, which can also match spaces.
(?:https?://)?(?:www\.)?facebook\.com/\S+
Regex demo
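A small Python sketch of the difference. It assumes the whole input must be the URL, so it uses `fullmatch`; the helper name `is_facebook_url` is hypothetical:

```python
import re

# \S+ needs at least one non-whitespace character after the slash.
FACEBOOK_URL = re.compile(r"(?:https?://)?(?:www\.)?facebook\.com/\S+")


def is_facebook_url(text: str) -> bool:
    """Return True when the entire string looks like a Facebook URL."""
    return FACEBOOK_URL.fullmatch(text) is not None


for candidate in (
    "facebook.com/xxxxx",
    "www.facebook.com/xxxxx",
    "http://facebook.com/xxxxx",
    "https://facebook.com/xxxxx",
    "facebook.com/   ",
):
    print(repr(candidate), is_facebook_url(candidate))
```

Only the last candidate, which has nothing but spaces after the slash, is rejected.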
This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 2 years ago.
I have this regexp:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?
Which I'm using to match text like:
BREAKING CHANGE: test
my multiline
string.
This is not matched
You can see the result here https://regex101.com/r/gGroPK/1
However, why is there a Group 4 at the end?
You will need to make the last group non-capturing:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?
Make note of:
(?:.|[\r\n](?![\r\n]))*)?
(?: at the start makes this repeated inner group non-capturing.
Updated Demo
It is group 4 because the fourth opening parenthesis you defined starts this group:
(.|[\r\n](?![\r\n]))*)
It translates to
"either dot, or the following regex"
and in the example you have, it ends on a dot.
string.
Since the quantifier is greedy and a repeated capturing group keeps only its last iteration, the dot that ends the text is captured as the fourth group.
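To see both answers at once, here is a sketch in Python. Note one assumption: Python's `re` spells named groups `(?P<name>...)` rather than `(?<name>...)`, so the pattern is transcribed accordingly; the variable names are only for illustration.

```python
import re

TEXT = "BREAKING CHANGE: test\nmy multiline\nstring.\n\nThis is not matched"

# Original shape: the inner (...) is a fourth, unnamed capturing group.
capturing = re.compile(
    r"^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?"
    r"(?P<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?"
)
# Fixed shape: (?:...) groups the alternation without capturing it.
non_capturing = re.compile(
    r"^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?"
    r"(?P<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?"
)

m_old = capturing.match(TEXT)
m_new = non_capturing.match(TEXT)

print(capturing.groups, non_capturing.groups)  # 4 3
print(repr(m_old.group(4)))                    # '.' (last iteration of the repeated group)
print(repr(m_new.group("FOOTER")))             # ' test\nmy multiline\nstring.'
```

The named FOOTER group is the same in both versions; only the extra unnamed group disappears.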
This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I am using below pattern in json schema to validate strings.
"pattern": "^(nfs://)(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):([0-9]{4})"
But currently it does not reject "nfs://172.1.1:2049" as an invalid string.
This doesn't immediately seem like an obvious problem, but the . character needs to be escaped because you're trying to literally match that character.
This regex, with the . characters and the forward slashes escaped, works:
^(nfs:\/\/)(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):([0-9]{4})
The problem was that since each capturing group that matches digits can match as few as one digit or as many as three, the regex engine looked at the first 1 (in 172), found that it was valid, then tried matching . (any character) and found the digit 7, which is not what you want.
In nfs://172.1.1:2049, the second capturing group in your regex matched the first 1 in the IP address, the . matched the 7, the third capturing group matched the 2.. and so on.
Try it here: https://regex101.com/r/TNXDiQ/1
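The same behaviour can be reproduced in Python. This is only a sketch: the `OCTET` constant and the variable names are introduced here to keep the patterns short, and the forward slashes are left unescaped because Python does not need them escaped.

```python
import json
import re

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Unescaped "." between octets: matches ANY character, including a digit.
unescaped = re.compile(r"^(nfs://)" + ".".join([OCTET] * 4) + r":([0-9]{4})")
# Escaped "\." between octets: matches only a literal dot.
escaped = re.compile(r"^(nfs://)" + r"\.".join([OCTET] * 4) + r":([0-9]{4})")

bad = "nfs://172.1.1:2049"

print(unescaped.match(bad).groups())
# ('nfs://', '1', '2', '1', '1', '2049')  -> the 7 was eaten by an unescaped "."
print(escaped.match(bad))                     # None
print(escaped.match("nfs://172.1.1.1:2049"))  # a match object

# Inside a JSON document every backslash must itself be escaped.
print(json.dumps({"pattern": escaped.pattern}))
```

The last line shows how the pattern would look inside the schema file: a JSON string needs `\\.` to carry a single `\.` through to the regex engine.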
This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 8 years ago.
regexp like this:
/<span[^>]*class=\"link[^>]*params=\"(\d+),(\d+),[^>]*>[^<]*from.*?(\d{1,2})(.*?)(\d{4}).*?(year|Year)[^<]*<\/span>/
string like that:
<p id="p_195" class="s_16" style="text-indent:6pt;"><span class="link s_8" params="65537,21403229,0,195,0,0" onmouseover="this.style.textDecoration='underline';" onmouseout="this.style.textDecoration='none';" onclick="return onClickLink(event, this);">Sometext from 28 september 2013& nbsp;year</span></p>
The trouble is the separator next to september: it can be a space or & nbsp;. I changed the regexp to: bla-bla-blah... from.*?(\d{1,2})(& nbsp;|\s)(.*?)(\d{4}).*?(year|Year) ...bla-bla-blah
(& nbsp; without a space)
So the matches now include (& nbsp;|\s), but I do not need it there! How can I group (& nbsp;|\s) without getting it in the matches?
You want a non-capturing group, try this:
from.*?(\d{1,2})(?:& nbsp;|\s)(.*?)(\d{4}).*?(year|Year)
See Kobi's comment to the OP for details. What is a non-capturing group? What does a question mark followed by a colon (?:) mean?
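A short Python comparison of the two forms, using only the tail of the pattern and a shortened version of the sample text (both are simplifications for this sketch, and the entity is written without the space):

```python
import re

TEXT = "Sometext from 28 september 2013&nbsp;year"

# Capturing separator: it shows up as an extra group and shifts the indexes.
with_capture = re.compile(r"from.*?(\d{1,2})(&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)")
# Non-capturing separator: it is still matched, but not reported.
without_capture = re.compile(r"from.*?(\d{1,2})(?:&nbsp;|\s)(.*?)(\d{4}).*?(year|Year)")

print(with_capture.search(TEXT).groups())
# ('28', ' ', 'september ', '2013', 'year')
print(without_capture.search(TEXT).groups())
# ('28', 'september ', '2013', 'year')
```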
Be careful with non-capturing groups. They are not supported in all regex flavours, and they can break your post-processing code if you rely on group indexes and later change a group to be non-capturing. My advice is to always use named groups in .NET.
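The named-group advice carries over to other flavours too. A sketch in Python, where the syntax is `(?P<name>...)` instead of .NET's `(?<name>...)`, and where the group names `day`, `month` and `year` are chosen here for illustration:

```python
import re

# Named groups are looked up by name, so adding or removing other groups
# does not change how the results are read.
DATE = re.compile(
    r"from.*?(?P<day>\d{1,2})(?:&nbsp;|\s)(?P<month>.*?)(?P<year>\d{4})"
)

match = DATE.search("Sometext from 28 september 2013&nbsp;year")
day = match.group("day")
month = match.group("month").strip()  # the lazy group keeps a trailing space
year = match.group("year")

print(day, month, year)  # 28 september 2013
```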