Regexp to match multi-line string [duplicate] - regex

This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 2 years ago.
I have this regexp:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?
Which I'm using to match text like:
BREAKING CHANGE: test
my multiline
string.
This is not matched
You can see the result here https://regex101.com/r/gGroPK/1
However, why does the result include a fourth group (Group 4) at the end?

You will need to make the last group non-capturing:
^(?<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?<SEPARATOR>:)?(?<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?
Make note of:
(?:.|[\r\n](?![\r\n]))*
The (?: at the start makes this inner, repeated group non-capturing.
Updated Demo
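To see the effect of the non-capturing group, the two patterns can be compared side by side. This is only a minimal sketch in Python, which is an assumption since the question does not name a language. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so both patterns are adapted accordingly. The sample text also assumes a blank line before "This is not matched", which is what the lookahead in the pattern suggests.

```python
import re

# Original pattern, adapted to Python's (?P<name>...) named-group syntax.
CAPTURING = re.compile(
    r'^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?'
    r'(?P<FOOTER>(?<=:)(.|[\r\n](?![\r\n]))*)?'
)
# Same pattern with the inner repeated group made non-capturing.
NON_CAPTURING = re.compile(
    r'^(?P<FOOTER_TYPE>[ a-zA-Z0-9-]+)?(?P<SEPARATOR>:)?'
    r'(?P<FOOTER>(?<=:)(?:.|[\r\n](?![\r\n]))*)?'
)

# Assumed sample: a blank line separates the footer from the trailing text.
TEXT = "BREAKING CHANGE: test\nmy multiline\nstring.\n\nThis is not matched"

with_group = CAPTURING.match(TEXT)
without_group = NON_CAPTURING.match(TEXT)

# Number of capturing groups defined by each pattern.
print(CAPTURING.groups, NON_CAPTURING.groups)
# The named FOOTER group is unaffected by the change.
print(repr(without_group.group('FOOTER')))
```

The named groups capture the same text in both versions; only the extra numbered group disappears.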

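It is Group 4 because the fourth pair of parentheses you defined is:
(.|[\r\n](?![\r\n]))*
It translates to
"either any character except a line break, or a line break that is not followed by another line break", repeated any number of times,
and the text in your example ends with a dot:
string.
Because the quantifier is greedy, the repetition runs all the way to that final dot, and a repeated capturing group keeps only its last iteration, so the dot is what ends up in the fourth group.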
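The "last iteration wins" behaviour of a repeated capturing group is easy to check in isolation. The snippet below is a small sketch using Python's re module (an assumed choice of language) with only the inner group from the question and a shortened sample string.

```python
import re

# A capturing group under a quantifier is overwritten on every iteration,
# so only the text matched by the final iteration is kept.
match = re.match(r'(.|[\r\n](?![\r\n]))*', "my multiline\nstring.")

whole_match = match.group(0)     # everything the repetition consumed
last_iteration = match.group(1)  # only the final character matched
```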

Related

Regex if character matches then, else [duplicate]

This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 7 months ago.
I have two regular expressions that work fine to extract text between characters:
(?<=\$)(.*)(?=\*)
(?<=\$)(.*)(?=)
For my example text $66* the first expression extracts 66. When the asterisk is not present in the text (i.e. $66), the second expression extracts 66.
How can I combine the two to use the first one if an asterisk is present and the second one if no asterisk is present?
I tried with what I thought would be an if|then|else like below but am doing something wrong: (?(?=\*)(?<=\$)(.*)(?=\*)|(?<=\$)(.*)(?=))
You can use a negated character set to exclude asterisks in your match instead:
(?<=\$)[^*]+
Demo: https://regex101.com/r/vuGBiJ/2
As you are already using a capture group, you could also match the $ and capture one or more characters other than an asterisk:
\$([^*]+)
Regex demo
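Both suggestions can be tried against the two inputs from the question. This is a quick sketch in Python (an assumption, as the question does not say which engine is used); the helper name extract is made up for illustration.

```python
import re

LOOKBEHIND = re.compile(r'(?<=\$)[^*]+')  # matches only the text after "$"
CAPTURE = re.compile(r'\$([^*]+)')        # matches "$" too, captures the rest

def extract(text):
    """Return the value found by each of the two patterns."""
    return (LOOKBEHIND.search(text).group(0), CAPTURE.search(text).group(1))

with_asterisk = extract("$66*")
without_asterisk = extract("$66")
```

Since [^*]+ simply stops at the first asterisk or at the end of the input, no if/then/else construct is needed.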

Repeating pattern for a regex -- validate the same [duplicate]

This question already has answers here:
Have trouble understanding capturing groups and back references
(2 answers)
Closed 3 years ago.
The url of my username is:
https://stackoverflow.com/users/12283851/user12283851
For this username it looks like the regular expression might be close to:
r'https?://stackoverflow.com/users/\d{1,9}/user\d{1,9}'
Is there a way in the regex to make sure that the first ID matches the second? In other words:
https://stackoverflow.com/users/12283851/user12283851 <== Valid
https://stackoverflow.com/users/11111111/user12283851 <== Invalid
This is accomplished by using backreferences.
The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group.
In your example the following regex would work:
https?://stackoverflow\.com/users/(\d{1,9})/user\1
See this demo
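The question's r'...' prefix suggests Python, so here is a minimal sketch of the backreference check in that language. The function name ids_match is hypothetical, and using fullmatch is an assumption that the whole string must be the URL; it keeps the pattern from accepting a second ID that merely starts with the first one.

```python
import re

PROFILE_URL = re.compile(r'https?://stackoverflow\.com/users/(\d{1,9})/user\1')

def ids_match(url):
    # \1 must repeat exactly the digits captured by (\d{1,9});
    # fullmatch requires the pattern to cover the entire string.
    return PROFILE_URL.fullmatch(url) is not None

valid = ids_match("https://stackoverflow.com/users/12283851/user12283851")
invalid = ids_match("https://stackoverflow.com/users/11111111/user12283851")
```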

Regex substitution: find double quotes not followed by specific character [duplicate]

This question already has an answer here:
Regex Match a character which is not followed by another specific character
(1 answer)
Closed 4 years ago.
I have the following situation:
3" a
3":a
3",a
3"a
3"2
3"A
I need to find and replace a double quote with a space every time the double quote is not followed by : or ,.
So, for my case the expected results will be:
3 a
3":a
3",a
3 a
3 2
3 A
Any idea how to write this logic using regex?
Regards,
You can use a negative lookahead A(?!B) for that. It matches an expression A that is not followed by expression B.
The replacement of the matches with spaces will depend on the used language.
"(?![:,])
Applied to your examples: https://regex101.com/r/UiPlaC/2
If you want to handle the case 3" a without ending up with multiple spaces, just include an optional space in the match (or use * instead of ? to allow several).
"(?![:,])\ ?
See here for more information:
Regex lookahead, lookbehind and atomic groups
https://www.regular-expressions.info/lookaround.html
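Since the replacement step depends on the language, here is one possible version in Python (an assumed choice; the question does not name a language). It applies the second pattern, with the optional space, to the six sample lines. The helper name replace_quotes is made up for this sketch.

```python
import re

# A double quote not followed by ":" or ",", plus one optional space.
QUOTE = re.compile(r'"(?![:,]) ?')

def replace_quotes(line):
    """Replace each qualifying double quote (and a following space) with one space."""
    return QUOTE.sub(' ', line)

SAMPLES = ['3" a', '3":a', '3",a', '3"a', '3"2', '3"A']
results = [replace_quotes(sample) for sample in SAMPLES]
```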

Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
We have a regex in our project that matches any URL that contains the string
"/pdf/":
(.+)/pdf/.+
We need to modify it so that it won't match URLs that also contain "help".
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
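If lookarounds are supported, this is very easy to achieve. Anchor the pattern at the start of the string so that both lookaheads are checked against the whole URL:
^(?=.*/pdf/)(?!.*help)(.+)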
See a demo on regex101.com.
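A short sketch of the two-lookahead approach in Python (an assumption; the question does not state the engine). The pattern here is anchored with ^, because an unanchored search could otherwise start in the middle of the URL, after the "h" of "help", where the negative lookahead no longer sees the word. The names is_wanted and UNANCHORED are illustrative only.

```python
import re

# Anchored: both lookaheads inspect the string from its very beginning.
PDF_WITHOUT_HELP = re.compile(r'^(?=.*/pdf/)(?!.*help)(.+)')
# Unanchored variant, kept only to illustrate the pitfall described above.
UNANCHORED = re.compile(r'(?=.*/pdf/)(?!.*help)(.+)')

def is_wanted(url):
    """True if the URL contains /pdf/ and does not contain help."""
    return PDF_WITHOUT_HELP.search(url) is not None

rejected = is_wanted("/dealer/help/us/en/pdf/simple.pdf")
accepted = is_wanted("/dealer/us/en/pdf/simple.pdf")
# The unanchored search finds a match that starts partway through the URL.
partial = UNANCHORED.search("/dealer/help/us/en/pdf/simple.pdf")
```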
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First, we match either a space or the start of a line:
(?:^|\s)
Then we match anything that is not a space or an h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/; after that we match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help after the /pdf/, we can reuse the same sub-pattern for the part that follows it.
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match a space or the end of the line/string ($):
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
Example on regex101
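For engines where this lookaround-light pattern is preferred, the following Python sketch (Python being an assumed choice) shows capture group 1 in use, so no stripping of the surrounding spaces is needed. The function name find_pdf_url is hypothetical.

```python
import re

URL_WITHOUT_HELP = re.compile(
    r'(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)'
)

def find_pdf_url(text):
    """Return the first /pdf/ URL with no help before /pdf/, or None."""
    match = URL_WITHOUT_HELP.search(text)
    # Group 1 excludes the leading and trailing whitespace of the full match.
    return match.group(1) if match else None

plain = find_pdf_url("/dealer/us/en/pdf/simple.pdf")
with_help = find_pdf_url("/dealer/help/us/en/pdf/simple.pdf")
embedded = find_pdf_url("see /dealer/us/en/pdf/simple.pdf now")
```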

regex for matching and excluding the rest [duplicate]

This question already has answers here:
Regex match entire words only
(7 answers)
Closed 6 years ago.
I want to match a very simple number-slash-number pattern: 1/2
My regex is: ([0-9]{1}\/[0-9]{1})
The problem is that I match things I want to exclude. I need exact matching excluding the rest.
My regex also accepts as valid patterns such as:
1/12344
2/23ABC
2/233423/2425
[update]
Tested with some txt files using grep, still having issues. For instance:
2/3/16 (it's a date and it matches the pattern, so grep returned the entire line)
I'm not very well versed in regex, so any help would be very much appreciated.
Regards
Try this
(?:^|\s)(\d+\/\d+)(?=\s|$)
Regex demo
Explanation:
(?: … ): Non-capturing group
^: Start of string or start of line depending on multiline mode
|: Alternation / OR operand
\: Escapes a special character
( … ): Capturing group
+: One or more
(?=…): Positive lookahead
$: End of string or end of line depending on multiline mode
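The pieces explained above can be checked against the inputs from the question. The sketch below uses Python rather than grep (an assumption made for the sake of a runnable example). Note that \d+ allows several digits on each side, so 1/12344 still matches; SINGLE_DIGIT is a hypothetical stricter variant for the case where exactly one digit per side is wanted.

```python
import re

# Pattern from the answer: whole "number/number" tokens delimited by
# whitespace or the string boundaries.
ANSWER = re.compile(r'(?:^|\s)(\d+\/\d+)(?=\s|$)')
# Hypothetical variant that allows exactly one digit on each side.
SINGLE_DIGIT = re.compile(r'(?:^|\s)(\d\/\d)(?=\s|$)')

def find_fractions(pattern, text):
    """Return every captured number/number token found in text."""
    return pattern.findall(text)

simple = find_fractions(ANSWER, "1/2")
date_like = find_fractions(ANSWER, "2/3/16")
trailing_letters = find_fractions(ANSWER, "2/23ABC")
several = find_fractions(ANSWER, "a 1/2 b 3/4")
```

The trailing lookahead does not consume the delimiter, which is why two tokens separated by a single space can both be found.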