Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
We have a regex in our project that matches any URL containing the string
"/pdf/":
(.+)/pdf/.+
I need to modify it so that it won't match URLs that also contain "help".
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"

If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
See a demo on regex101.com.
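To try the lookaround version against the two URLs from the question, here is a minimal sketch in Python. Python is an assumption (the question does not say which engine the project uses), and the helper name matches_pdf_without_help is made up for the example.

```python
import re

# Pattern from the answer: require "/pdf/" somewhere, forbid "help" anywhere.
PDF_NOT_HELP = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

def matches_pdf_without_help(url: str) -> bool:
    """Return True if the URL contains "/pdf/" and does not contain "help"."""
    # re.match only tries position 0, so both lookaheads scan the whole URL.
    return PDF_NOT_HELP.match(url) is not None

print(matches_pdf_without_help("/dealer/us/en/pdf/simple.pdf"))       # True
print(matches_pdf_without_help("/dealer/help/us/en/pdf/simple.pdf"))  # False
```

Note that the negative lookahead rejects "help" anywhere in the string, including after "/pdf/".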

(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First, we match either a whitespace character or the start of the line:
(?:^|\s)
Then we match any character that is not a space or h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/, then match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help when it appears after /pdf/, we can repeat the same construct after it:
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match a whitespace character or the end of the line/string ($):
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
Example on regex101
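A short Python sketch of the capture-group approach described above. The choice of Python, the helper name find_pdf_url, and the sample sentence are assumptions for illustration only.

```python
import re

# Full pattern from the answer; group 1 holds the URL without surrounding spaces.
URL_PATTERN = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

def find_pdf_url(text: str):
    """Return the first "/pdf/" URL with no "help" before "/pdf/", or None."""
    m = URL_PATTERN.search(text)
    # Using group 1 avoids having to strip the leading/trailing whitespace.
    return m.group(1) if m else None

print(find_pdf_url("see /dealer/us/en/pdf/simple.pdf here"))  # /dealer/us/en/pdf/simple.pdf
print(find_pdf_url("/dealer/help/us/en/pdf/simple.pdf"))      # None
```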

Related

Regex if character matches then, else [duplicate]

This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 7 months ago.
I have two regular expressions that work fine to extract text between characters:
(?<=\$)(.*)(?=\*)
(?<=\$)(.*)(?=)
For my example text $66* the first expression extracts 66. When the asterisk is not present in the text (i.e. $66), the second expression extracts 66.
How can I combine the two to use the first one if an asterisk is present and the second one if no asterisk is present?
I tried with what I thought would be an if|then|else like below but am doing something wrong: (?(?=\*)(?<=\$)(.*)(?=\*)|(?<=\$)(.*)(?=))
You can use a negated character set to exclude asterisks in your match instead:
(?<=\$)[^*]+
Demo: https://regex101.com/r/vuGBiJ/2
As you are already using a capture group, you could also match the $ and capture one or more characters other than the asterisk.
\$([^*]+)
Regex demo
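Both suggestions can be checked quickly in Python (an assumption, the question does not name an engine). The function names amount_lookbehind and amount_group are hypothetical.

```python
import re

def amount_lookbehind(text: str):
    """Lookbehind version: the whole match is the text after "$"."""
    # [^*]+ stops at the first asterisk or at the end of the string.
    m = re.search(r"(?<=\$)[^*]+", text)
    return m.group(0) if m else None

def amount_group(text: str):
    """Capture-group version: "$" is matched, group 1 holds the value."""
    m = re.search(r"\$([^*]+)", text)
    return m.group(1) if m else None

for sample in ("$66*", "$66"):
    print(sample, amount_lookbehind(sample), amount_group(sample))
```

Either way, one pattern covers both inputs, so no conditional is needed.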

Find DATE match starting from end of string [duplicate]

This question already has answers here:
Regex Last occurrence?
(7 answers)
Closed 3 years ago.
I have the following RegEx syntax that will match the first date found.
([0-9]+)/([0-9]+)/([0-9]+)
However, I would like to start from the end of the content and search backwards. In other words, in the below example, my syntax will always match the first date, but I want it to match the last instead.
Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here
I believe this is possible by using a negative lookahead, but all my attempts failed at this because I don't understand RegEx enough. Help would be appreciated.
Note: my application uses PCRE regex.
You could make the dot match a newline using for example an inline modifier (?s) and match until the end of the string.
Then rely on backtracking to reach the last occurrence of the date-like pattern, and precede the first digit with a word boundary.
Use \K to discard what was matched so far, then match the date-like pattern.
^(?s).*\b\K[0-9]+/[0-9]+/[0-9]+
Regex demo
Note that the pattern is a very broad match and does not validate a date itself.
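For anyone trying the same idea outside PCRE: Python's built-in re module does not support \K, so the sketch below swaps it for a capture group, which is a substitute and not the pattern from the answer. The name last_date is made up, and the sample text is the one from the question.

```python
import re

# No \K in Python's re module, so group 1 captures the date instead.
# re.DOTALL plays the role of (?s): the dot also matches newlines.
LAST_DATE = re.compile(r"^.*\b([0-9]+/[0-9]+/[0-9]+)", re.DOTALL)

SAMPLE = """Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here"""

def last_date(text: str):
    """Return the last date-like token in text, or None."""
    # The greedy .* runs to the end of the string, then backtracks until
    # the date-like pattern fits, which yields the last occurrence.
    m = LAST_DATE.search(text)
    return m.group(1) if m else None

print(last_date(SAMPLE))  # 10/04/14
```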

RegEx for matching everything with specific words [duplicate]

This question already has answers here:
Regex match entire words only
(7 answers)
Closed 3 years ago.
I would like to conduct regex substitution. Here is the pattern I am using:
.*?fee.*?$|.*?charge.*?$
This matches the desired lines:
"fees credit card"
"charges for interest"
However, it also matches coffee and feeder. How can I prevent matches on lines like the ones below while still handling cases like fee and fees?
"coffee shop"
feeder cattle
You could use an alternation with 2 word boundaries \b to prevent the words being part of a larger word.
For your example data, if you want to match either the singular or the plural form, you can make the s at the end optional by using a question mark.
^.*\b(?:fees?|charges?)\b.*$
^ Start of the string
.*\b Match any char except a newline 0+ times, followed by a word boundary
(?:fees?|charges?) Match either fee or charge, each followed by an optional s
\b.* Word boundary, match any char except a newline 0+ times
$ Assert end of the string
Regex demo
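A quick way to check the word-boundary pattern against the four sample lines, sketched in Python (an assumption, the question does not name a language). re.MULTILINE is used so that ^ and $ apply per line.

```python
import re

# Pattern from the answer; MULTILINE makes ^ and $ match at each line.
FEE_LINE = re.compile(r"^.*\b(?:fees?|charges?)\b.*$", re.MULTILINE)

SAMPLE = "\n".join([
    "fees credit card",
    "charges for interest",
    "coffee shop",
    "feeder cattle",
])

# The pattern has no capturing groups, so findall returns whole lines.
matched = FEE_LINE.findall(SAMPLE)
print(matched)  # ['fees credit card', 'charges for interest']
```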
If you are just trying to match those two lines, you can simply use an expression similar to this:
^(fees|charges).+$
If you wish to match certain words, you might add boundaries to group one similar to this expression:
^\b(fees|fee|charge|charges)\b(.+)$
If your pattern might appear in the middle of the input strings, you can add another group on the left, similar to this expression:
(?:.+|)\b(fees|fee|charge|charges)\b(?:.+|)$
Designing a regular expression is much easier when real sample data is available.

RegExp match lines NOT starting with at-symbol [duplicate]

This question already has an answer here:
Regular expression for a string that does not start with a sequence
(1 answer)
Closed 7 years ago.
If I write this regexp (?<=^[\t ]*#).+ I can match only the lines starting with optional spaces (but not newlines) and at-symbol, without matching the at-symbol.
Example:
#test Matches "test", but not the " #".
I'm trying to match lines whose first non-space character is not the at-symbol. For that purpose I negate the look-behind, resulting in this: (?<!^[\t ]*#).+.
But it matches lines even if their first non-space character is the at-symbol.
I've tried regexps like these:
^[\t ]*[^#].*,
(?<=^[\t ]*[^#]).+,
(?<=^[\t ]*)(.(?!#)).*.
All of them match lines even if their first non-space character is the at-symbol.
How can I match lines that do not start with optional spaces (not newlines) followed by the at-symbol?
matches
matches
m#tches
m#tches
#Doesn't match
#Doesn't match
Thanks!
Your pattern was good except for two things:
you need a lookahead (followed with), not a lookbehind
you need to anchor your pattern at the start of the line
So, if you read the text line by line:
^(?![ \t]*#).+
If you read the whole text at once, you need the multiline modifier so that ^ matches the start of each line (by default it only matches the start of the string):
(?m)^(?![ \t]*#).+
(or use another way to switch on the m modifier)
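Here is a small Python sketch of the multiline variant, with re.MULTILINE as the "other way" of switching on the m modifier. Python itself is an assumption, and so is the indentation in the sample text, since the leading spaces of the question's example lines are not visible here.

```python
import re

# Negative lookahead anchored at each line start: reject lines whose
# first non-blank character is "#".
NOT_HASH_LINE = re.compile(r"^(?![ \t]*#).+", re.MULTILINE)

# Sample lines modelled on the question; the indentation is assumed.
SAMPLE = (
    "matches\n"
    "  matches\n"
    "m#tches\n"
    "  m#tches\n"
    "#Doesn't match\n"
    "  #Doesn't match"
)

kept = NOT_HASH_LINE.findall(SAMPLE)
print(kept)  # ['matches', '  matches', 'm#tches', '  m#tches']
```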

regex that matches everything except a constant [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 8 years ago.
I need a regexp that will match everything except a single constant (case ignored)
Example for constant ALL, should match words like: dog, MOUSE, mall, alligator. But it shouldn't match: all, ALL, alL.
(?si)^(?!all$).*
will match any string except all (case-insensitively).
(?i) makes the regex case-insensitive, (?s) allows the dot to match any character, including newlines. If you don't expect newlines in your input, you can remove the s.
See it live on regex101.com.
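The pattern can be tried on the words from the question with a few lines of Python (an assumption, no language is named in the question). The helper name is_allowed is hypothetical.

```python
import re

# Pattern from the answer: anything except the exact string "all",
# compared case-insensitively.
NOT_ALL = re.compile(r"(?si)^(?!all$).*")

def is_allowed(word: str) -> bool:
    """Return True unless word is "all" in any letter case."""
    return NOT_ALL.match(word) is not None

for word in ("dog", "MOUSE", "mall", "alligator", "all", "ALL", "alL"):
    print(word, is_allowed(word))
```

The first four words print True and the last three print False.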