RegEx for matching everything with specific words [duplicate] - regex

This question already has answers here:
Regex match entire words only
(7 answers)
Closed 3 years ago.
I would like to conduct regex substitution. Here is the pattern I am using:
.*?fee.*?$|.*?charge.*?$
This matches the desired lines:
"fees credit card"
"charges for interest"
However, it also matches lines containing "coffee" and "feeder". How can I prevent matches on lines like the following while still handling cases like "fee" and "fees"?
"coffee shop"
feeder cattle

You could use an alternation with 2 word boundaries \b to prevent the words being part of a larger word.
For your example data, if you want to match both the singular and the plural form, you can make the s at the end optional by using a question mark.
^.*\b(?:fees?|charges?)\b.*$
^ Start of the string
.*\b Match any char except a newline 0+ times, followed by a word boundary
(?:fees?|charges?) Non-capturing group: match fee or charge, each followed by an optional s
\b.* Word boundary, match any char except a newline 0+ times
$ Assert end of the string
Regex demo
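To check the pattern against the sample lines from the question, here is a minimal sketch in Python. The choice of Python's re module is an assumption (the question does not name an engine), and the variable names are only illustrative.

```python
import re

# Pattern from the answer above: "fee(s)" or "charge(s)" as whole words
# anywhere in the line.
PATTERN = re.compile(r"^.*\b(?:fees?|charges?)\b.*$")

# Sample lines taken from the question.
lines = ["fees credit card", "charges for interest", "coffee shop", "feeder cattle"]

# Keep only the lines the pattern accepts.
matched = [line for line in lines if PATTERN.match(line)]
print(matched)  # ['fees credit card', 'charges for interest']
```

"coffee" is rejected because there is no word boundary before its "fee", and "feeder" is rejected because there is none after it.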

If you are just trying to match those two lines, you can simply use an expression similar to this:
^(fees|charges).+$
If you wish to match certain words, you might add boundaries to group one similar to this expression:
^\b(fees|fee|charge|charges)\b(.+)$
If your pattern might be in the middle of string inputs, you can add another group in the left, similar to this expression:
(?:.+|)\b(fees|fee|charge|charges)\b(?:.+|)$
Regular expressions are much easier to design when there is real data to test them against.

Related

Find DATE match starting from end of string [duplicate]

This question already has answers here:
Regex Last occurrence?
(7 answers)
Closed 3 years ago.
I have the following RegEx syntax that will match the first date found.
([0-9]+)/([0-9]+)/([0-9]+)
However, I would like to start from the end of the content and search backwards. In other words, in the below example, my syntax will always match the first date, but I want it to match the last instead.
Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here
I believe this is possible by using a negative lookahead, but all my attempts failed at this because I don't understand RegEx enough. Help would be appreciated.
Note: my application uses PCRE regular expressions.
You could make the dot match a newline using for example an inline modifier (?s) and match until the end of the string.
The greedy .* then backtracks to the last occurrence of the date-like pattern. Precede the first digit with a word boundary so the match cannot start in the middle of a number.
Use \K to discard what was matched so far, so that only the date-like pattern is returned.
^(?s).*\b\K[0-9]+/[0-9]+/[0-9]+
Regex demo
Note that the pattern is a very broad match and does not validate a date itself.
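If you want to try the same idea outside PCRE, here is a rough sketch in Python. Python's re module does not support \K, so this version swaps it for a capture group, and it passes re.DOTALL instead of the inline (?s). The names LAST_DATE and text are just for illustration.

```python
import re

# Greedy .* runs to the end of the input and backtracks to the LAST date-like
# token. The capture group stands in for \K, which Python's re lacks.
LAST_DATE = re.compile(r".*\b([0-9]+/[0-9]+/[0-9]+)", re.DOTALL)

# Sample content from the question.
text = (
    "Some Text here\n01/02/15\nSome additional\ntext here.\n"
    "10/04/14\nEnding text\nhere"
)

match = LAST_DATE.match(text)
last_date = match.group(1) if match else None
print(last_date)  # 10/04/14
```

Without the \b, the backtracking would stop one character too early and return 0/04/14.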

Notepad++: How to remove all string except containing period [duplicate]

This question already has answers here:
How to match only strings that do not contain a dot (using regular expressions)
(3 answers)
Closed 3 years ago.
I have numerous SELECT statements joined by the UNION keyword in a single file. I want to extract only the db.table strings. How can I delete all words that do not contain a period (.) using regex in the Notepad++ editor? Database and table names are the only words with a period.
It's okay with me even if new lines are not removed. Though, as a learning bonus for everyone seeing this post, you can also show the regex that trims the new lines, that will show this output:
db.table1
db.table2
...
db.tablen
You may try the following find and replace, in regex mode:
Find: (?<=^|\s)[^.]+(?=$|\s)
Replace: <empty string>
Demo
Note that my replacement only removes the undesired terms in the query; it does not make an effort to remove stray or leftover whitespace. To do that, you can easily do a quick second replacement to remove whitespace you don't want.
Edit:
It appears that Notepad++ doesn't like the variable width lookbehinds I used in the pattern. Here is a refactored, and more verbose version, which uses strictly fixed width lookbehinds:
(^[^.]+$)|(^[^.]+(?=\s))|((?<=\s)[^.]+$)|((?<=\s)[^.]+(?=\s))
Demo
The logic in both of the above patterns is to match a word consisting entirely of non dot characters, which are surrounded on either side by one or more of the following:
start of the string (^)
end of the string ($)
any type of whitespace (\s)
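As a quick way to see what the fixed-width version does, here is a small sketch using Python's re.sub in place of the Notepad++ replace dialog. The sample query is made up for illustration. Note that [^.]+ also matches spaces, so one match can swallow several dot-free words at once.

```python
import re

# Fixed-width lookbehind version of the pattern from the answer above.
NO_DOT_WORDS = re.compile(
    r"(^[^.]+$)|(^[^.]+(?=\s))|((?<=\s)[^.]+$)|((?<=\s)[^.]+(?=\s))"
)

# Hypothetical input; the question does not show the actual file.
query = "SELECT a FROM db.table1 UNION SELECT b FROM db.table2"

# Remove the dot-free runs. Leftover whitespace is not cleaned up here.
stripped = NO_DOT_WORDS.sub("", query)

# Second pass: split on whitespace to get the db.table names.
tables = stripped.split()
print(tables)  # ['db.table1', 'db.table2']
```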
My guess is that maybe this expression:
([\s\S]*?)(\S*(\.)\S*)
being replaced with $2\n or:
(\S*(\.)\S*)|(.+?)
with $1 might work.
Demo 1
Demo 2
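The first of these two expressions can be tried out in Python as well. This is only a sketch: the sample query is invented, and Python writes the replacement as \2\n rather than $2\n.

```python
import re

# Group 1 lazily eats everything up to the next token containing a dot;
# group 2 is that token. Only group 2 is kept, followed by a newline.
KEEP_DOTTED = re.compile(r"([\s\S]*?)(\S*(\.)\S*)")

# Hypothetical input; the question does not show the actual file.
query = "SELECT a FROM db.table1 UNION SELECT b FROM db.table2"

result = KEEP_DOTTED.sub(r"\2\n", query)
print(result)
# db.table1
# db.table2
```

Any text after the last dotted token is not matched and would be left in place.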

Does not match when the string does not have a dot but it will match multiple dots [duplicate]

This question already has answers here:
Regex to allow alphanumeric and dot
(3 answers)
Closed 4 years ago.
I am trying to match a string when it contains zero or multiple dots. The regex I have matches strings with multiple dots but not strings with zero dots.
(\w*)((\w*\.)+\w*)
These are the test strings I am using:
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
abc
The Regex will match these
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
But not this one:
abc
https://regexr.com/?38ed7
If you really must use a regex, here is one (but it is inefficient):
/^(?![^.]*\.[^.]*$).*$/
It says:
Match a string so that the beginning of the string is not followed by a whole string with a single dot.
It does some backtracking when parsing the negative lookahead.
As mentioned in the comments to the question, I do think, unless you must have a regex, that a simple function might be better. But if you like the conciseness of a regex and performance is not a huge concern, you can go with the one I gave above. Regexes with "nots" in them are generally a tad messy, but once you understand lookarounds they do become doable. Cheers.
/\..*\.|^[^.]*$/
Or, in plain English:
Match EITHER a dot, then any number of characters, then another dot; OR the beginning of the string, then any number of non-dots, then the end of the string.
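Here is a short Python sketch comparing the two expressions above on a few inputs. The slashes are dropped because Python has no regex literal syntax, and the helper names are made up for illustration.

```python
import re

# Lookahead version: reject strings that contain exactly one dot.
NOT_ONE_DOT = re.compile(r"^(?![^.]*\.[^.]*$).*$")

# Alternation version: at least two dots somewhere, OR no dot at all.
TWO_DOTS_OR_NONE = re.compile(r"\..*\.|^[^.]*$")

samples = ["abc", "12fd.dafd", ".", "dial.check.Catch.Url"]

by_lookahead = [s for s in samples if NOT_ONE_DOT.match(s)]
by_alternation = [s for s in samples if TWO_DOTS_OR_NONE.search(s)]

print(by_lookahead)    # ['abc', 'dial.check.Catch.Url']
print(by_alternation)  # ['abc', 'dial.check.Catch.Url']
```

The alternation version needs search rather than match, because its first branch is not anchored to the start of the string.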

Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
Have regex in our project that matches any url that contains the string
"/pdf/":
(.+)/pdf/.+
Need to modify it so that it won't match urls that also contain "help"
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
See a demo on regex101.com.
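A small Python sketch of the lookaround version, using the two URLs from the question. One thing to watch for: the pattern has no ^ anchor, so it only behaves as intended when matching is forced to start at the beginning of the string (re.match here). With an unanchored search, the engine can start after the "h" of "help" and still find a match.

```python
import re

# Must contain /pdf/ somewhere ahead, must not contain "help" anywhere ahead.
PATTERN = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

urls = [
    "/dealer/help/us/en/pdf/simple.pdf",
    "/dealer/us/en/pdf/simple.pdf",
]

# re.match anchors the attempt at position 0, which is what we want.
accepted = [u for u in urls if PATTERN.match(u)]
print(accepted)  # ['/dealer/us/en/pdf/simple.pdf']

# An unanchored search slips past "help" by starting one character later.
leak = PATTERN.search(urls[0])
print(leak.group(1))  # elp/us/en/pdf/simple.pdf
```

Adding ^ at the front of the pattern removes the difference between the two calls.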
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First thing is match either a space or the start of a line
(?:^|\s)
Then we match anything that is not a space or h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/. After that we match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help after the /pdf/, we can repeat the same group after it.
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match a space or the end of the line/string ($)
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
Example on regex101
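For engines where you prefer to avoid the leading lookaheads, the first full expression from this answer can be tried in Python like this. It is a sketch only, and the function name is hypothetical.

```python
import re

# Group 1 holds the URL; the outer groups consume the surrounding whitespace.
PATTERN = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+/pdf/\S*)(?:$|\s)")


def pdf_url_without_help(text):
    """Return the captured URL, or None if no acceptable URL is found."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


print(pdf_url_without_help("/dealer/us/en/pdf/simple.pdf"))
# /dealer/us/en/pdf/simple.pdf
print(pdf_url_without_help("/dealer/help/us/en/pdf/simple.pdf"))
# None
```

This variant only rejects "help" before /pdf/. A URL such as /dealer/us/en/pdf/help.pdf is still accepted unless the group is repeated after /pdf/ as described above.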

How to match a line not containing a word [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 6 years ago.
I was wondering how to match a line not containing a specific word using Python-style Regex (Just use Regex, not involve Python functions)?
Example:
PART ONE OVERVIEW 1
Chapter 1 Introduction 3
I want to match lines that do not contain the word "PART"?
This should work:
/^((?!PART).)*$/
Edit (by request): How this works
The (?!...) syntax is a negative lookahead, which I've always found tough to explain. Basically, it means "whatever follows this point must not match the regular expression /PART/." The site I've linked explains this far better than I can, but I'll try to break this down:
^ #Start matching from the beginning of the string.
(?!PART) #This position must not be followed by the string "PART".
. #Matches any character except line breaks (it will include those in single-line mode).
$ #Match all the way until the end of the string.
The ((?!xxx).)* idiom is probably hardest to understand. As we saw, (?!PART) looks at the string ahead and says that whatever comes next can't match the subpattern /PART/. So what we're doing with ((?!xxx).)* is going through the string letter by letter and applying the rule to all of them. Each character can be anything, but if you take that character and the next few characters after it, you'd better not get the word PART.
The ^ and $ anchors are there to demand that the rule be applied to the entire string, from beginning to end. Without those anchors, any piece of the string that didn't begin with PART would be a match. Even PART itself would have matches in it, because (for example) the letter A isn't followed by the exact string PART.
Since we do have ^ and $, if PART were anywhere in the string, the (?!PART) check would fail at the position where PART begins, the repetition could not get past that point, and the overall match would fail. Hope that's clear enough to be helpful.
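A minimal Python sketch of the idiom, applied line by line to the example from the question. The slashes are dropped because Python has no regex literal syntax, and the third sample line is added here only to show that PART is also caught when it is not at the start.

```python
import re

# Every character must sit at a position that does not begin the word PART.
NO_PART = re.compile(r"^((?!PART).)*$")

lines = [
    "PART ONE OVERVIEW 1",
    "Chapter 1 Introduction 3",
    "See PART TWO",  # extra line, not from the question
]

without_part = [line for line in lines if NO_PART.match(line)]
print(without_part)  # ['Chapter 1 Introduction 3']
```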