Notepad++: How to remove all strings except those containing a period [duplicate] - regex

This question already has answers here:
How to match only strings that do not contain a dot (using regular expressions)
(3 answers)
Closed 3 years ago.
I have numerous SELECT statements joined by the UNION keyword in a single file. I want to extract only the db.table strings. How can I delete all words that do not contain a period (.) using regex in the Notepad++ editor? The database and table names are the only ones with a period.
It's fine if newlines are not removed. As a learning bonus for everyone reading this post, though, you could also show a regex that trims the extra newlines and produces this output:
db.table1
db.table2
...
db.tablen

You may try the following find and replace, in regex mode:
Find: (?<=^|\s)[^.]+(?=$|\s)
Replace: <empty string>
Demo
Note that my replacement only removes the undesired terms in the query; it does not make an effort to remove stray or leftover whitespace. To do that, you can easily do a quick second replacement to remove whitespace you don't want.
Edit:
It appears that Notepad++ doesn't accept the variable-width lookbehinds I used in the pattern. Here is a refactored, more verbose version that uses strictly fixed-width lookbehinds:
(^[^.]+$)|(^[^.]+(?=\s))|((?<=\s)[^.]+$)|((?<=\s)[^.]+(?=\s))
Demo
The logic in both of the above patterns is to match a run of non-dot characters (which may span several words, since [^.] also matches whitespace) that is bounded on each side by one of the following:
start of the string (^)
end of the string ($)
any type of whitespace (\s)
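
As a rough illustration of the fixed-width version outside Notepad++, here is a minimal Python sketch. The sample query and the variable names are made up for the example, and Python's re module only stands in for the Notepad++ regex engine, so details may differ in the editor. The second step does the whitespace cleanup that the pattern itself leaves undone.

```python
import re

# Hypothetical sample input; the real file would hold many more statements.
sql = "SELECT a FROM db.table1\nUNION\nSELECT b FROM db.table2"

# The refactored pattern from the answer, with fixed-width lookbehinds only.
no_dot = re.compile(
    r"(^[^.]+$)|(^[^.]+(?=\s))|((?<=\s)[^.]+$)|((?<=\s)[^.]+(?=\s))"
)

# Step 1: delete every run of non-dot text bounded by whitespace or the string ends.
stripped = no_dot.sub("", sql)

# Step 2: collapse the leftover whitespace so each db.table sits on its own line.
tables = "\n".join(stripped.split())
print(tables)
```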

My guess is that this expression:
([\s\S]*?)(\S*(\.)\S*)
replaced with $2\n, or:
(\S*(\.)\S*)|(.+?)
replaced with $1, might work.
Demo 1
Demo 2
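
A minimal Python sketch of the first suggestion, assuming a made-up sample query. Python writes the backreference as \2 rather than $2, and its re module is only a stand-in for the Notepad++ engine. Any text after the last dotted token would be left in place by this substitution, so the findall variant may be the simpler choice when only the tokens are needed.

```python
import re

# Hypothetical sample input for illustration.
sql = "SELECT a FROM db.table1\nUNION\nSELECT b FROM db.table2"

# Group 1 lazily swallows everything before the next token containing a dot;
# group 2 is that token. Keeping only group 2 plus a newline drops the rest.
extracted = re.sub(r"([\s\S]*?)(\S*(\.)\S*)", r"\2\n", sql)
print(extracted)

# An alternative that collects the dotted tokens directly.
tokens = re.findall(r"\S*\.\S*", sql)
print(tokens)
```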

Related

Find DATE match starting from end of string [duplicate]

This question already has answers here:
Regex Last occurrence?
(7 answers)
Closed 3 years ago.
I have the following RegEx syntax that will match the first date found.
([0-9]+)/([0-9]+)/([0-9]+)
However, I would like to start from the end of the content and search backwards. In other words, in the below example, my syntax will always match the first date, but I want it to match the last instead.
Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here
I believe this is possible by using a negative lookahead, but all my attempts failed at this because I don't understand RegEx enough. Help would be appreciated.
Note: my application uses RegEx PCRE.
You could make the dot match newlines, for example with the inline modifier (?s), so that .* matches all the way to the end of the string.
The engine then backtracks until it reaches the last occurrence of the date-like pattern; precede the first digit with a word boundary so the match starts at the beginning of the number.
Use \K to discard what was matched so far, so that only the date-like pattern is returned as the match.
^(?s).*\b\K[0-9]+/[0-9]+/[0-9]+
Regex demo
Note that the pattern is a very broad match and does not validate a date itself.
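
For comparison, here is a small Python sketch of the same backtracking idea. Python's re module does not support \K, so this version uses a capture group instead; that substitution is an adaptation for the example, not part of the answer above.

```python
import re

text = """Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here"""

# re.DOTALL plays the role of (?s): the greedy .* first runs to the end of the
# text, then backtracks until a date-like pattern can match, i.e. the last one.
# The capture group replaces \K, which Python's re module does not support.
last_date = re.search(r"^.*\b([0-9]+/[0-9]+/[0-9]+)", text, re.DOTALL)
print(last_date.group(1))
```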

RegEx for adding a zero between a dash and number [duplicate]

This question already has answers here:
Replacing digits immediately after a saved pattern
(2 answers)
Closed 3 years ago.
I want to find a way to add a leading zero "0" in front of numbers, but BBEdit interprets my replacement as substitution group #10. Example:
Original string: Video 2-1: Title Goes Here
Desired result: Video 2-01: Title Goes Here
My find regex is: (-)(\d:)
My replace regex is: \10\2. The first substitution is NOT meant to be group 10. I simply intend to insert the first group, then add a "0", then insert the second group.
Kindly tell me how to tell BBEdit that I want to add a zero and that I don't mean the 10th group.
If you simply need a number preceded by a dash, then I recommend using the regex lookbehind for this one.
Try this out:
(?<=-)(\d+:)
As seen here: regex101.com
It tells the regex engine that the match must be preceded by a dash -, while the - itself is not part of the match. The replacement then becomes 0\1, which cannot be misread as \10.
You really don't need to capture the hyphen in group 1: it is a fixed string, so there is no benefit in capturing it and replacing it with \1. Instead, capture just the digits and colon using -(\d+:) and replace with -0\1
Regex Demo
There are also ways to make the replacement without dealing with back references at all.
One alternative is this lookaround-based regex,
(?<=-)(?=\d+:)
and replace it with just 0, which inserts a zero before the digit.
Regex Demo with lookaround
When lookbehind is not supported (as in JavaScript prior to ECMAScript 2018), you can use a solution based on positive lookahead only. Match a hyphen - that is followed by digits and a colon using this regex,
-(?=\d+:)
and replace it with -0
Regex Demo with only positive look ahead
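
The variants above can be tried side by side in a short Python sketch. This uses Python's re module rather than BBEdit, so the replacement syntax is Python's; in particular \g<1> is Python's explicit group reference and is shown only to illustrate the ambiguity, not as BBEdit syntax.

```python
import re

title = "Video 2-1: Title Goes Here"

# Capture only the digits and colon; the literal hyphen is retyped in the replacement.
with_group = re.sub(r"-(\d+:)", r"-0\1", title)

# Zero-width match between the hyphen and the digits; just insert "0" there.
with_lookaround = re.sub(r"(?<=-)(?=\d+:)", "0", title)

# Lookahead only, for engines without lookbehind support.
with_lookahead = re.sub(r"-(?=\d+:)", "-0", title)

# The original two-group pattern also works once the reference is unambiguous:
# \g<1> is Python's explicit form, so the following 0 is not read as group 10.
with_explicit_ref = re.sub(r"(-)(\d:)", r"\g<1>0\2", title)

print(with_group, with_lookaround, with_lookahead, with_explicit_ref, sep="\n")
```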
Try \1\x30\2 as the replacement. \x30 is the hex escape for the 0 character, so the replacement is \1, then 0, then \2, and cannot be interpreted as \10 then 2. I don't know if BBEdit supports hex escapes in the replacement string though.
This expression might help you to do so, if Video 2- is a fixed input:
(Video 2-)(.+)
If you have other instances, you can add left boundary to this expression, maybe something similar to this:
([A-Za-z]+\s[0-9]+-)(.+)
Then, you can simply insert a leading zero in the replacement, right after capturing group $1.
If you wish, you can add additional boundaries to the expression.
Replacement
For replacing, you can simply use \U0030 or \x30 instead of zero, whichever your program might support, in between $1 and $2.

RegEx for matching everything between two special characters [duplicate]

This question already has answers here:
RegEx to select everything between two characters?
(4 answers)
Closed 3 years ago.
I want to find all characters between two special characters. I can't find the solution, though, because the text spans new lines and they are not included in the match. It's probably easy, but I can't seem to find the right regex for it.
How do I solve this problem?
The source data is structured like this:
\#(.*)\;
doesn't include new lines and
(?!\#)([\S\s])(?!=\;)
doesn't work either.
It selects everything, but doesn't do the group trick...
Source looks like this:
#first line of text;
#second line of text;
#third line could easy
be on a new line;
#forth etc;
#this could (#hi,#hi,#hi) also
happen though:));
#so.... any idea;
any new line starts with # and every line ends with ;
I see two problems in your regex:
You are missing a quantifier on [\S\s], so it matches only one character.
You also need a non-greedy quantifier so that it doesn't match across all the lines.
Also, where you wrote (?!#), I guess you meant to match any one of those characters, for which you should place them in a character set like this: [?!#]
You need this regex, where you can capture your text from group 1:
#([\w\W]*?);
Regex Demo
And, as you attempted, if you want the full match to contain only the intended text, you can use lookarounds.
Regex Demo with lookarounds so your full match is intended text only
Also, [^;]* (which matches newlines too) is typically faster than .*? because it needs no lazy step-by-step expansion, so you should prefer this regex,
(?<=[?!#])[^;]*(?=;)
Regex Demo with best performance
You just need to modify your first regex a little bit so that it looks like this:
#([\s\S]*?);
. only matches non-newline characters, so I replaced it with [\s\S]: the union of the whitespace set and the non-whitespace set, i.e. every character. If your regex engine has a "single line" option, you can turn that on, and . will match newlines as well.
I also made * lazy. Otherwise there would be just one match running all the way to the last ;. For more info, see this question.
You don't need to escape the ;.
You have to either use the single-line flag /s or add whitespace characters \s as a second alternative to the dot (.). Also, your * quantifier must be lazy/non-greedy, so that the match stops at the first ; it finds.
#((?:.|\s)*?); or #(.*?);/s
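
A short Python sketch comparing the approaches from the answers on the sample data from the question. Here re.DOTALL is Python's name for the "single line" flag; the variable names are illustrative only.

```python
import re

source = """#first line of text;
#second line of text;
#third line could easy
be on a new line;
#forth etc;
#this could (#hi,#hi,#hi) also
happen though:));
#so.... any idea;"""

# Lazy "any character" class: stops at the first ; after each #.
lazy_class = re.findall(r"#([\s\S]*?);", source)

# Same idea with the dot, using re.DOTALL as the "single line" flag.
lazy_dot = re.findall(r"#(.*?);", source, re.DOTALL)

# Negated class: consumes everything up to the next ; without a lazy quantifier.
negated = re.findall(r"#([^;]*);", source)

for entry in lazy_class:
    print(repr(entry))
```

On this input all three patterns capture the same six entries, including the two that span a line break.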

RegEx for matching everything with specific words [duplicate]

This question already has answers here:
Regex match entire words only
(7 answers)
Closed 3 years ago.
I would like to conduct regex substitution. Here is the pattern I am using:
.*?fee.*?$|.*?charge.*?$
This matches the desired lines
"fees credit card"
"charges for interest"
However, it also matches coffee and feeder. I specifically do not want it to match the "coffee" or "feed" lines. How can I prevent these matches but still handle cases like fee and fees?
"coffee shop"
feeder cattle
You could use an alternation with two word boundaries \b to prevent the words from being part of a larger word.
For your example data, if you want to match both the singular and the plural version, you can make the s at the end optional by using a question mark.
^.*\b(?:fees?|charges?)\b.*$
^ Start of the string
.*\b Match any char except a newline 0+ times, followed by a word boundary
(?:fees?|charges?) Match either of the listed words, each with an optional trailing s
\b.* Word boundary, match any char except a newline 0+ times
$ Assert end of the string
Regex demo
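
A quick Python check of the pattern against the four sample lines from the question. The re.MULTILINE flag is an assumption about how the lines are fed in: it is needed here because all lines sit in one string.

```python
import re

lines = """fees credit card
charges for interest
coffee shop
feeder cattle"""

# re.MULTILINE lets ^ and $ anchor at each line rather than the whole text.
pattern = re.compile(r"^.*\b(?:fees?|charges?)\b.*$", re.MULTILINE)

matched = pattern.findall(lines)
print(matched)
```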
If you are just trying to match those two lines, you can simply use an expression similar to this:
^(fees|charges).+$
If you wish to match certain words, you might add boundaries to group one similar to this expression:
^\b(fees|fee|charge|charges)\b(.+)$
If your pattern might be in the middle of string inputs, you can add another group in the left, similar to this expression:
(?:.+|)\b(fees|fee|charge|charges)\b(?:.+|)$
Regular expressions are much easier to design if/when real data is available.

Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
We have a regex in our project that matches any URL containing the string "/pdf/":
(.+)/pdf/.+
We need to modify it so that it won't match URLs that also contain "help".
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
See a demo on regex101.com.
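
A minimal Python sketch of the lookahead approach on the two URLs from the question. The leading ^ is an addition made for this example: with an unanchored search, the engine may retry from a position after "help" and still report a match, so anchoring (or matching from the start) seems safer.

```python
import re

# The two lookaheads from the answer, plus a leading ^ (an addition here) so the
# checks always run from the start of the URL.
pattern = re.compile(r"^(?=.*/pdf/)(?!.*help)(.+)")

urls = [
    "/dealer/help/us/en/pdf/simple.pdf",
    "/dealer/us/en/pdf/simple.pdf",
]
results = {url: bool(pattern.search(url)) for url in urls}
print(results)
```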
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First thing is match either a space or the start of a line
(?:^|\s)
Then we match anything that is not a space or h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/, then match non-space characters \S any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help after the /pdf/, we can repeat the same matching from the start after it.
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match a whitespace character or the end of the line/string ($)
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
Example on regex101
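
A small Python sketch of the lookahead-free variant, run on a made-up sentence that embeds both URLs from the question. Reading capture group 1 avoids having to strip the surrounding whitespace.

```python
import re

# Full pattern from the answer; group 1 holds the URL without surrounding spaces.
pattern = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

# Hypothetical input text containing one URL with "help" and one without.
text = "see /dealer/help/us/en/pdf/simple.pdf and /dealer/us/en/pdf/simple.pdf"
found = [m.group(1) for m in pattern.finditer(text)]
print(found)
```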