Multiple search & replace - Notepad ++ (regex) - regex

I have a list of words, for example:
Good -> Bad
Sky -> Blue
Gray -> Black
etc...
What is the best why to do find&replace in notepad++?
I tried this:
FIND: (Good)|(Sky)|(Gray)
Replace: (?1Bad)(?2Blue)(?3Black)
but it doesn't work :(
any idea? or suggestions ?

There is however a workaround if you add this newline at the end of your text (it must be the last line, so don't press enter at the end):
#Good:Bad#Sky:Blue#Gray:Black#
and if you use this pattern:
(Good|Blue|Black)(?=(?:.*\R)++#(?>[^#]+#)*?\1:([^#]+))|\R.++(?!\R)
with this replacement:
$2
pattern details:
(Good|Blue|Black) # this part capture the word in group 1
(?= # then we reach the last line in a lookakead
(?:.*\R)++ # match all the lines until the last line
#(?>[^#]+#)*? # advance until the good value is found
\1 # the good value (backreference to the capture group 1)
: ([^#]+) # capture the replacement in group 2
) # close the lookbehind
| # OR
\R.++(?!\R) # match the last line (to remove it)
Note: to make the pattern more efficient, you can put it in a non capturing group and add a lookahead at the begining with all the first possible characters to quickly discard useless positions in the string:
(?=[GB\r\n])(?:\b(Good|Blue|Black)\b(?=(?:.*\R)++#(?>[^#]+#)*?\1:([^#]+))|\R.++(?!\R))

Related

Regex to disregard partial matches across lines / matching too much

I have three lines of tab-separated values:
SELL 2022-06-28 12:42:27 39.42 0.29 11.43180000 0.00003582
BUY 2022-06-28 12:27:22 39.30 0.10 3.93000000 0.00001233
_____2022-06-28 12:27:22 39.30 0.19 7.46700000 0.00002342
The first two have 'SELL' or 'BUY' as first value but the third one has not, hence a Tab mark where I wrote ______:
I would like to capture the following using Regex:
My expression ^(BUY|SELL).+?\r\n\t does not work as it gets me this:
I do know why outputs this - adding an lazy-maker '?' obviously won't help. I don't get lookarounds to work either, if they are the right means at all. I need something like 'Match \r\n\t only or \r\n(?:^\t) at the end of each line'.
The final goal is to make the three lines look at this at the end, so I will need to replace the match with capturing groups:
Can anyone point me to the right direction?
Ctrl+H
Find what: ^(BUY|SELL).+\R\K\t
Replace with: $1\t
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(BUY|SELL) # group 1, BUY or SELL
.+ # 1 or more any character but newline
\R # any kind of linebreak
\K # forget all we have seen until this position
\t # a tabulation
Replacement:
$1 # content of group 1
\t # a tabulation
Screenshot (before):
Screenshot (after):
You can use the following regex ((BUY|SELL)[^\n]+\n)\s+ and replace with \1\2.
Regex Match Explanation:
((BUY|SELL)[^\n]+\n): Group 1
(BUY|SELL): Group 2
BUY: sequence of characters "BUY" followed by a space
|: or
SELL: sequence of characters "SELL" followed by a space
[^\n]+: any character other than newline
\n: newline character
\s+: any space characters
Regex Replace Explanation:
\1: Reference to Group 1
\2: Reference to Group 2
Check the demo here. Tested on Notepad++ in a private environment too.
Note: Make sure to check the "Regular expression" checkbox.
Regex

Notepad++ and regex - how to title case string between two particular strings?

I have hundreds of bib references in a file, and they have the following syntax:
#article{tabata1999precise,
title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts.
Control of solid structure and $\pi$-conjugation length},
author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
journal={Macromolecular chemistry and physics},
volume={200},
number={2},
pages={265--282},
year={1999},
publisher={Wiley Online Library}
}
I would like to title case (aka Proper Case) the journal name in Notepad++ using regular expression. For example, from Macromolecular chemistry and physics to Macromolecular Chemistry and Physics.
I am able to find all instances using:
(?<=journal\=\{).*?(?=\})
but I am unable to change the case via Edit > Convert Case to. Apparently it doesn't work on find all and I have to go one by one.
Next, I tried recording and running a macro but Notepad++ just hangs indefinitely when I try to run it (option to run until the end of the file).
So my question is: does anyone know the replace regex syntax I could use to change the case? Ideally, I would also like to use "|" exclusions for particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate it into my look-aheads.
Thank you in advance, I'd appreciate any help.
This works for any number of words:
Ctrl+H
Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
Replace with: \u$1\E$2$3
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
journal={ # literally
| # OR
\G # restart from last match position
) # end group
\K # forget all we have seen until this position
(?: # non capture group
(\w{4,}) # group 1, a word with 4 or more characters
| # OR
(\w+) # group 2, a word of any length
) # end group
(\h*) # group 3, 0 or more horizontal spaces
Replacement:
\u # uppercased the first letter of the following
$1 # content of group 1
\E # stop the uppercased
$2 # content of group 2
$3 # content of group 3
Screenshot (before):
Screenshot (after):
if the format is always in the form:
journal={Macromolecular chemistry and physics},
i.e. journal followed by 3 words then use the following:
Find: journal={(\w+)\s*(\w+)\s*(\w+)\s*(\w+)
Replace with: journal={\u\1 \u\2 \l\3 \u\4
You can modify that if you have more words to replace by adding more \u\x, where x is the position of the word.
Hope it helps to give you an idea to move forward for a better solution.
\u translates the next letter to uppercase (used for all other words)
\l translates the next letter to lowercase (used for the word "and")
\1 replaces the 1st captured () search group
\2 replaces the 2nd captured () search group
\3 replaces the 3rd captured () search group

Regex for find and replace in N++ of number followed by anything

I have a text file with the following lines:
asm-java-2.0.0-lib
cib-slides-3.1.0
lib-hibernate-common-4.0.0-beta
astp
act4lib-4.0.0
I want to remove everything from, including the '-' before the numbers begin so the results look like:
2.0.0-lib
3.1.0
4.0.0-beta
act4lib
Does anyone know the correct regex for this? So far I have come up with -\D.*(a-z)* but its got too many errors.
Ctrl+H
Find what: ^.*?(?=\d|$)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.*? # 0 or more any character but newline, not greedy
(?= # start lookahead, zero-length assertion that makes sure we have after
\d # a digit
| # OR
$ # end of line
) # end lookahead
Result for given example:
2.0.0-lib
3.1.0
4.0.0-beta
Another solution that deals with act4lib-4.0.0:
Ctrl+H
Find what: ^(?:.*-(?=\d)|\D+)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(?: # start non capture group
.* # 0 or more any character but newline
- # a dash
(?=\d) # lookahead, zero-length assertion that makes sure we have a digit after
| # OR
\D+ # 1 or more non digit
) # end group
Replacement:
\t # a tabulation, you may replace with what you want
Given:
asm-java-2.0.0-lib
cib-slides-3.1.0
lib-hibernate-common-4.0.0-beta
astp
act4lib-4.0.0
Result for given example:
2.0.0-lib
3.1.0
4.0.0-beta
4.0.0
Use
^\D+\-
If you want to completely remove lines without numbers then use this
^\D+(\-|$)
In case the packages contain numbers in their names like act4lib-4.0.0 then a longer variant is needed
^[\w-]+(\-(?=\d+\.\d+)|$)
It can be shortened to ^.+?(\-(?=\d+\.)|$) but I just want to be sure so I also check the minor version number
The ^ will match from the start of line

Remove the text outside the first brackets in R

I know that it was asked a lot of times, but I've tried to adapt the other answers to my need and I was not able to make it work using SKIP and FAIL (I'm a bit confused, I've to admit)
I'm using R actually.
The url I need to clean is:
url <- "posts.fields(id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0))"
and I need to retain only the content inside the first brackets that are always prefixed by the word "fields" (while "posts" may vary). In other words something like
id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0)
As you may see there're some nesting inside. But I eventually could change my source code to accept this string too (removing every parhentesis by every prefix)
id,from,message,comments,likes
I don't know on how to remove the trailing parhentesis which balances the first.
If it's good enough to just remove everything up to and including the first open parenthesis and also remove the last close parenthesis and thereafter then:
sub("^.*?\\((.*)\\)[^)]*$", "\\1", url)
Note:
If it's good enough to just remove the first open parenthesis and last close parenthesis then try this:
sub("\\((.*)\\)", "\\1", url)
Using lazy .* instead of greedy:
sub(".*?fields\\((.*)\\)", "\\1", url)
[1] "id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0)"
You need to use a recursive pattern:
sub("[^.]*+(?:\\.(?!fields\\()[^.]*)*+\\.fields\\(([^()]*+(?:\\((?1)\\)[^()]*)*+)\\)(?s:.*)", "\\1", url, perl=T)
demo
details:
# reach the dot before "fields("
[^.]*+ # all except a dot (possessive)
(?: # open a non-capturing group
\\. # a literal dot
(?!fields\\() # not followed by "fields("
[^.]* # all except a dot
)*+ # repeat the group zero or more times
\\.fields\\(
# match a content between parenthesis with any level of nesting
( # open the capture group 1
[^()]*+ # 0 or more character that are not brackets (possessive)
(?: # open a non capturing group
\\(
(?1) # recursion in group 1
\\) #
[^()]* # all that is not a bracket
)*+ # close the non capturing group and repeat 0 or more time (possessive)
) # close the capture group 1
\\)
(?s:.*) # end of the string
Possessive quantifiers are used here to limit the backtracking when for any reason a part of the pattern fails.

Can a Regex Return the Number of the Line where the Match is Found?

In a text editor, I want to replace a given word with the number of the line number on which this word is found. Is this is possible with Regex?
Recursion, Self-Referencing Group (Qtax trick), Reverse Qtax or Balancing Groups
Introduction
The idea of adding a list of integers to the bottom of the input is similar to a famous database hack (nothing to do with regex) where one joins to a table of integers. My original answer used the #Qtax trick. The current answers use either Recursion, the Qtax trick (straight or in a reversed variation), or Balancing Groups.
Yes, it is possible... With some caveats and regex trickery.
The solutions in this answer are meant as a vehicle to demonstrate some regex syntax more than practical answers to be implemented.
At the end of your file, we will paste a list of numbers preceded with a unique delimiter. For this experiment, the appended string is :1:2:3:4:5:6:7 This is a similar technique to a famous database hack that uses a table of integers.
For the first two solutions, we need an editor that uses a regex flavor that allows recursion (solution 1) or self-referencing capture groups (solutions 2 and 3). Two come to mind: Notepad++ and EditPad Pro. For the third solution, we need an editor that supports balancing groups. That probably limits us to EditPad Pro or Visual Studio 2013+.
Input file:
Let's say we are searching for pig and want to replace it with the line number.
We'll use this as input:
my cat
dog
my pig
my cow
my mouse
:1:2:3:4:5:6:7
First Solution: Recursion
Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested).
The recursive structure lives in a lookahead, and is optional. Its job is to balance lines that don't contain pig, on the left, with numbers, on the right: think of it as balancing a nested construct like {{{ }}}... Except that on the left we have the no-match lines, and on the right we have the numbers. The point is that when we exit the lookahead, we know how many lines were skipped.
Search:
(?sm)(?=.*?pig)(?=((?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?:(?1)|[^:]+)(:\d+))?).*?\Kpig(?=.*?(?(2)\2):(\d+))
Free-Spacing Version with Comments:
(?xsm) # free-spacing mode, multi-line
(?=.*?pig) # fail right away if pig isn't there
(?= # The Recursive Structure Lives In This Lookahead
( # Group 1
(?: # skip one line
^
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
)
(?:(?1)|[^:]+) # recurse Group 1 OR match all chars that are not a :
(:\d+) # match digits
)? # End Group
) # End lookahead.
.*?\Kpig # get to pig
(?=.*?(?(2)\2):(\d+)) # Lookahead: capture the next digits
Replace: \3
In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.
Second Solution: Group that Refers to Itself ("Qtax Trick")
Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested). The solution is easy to adapt to .NET by converting the \K to a lookahead and the possessive quantifier to an atomic group (see the .NET Version a few lines below.)
Search:
(?sm)(?=.*?pig)(?:(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*+.*?\Kpig(?=[^:]+(?(1)\1):(\d+))
.NET version: Back to the Future
.NET does not have \K. It its place, we use a "back to the future" lookbehind (a lookbehind that contains a lookahead that skips ahead of the match). Also, we need to use an atomic group instead of a possessive quantifier.
(?sm)(?<=(?=.*?pig)(?=(?>(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*).*)pig(?=[^:]+(?(1)\1):(\d+))
Free-Spacing Version with Comments (Perl / PCRE Version):
(?xsm) # free-spacing mode, multi-line
(?=.*?pig) # lookahead: if pig is not there, fail right away to save the effort
(?: # start counter-line-skipper (lines that don't include pig)
(?: # skip one line
^ #
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
)
# for each line skipped, let Group 1 match an ever increasing portion of the numbers string at the bottom
(?= # lookahead
[^:]+ # skip all chars that are not colons
( # start Group 1
(?(1)\1) # match Group 1 if set
:\d+ # match a colon and some digits
) # end Group 1
) # end lookahead
)*+ # end counter-line-skipper: zero or more times
.*? # match
\K # drop everything we've matched so far
pig # match pig (this is the match!)
(?=[^:]+(?(1)\1):(\d+)) # capture the next number to Group 2
Replace:
\2
Output:
my cat
dog
my 3
my cow
my mouse
:1:2:3:4:5:6:7
In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.
Choice of Delimiter for Digits
In our example, the delimiter : for the string of digits is rather common, and could happen elsewhere. We can invent a UNIQUE_DELIMITER and tweak the expression slightly. But the following optimization is even more efficient and lets us keep the :
Optimization on Second Solution: Reverse String of Digits
Instead of pasting our digits in order, it may be to our benefit to use them in the reverse order: :7:6:5:4:3:2:1
In our lookaheads, this allows us to get down to the bottom of the input with a simple .*, and to start backtracking from there. Since we know we're at the end of the string, we don't have to worry about the :digits being part of another section of the string. Here's how to do it.
Input:
my cat pi g
dog p ig
my pig
my cow
my mouse
:7:6:5:4:3:2:1
Search:
(?xsm) # free-spacing mode, multi-line
(?=.*?pig) # lookahead: if pig is not there, fail right away to save the effort
(?: # start counter-line-skipper (lines that don't include pig)
(?: # skip one line that doesn't have pig
^ #
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
)
# Group 1 matches increasing portion of the numbers string at the bottom
(?= # lookahead
.* # get to the end of the input
( # start Group 1
:\d+ # match a colon and some digits
(?(1)\1) # match Group 1 if set
) # end Group 1
) # end lookahead
)*+ # end counter-line-skipper: zero or more times
.*? # match
\K # drop match so far
pig # match pig (this is the match!)
(?=.*(\d+)(?(1)\1)) # capture the next number to Group 2
Replace: \2
See the substitutions in the demo.
Third Solution: Balancing Groups
This solution is specific to .NET.
Search:
(?m)(?<=\A(?<c>^(?:(?!pig)[^\r\n])*(?:\r?\n))*.*?)pig(?=[^:]+(?(c)(?<-c>:\d+)*):(\d+))
Free-Spacing Version with Comments:
(?xm) # free-spacing, multi-line
(?<= # lookbehind
\A #
(?<c> # skip one line that doesn't have pig
# The length of Group c Captures will serve as a counter
^ # beginning of line
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
) # end skipper
* # repeat skipper
.*? # we're on the pig line: lazily match chars before pig
) # end lookbehind
pig # match pig: this is the match
(?= # lookahead
[^:]+ # get to the digits
(?(c) # if Group c has been set
(?<-c>:\d+) # decrement c while we match a group of digits
* # repeat: this will only repeat as long as the length of Group c captures > 0
) # end if Group c has been set
:(\d+) # Match the next digit group, capture the digits
) # end lokahead
Replace: $1
Reference
Qtax trick
On Which Line Number Was the Regex Match Found?
Because you didn't specify which text editor, in vim it would be:
:%s/searched_word/\=printf('%-4d', line('.'))/g (read more)
But as somebody mentioned it's not a question for SO but rather Super User ;)
I don't know of an editor that does that short of extending an editor that allows arbitrary extensions.
You could easily use perl to do the task, though.
perl -i.bak -e"s/word/$./eg" file
Or if you want to use wildcards,
perl -MFile::DosGlob=glob -i.bak -e"BEGIN { #ARGV = map glob($_), #ARGV } s/word/$./eg" *.txt