Regex replace from | [duplicate] - regex

How can I replace several different words all at once in Notepad++?
For example:
I have "good", "great" and "fine" and I want to replace them with "bad", "worse" and "not", respectively, all at once.
I know that I can replace them one by one, but the problem I am facing requires that I replace a lot of words, which is not convenient to do.

Try a regular expression replace of (good)|(great)|(fine) with (?1bad)(?2worse)(?3not).
The search looks for any of three alternatives separated by the |. Each alternative has its own capture brackets. The replacement uses the conditional form ?Ntrue-expression:false-expression, where N is a decimal digit; the clause checks whether capture group N matched.
Tested in Notepad++ 6.3
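Python's re module has no conditional replacement syntax like (?1bad), so outside Notepad++ the same "which group matched?" idea can be sketched with a replacement callback. In this sketch, Match.lastindex holds the number of the capture group that matched, and it picks the replacement from a list; replace_words is an illustrative name.

```python
import re

# One capture group per alternative, mirroring (good)|(great)|(fine).
PATTERN = re.compile(r"(good)|(great)|(fine)")
# REPLACEMENTS[n - 1] plays the role of (?n...) in the Notepad++ format string.
REPLACEMENTS = ["bad", "worse", "not"]


def replace_words(text):
    # m.lastindex is the index of the group that actually matched (1, 2 or 3).
    return PATTERN.sub(lambda m: REPLACEMENTS[m.lastindex - 1], text)


print(replace_words("good, great and fine"))  # → bad, worse and not
```

Like the Notepad++ version, this has no word boundaries, so "goodness" becomes "badness"; wrapping the alternation in \b(?:...)\b would restrict it to whole words.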
Update:
You can find good documentation about the new PCRE regular expressions,
used by N++ since version 6.0, at the TWO addresses below:
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html
The FIRST one concerns the syntax of regular expressions in SEARCH.
The SECOND one concerns the syntax of regular expressions in REPLACEMENT.
And, if you can understand written French, I made a tutorial about
PCRE regular expressions, stored on the personal site of Christian
Cuvier (cchris), at the address below:
http://oedoc.free.fr/Regex/TutorielRegex.zip
(Extracted from a posting by THEVENOT Guy at http://sourceforge.net/p/notepad-plus/discussion/331754/thread/ca059a0a/ )

Install Python Script plugin from Plugin Manager.
Create a file with your substitutions (e.g., C:/Temp/Substitutions.txt), separating each old/new pair with a space:
good bad
great worse
fine not
Create a new script:
with open('C:/Temp/Substitutions.txt') as f:
    for l in f:
        s = l.split()
        if len(s) >= 2:  # skip blank or malformed lines
            editor.replace(s[0], s[1])
Run the new script against the text you want to substitute.
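Because the script calls editor.replace once per line, the substitutions are applied one after another, so a later pair can rewrite text that an earlier pair produced (with the pairs "good bad" and "bad worse", good would end up as worse). A minimal plain-Python sketch, assuming the same two-column file format, instead joins all old words into one alternation and substitutes in a single pass; load_pairs and apply_pairs are hypothetical helper names, not part of the PythonScript API.

```python
import re


def load_pairs(lines):
    # Each non-blank line holds "old new", separated by whitespace.
    pairs = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            pairs[parts[0]] = parts[1]
    return pairs


def apply_pairs(text, pairs):
    if not pairs:
        return text
    # Longest keys first, so a longer word wins over a shorter prefix of it.
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Single pass: text produced by a replacement is never scanned again.
    return pattern.sub(lambda m: pairs[m.group(0)], text)


pairs = load_pairs(["good bad", "bad worse", ""])
print(apply_pairs("good and bad", pairs))  # → bad and worse
```

Inside Notepad++ the pairs could presumably be loaded with load_pairs(open('C:/Temp/Substitutions.txt')) and the result written back with editor.setText(apply_pairs(editor.getText(), pairs)); that wiring is an untested assumption.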

I needed to run the substitution on several files. Based on Mauricio Morales's answer, I created the following script.
with open('C:/Temp/Substitutions.txt') as f:
    files = notepad.getFiles()
    for file in files:
        notepad.activateFile(file[0])
        for l in f:
            s = l.split()
            editor.replace(s[0], s[1])
        f.seek(0) # Reset file input stream

If you're replacing the same words in several different files all the time, recording your actions once (Macro > Start Recording, then Macro > Stop Recording in Notepad++) and saving them as a macro will be helpful.

Related

regular expression matching filename with multiple extensions

Is there a regular expression to match the some.prefix part of both of the following filenames?
xyz can consist of any characters in [a-z0-9-_\ ].
The some.prefix part can consist of any characters in [a-zA-Z0-9-_\.\ ].
I intentionally included a . in some.prefix.
some.prefix.xyz.xyz
some.prefix.xyz
I have tried many combinations. For example:
(?P<prefix>[a-zA-Z0-9-_\.]+)(?:\.[a-z0-9]+\.gz|\.[a-z0-9]+)
It works with abc.def.csv by capturing abc.def, but fails to capture it in abc.def.csv.gz.
I primarily use Python, but I thought the regex itself should apply to many languages.
Update: It's not possible; see the discussion with @nowox below.
I think your regex works pretty well. I recommend trying regex101 with your example:
https://regex101.com/r/dV6cE8/3
The expression
^(?i)[ \w-]+\.[ \w-]+
should work in your case:
som e.prefix.xyz.xyz
^^^^^^^^^^^
some.prefix.xyz
^^^^^^^^^^^
abc.def.csv.gz
^^^^^^^
And in Python you can use:
import re
text = """som e.prefix.xyz.xyz
some.prefix.xyz
abc.def.csv.gz"""
# (?i) goes first: Python 3.11+ rejects global flags elsewhere in the pattern
print(re.findall(r'(?i)^[ \w-]+\.[ \w-]+', text, re.MULTILINE))
Which will display:
['som e.prefix', 'some.prefix', 'abc.def']
I think you may be a bit confused about your requirement. To summarize, you have a pathname made of characters and dots, such as:
foo.bar.baz.0
foobar.tar.gz
f.o.o.b.a.r
How would you separate these strings into a base name and an extension? Here we recognize some known patterns: .tar.gz is definitely an extension, but is .bar.baz.0 the extension, or is it only .0?
The answer is not easy, and no regex in the world could guess the correct answer 100% of the time without some hints.
For example, you can list the acceptable extensions and define some criteria:
An extension matches the regex \.\w{1,4}$
Several extensions may be concatenated together: (\.\w{1,4}){1,4}$
The remainder is called the basename
From this you can build this regular expression:
(?P<basename>.*?)(?P<extension>(?:\.\w{1,4}){1,4})$
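The following Python sketch applies that expression with the named groups basename and extension; split_name is a hypothetical helper. Because .*? is lazy, the extension group takes as many trailing dot-parts as the criteria allow (up to four), which shows the ambiguity in action.

```python
import re

# Criteria from the answer: an extension is \.\w{1,4}, and up to four may be chained.
SPLIT = re.compile(r"(?P<basename>.*?)(?P<extension>(?:\.\w{1,4}){1,4})$")


def split_name(name):
    m = SPLIT.match(name)
    # None when no trailing extension satisfies the criteria.
    return (m.group("basename"), m.group("extension")) if m else None


for name in ["foo.bar.baz.0", "foobar.tar.gz", "f.o.o.b.a.r"]:
    print(name, split_name(name))
```

For abc.def.csv.gz it returns ('abc', '.def.csv.gz') rather than abc.def, since .def also satisfies \.\w{1,4}; only a list of known extensions could settle that.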
Try this: [a-z0-9-_\\]+\.[a-z0-9-_\\]+[a-zA-Z0-9-_\.\\]+

Adding up several Search and Replace regular expressions

I am new to regular expressions. I want to know whether there is any way to batch up many 'find and replace' regular expressions together, and whether there is any specific tool that could make this task easy.
In detail:
I mean: find one regular expression and replace it with something, then find another regular expression and replace it with something different, then find a third and replace it with something else, and so on, maybe up to 20 search-and-replace operations. And I want it automated, rather than manually doing each search and replace one at a time up to 20 times.
Chaining Replacements
You can chain replacements in any language that gives you access to a regex engine.
Python and PHP are good choices if you are starting out and want to do a bit of scripting.
Any of the .NET languages, Java, Ruby, Perl... You name it.
In Java
In the comments, you mention that you use Java. To chain replacements, you can do things like this:
String result1 = subjectString.replaceAll(myregex, myreplacement);
String result2 = result1.replaceAll(myregex2, myreplacement2);
String result3 = result2.replaceAll(myregex3, myreplacement3);
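The same chaining can be sketched in Python by keeping the (pattern, replacement) pairs in a list and applying them in order with re.sub; the three rules below are made-up examples. Python writes backreferences in the replacement as \1, where Java's replaceAll uses $1.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs applied in order.
RULES = [
    (r"\s+", " "),           # collapse runs of whitespace
    (r"colour", "color"),    # spelling fix
    (r"(\d+)px", r"\1 px"),  # backreference to group 1
]


def chain(text, rules):
    # Each rule sees the output of the previous one, like result1 -> result2 -> result3.
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


print(chain("colour:   12px", RULES))  # → color: 12 px
```

Because each rule runs on the previous rule's output, the order of the list matters; up to 20 rules is just a longer list.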
GUI Tools
I can think of three GUI tools that allow regex chaining:
PowerGrep (commercial, by Jan Goyvaerts, the author of the famous RegexBuddy)
TextDistil (free at the moment, .NET regex flavor)
TextPipe Pro (commercial)
In addition, regex chaining is available in applications with a narrow focus, for instance:
Directory Opus (powerful File Manager for Windows)
A Better Finder Rename and Name Mangler (file renamers for OSX)
In PHP you can do this with preg_replace(). If the pattern and replacement arguments are both arrays, each regexp in the pattern argument will be replaced with the corresponding element of the replacement argument.

Regexp-replace: Multiple replacements within a match

I'm converting our MVC3 project to use T4MVC, and I would like to convert the JavaScript includes to work with T4MVC as well. So I need to replace
"~/Scripts/DataTables/TableTools/TableTools.min.js"
"~/Scripts/jquery-ui-1.8.24.min.js"
Into
Scripts.DataTables.TableTools.TableTools_min_js
Scripts.jquery_ui_1_8_24_min_js
I'm using Notepad++ as a regexp tool at the moment, and it is using POSIX regexps.
I can find script name and replace it with these regexps:
Find: \("~/Scripts/(.*)"\)
Replace with: \(Scripts.\1\)
But I can't figure out how to replace the dots and dashes in the file names with underscores, and the forward slashes with dots.
I can check that a JS file name contains a dot or a dash with this:
\("~/Scripts/(?=\.*)(?=\-*).*"\)
But how do I replace groups within a group?
I need a non-greedy replacement within the group, and the replacements have to happen in order, so that forward slashes converted into dots are not converted to underscores afterwards.
This is a non-critical problem; I've already done all the replacements manually, but I thought I was good with regexps, so this problem bugs me!!
P.S. My preferred tool is Notepad++, but any POSIX regexp solution would do :-)
p.p.s. Here you can get a sample of stuff to be replaced
And here is the target text
I would just use a site like RegexHero
You can paste the code into the target string box, then place (?<=(~/Script).*)[.-](?=(.*"[)]")) into the Regular Expression box, with _ in the Replacement String box.
Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.
From there, paste (?<=(<script).*)("~/)(?=(.*[)]" ))|(?<=(Url.).*)(")(?=(.*(\)" ))) into the Regular Expression box and leave the Replacement String box empty.
Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.
From there paste (?<=(Script).*)[/](?=(.*[)]")) into the Regular Expression box and . into the Replacement String box.
After that, the Final String box will have what you are looking for. I'm not sure what the upper limit is on how much text you can parse, but the text could be broken up if that's an issue. I'm sure there might be better ways to do it, but this tends to be the way I go about things like this. One reason I like this site is that I don't have to install anything, so I can use it anywhere quickly.
Edit 1: Per the comments, I have moved step 3 to Step 5 and added new steps 3 and 4. I had to do it this way, because new Step 5 would have replaced the / in "~/Scripts with a ., breaking the removal of "~/. I also had to change Step 5's code to account for the changed beginning of Script
Here is a vanilla Notepad++ solution, but it's certainly not the most elegant one. I managed to do the transformation with several passes over the file.
First pass
Replace . and - with _.
Find: ("~/Scripts[^"]*?)[.-]
Replace With: \1_
Unfortunately, I could not find a way to match only the . or -, because it would require a lookbehind, which is apparently not supported by Notepad++. Due to this, every time you execute the replacement only the first . or - in a script name will be replaced (because matches cannot overlap). Hence, you have to run this replacement multiple times until no more replacements are done (in your example input, that would be 8 times).
Second pass
Replace / with ..
Find: ("~/Scripts[^"]*?)/
Replace with: \1.
This is basically the same thing as the first pass, just with different characters (you will have to do this 3 times for the example file). Doing the passes in this order ensures that no slashes end up as underscores.
Third pass
Remove the surrounding characters.
Find: "~/(Scripts[^"]*?)"
Replace with: \1
This will now match all the script names that are still surrounded by "~/ and ", capturing what is in between and just outputting that.
Note that by including those surrounding characters in the find patterns of the first two passes, you can avoid converting the . in strings that are already of the new format.
As I said this is not the most convenient way to do it. Especially, since passes one and two have to be executed manually multiple times. But it would still save a lot of time for large files, and I cannot think of a way to get all of them - only in the correct strings - in one pass, without lookbehind capabilities. Of course, I would very much welcome suggestions to improve this solution :). I hope I could at least give you (and anyone with a similar problem) a starting point.
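The three passes, including the "repeat until nothing changes" part, can be automated outside Notepad++. A Python sketch, assuming the script paths appear as Url.Content("~/Scripts/...") (the question's sample file is not reproduced here, so that wrapper is an assumption), runs each pass with re.subn until it reports zero replacements; run_passes is a hypothetical name.

```python
import re

# The answer's three find/replace pairs, in order.
PASSES = [
    (r'("~/Scripts[^"]*?)[.-]', r"\1_"),  # first pass: . and - become _
    (r'("~/Scripts[^"]*?)/', r"\1."),     # second pass: / becomes .
    (r'"~/(Scripts[^"]*?)"', r"\1"),      # third pass: strip "~/ and the closing "
]


def run_passes(text):
    for pattern, replacement in PASSES:
        # Each run changes at most one character per script name,
        # so repeat the pass until it finds nothing left to replace.
        while True:
            text, count = re.subn(pattern, replacement, text)
            if count == 0:
                break
    return text


print(run_passes('Url.Content("~/Scripts/jquery-ui-1.8.24.min.js")'))
```

Running the passes strictly in this order keeps the property the answer relies on: slashes are turned into dots only after all dots have already become underscores.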
If, as your question indicates, you'd like to use N++, then use the N++ Python Script plugin. Set up the script and assign a shortcut key, and you have a single-pass solution requiring only open, modify, and save... can't get much simpler than that.
I think part of the problem is that N++ is not a regex tool, and the use of a dedicated regex tool, or even a search/replace solution, is sometimes warranted. You may be better off, both in speed and in time value, using a tool made for text processing rather than editing.
[Script Edit]:: Altered to match the modified in/out expectations.
# Substitute & Replace within matched group.
from Npp import *
import re
def repl(m):
    return "(Scripts." + re.sub( "[-.]", "_", m.group(1) ).replace( "/", "." ) + ")"
editor.pyreplace( '(?:[(].*?Scripts.)(.*?)(?:"?[)])', repl )
Install:: Plugins -> Plugin Manager -> Python Script
New Script:: Plugins -> Python Script -> script-name.py
Select target tab.
Run:: Plugins -> Python Script -> Scripts -> script-name
[Edit: An extended one-liner PythonScript command]
Needing the new regex module for Python (which I hope replaces re), I played around, compiled it for use with the N++ PythonScript plugin, and decided to test it on your sample set.
Two commands on the console ended up with the correct results in the editor.
import regex as re
editor.setText( (re.compile( r'(?<=.*Content[(].*)((?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+))+(?=.*[)]".*)' ) ).sub(lambda m: {'omit':'','toDot':'.','toUnderscore':'_'}[[ key for key, value in m.groupdict().items() if value != None ][0]], editor.getText() ) )
Very sweet!
Another really cool thing about using regex instead of re was that I was able to build the expression in Expresso and use it as is! That allows for a verbose explanation of it, just by copy-pasting the r'' string portion into Expresso.
The abbreviated text of which is::
Match a prefix but exclude it from the capture. [.*Content[(].*]
[1]: A numbered capture group. [(?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+)], one or more repetitions
Select from 3 alternatives
[omit]: A named capture group. [["~]+?([~])[/]|["]]
Select from 2 alternatives
["~]+?([~])[/]
Any character in this class: ["]
[toUnderscore]: A named capture group. [[-.]+]
[toDot]: A named capture group. [[/]+]
Match a suffix but exclude it from the capture. [.*[)]".*]
The command breakdown is fairly nifty: we are telling Scintilla to set the full buffer contents to the result of a compiled regex substitution, essentially using a 'switch' on the name of the group that isn't empty.
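Python's built-in re module cannot run that one-liner, because it only supports fixed-width lookbehind, but the "switch on the group name" trick works with it too: Match.lastgroup names the group that matched. A sketch, assuming the same Url.Content("~/Scripts/...") layout and restricting the token rewrite to each "~/Scripts/..." literal instead of using a lookbehind; TOKEN, ACTIONS and convert are illustrative names.

```python
import re

# Each alternative is a named group, so m.lastgroup tells us which kind of token matched.
TOKEN = re.compile(r'(?P<omit>"~/|")|(?P<toUnderscore>[-.])|(?P<toDot>/)')
ACTIONS = {"omit": "", "toUnderscore": "_", "toDot": "."}
# Only rewrite inside the quoted script path, replacing the lookbehind/lookahead.
SCRIPT_LITERAL = re.compile(r'"~/Scripts/[^"]*"')


def convert_literal(m):
    # The 'switch': look up the replacement by the name of the group that matched.
    return TOKEN.sub(lambda t: ACTIONS[t.lastgroup], m.group(0))


def convert(text):
    return SCRIPT_LITERAL.sub(convert_literal, text)


print(convert('Url.Content("~/Scripts/jquery-ui-1.8.24.min.js")'))
```

In Notepad++ this could presumably be applied with editor.setText(convert(editor.getText())), mirroring the one-liner above; that usage is an assumption.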
Hopefully Dave (the PythonScript Author) will add the regex module to the ExtraPythonLibs part of the project.
Alternatively you could use a script that would do it and avoid copy pasting and the rest of the manual labor altogether. Consider using the following script:
$_.gsub!(%r{(?:"~/)?Scripts/([a-z0-9./-]+)"?}i) do |i|
'Scripts.' + $1.split('/').map { |i| i.gsub(/[.-]/, '_') }.join('.')
end
And run it like this:
$ ruby -pi.bak script.rb *.ext
All the files with extension .ext will be edited in-place and the original files will be saved with .ext.bak extension. If you use revision control (and you should) then you can easily review changes with some visual diff tool, correct them if necessary and commit them afterwards.
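A rough Python counterpart of ruby -pi.bak uses the standard fileinput module, whose inplace=True option redirects print() into the file being read and whose backup argument keeps a copy of the original. This sketch reuses the Ruby block's pattern and rule; rewrite_in_place and to_t4mvc are hypothetical names.

```python
import fileinput
import re

# Same pattern as the Ruby script (case-insensitivity spelled out as A-Za-z).
SCRIPT = re.compile(r'(?:"~/)?Scripts/([A-Za-z0-9./-]+)"?')


def to_t4mvc(m):
    # Split the path on /, turn . and - into _ in each part, rejoin with dots.
    parts = [re.sub(r"[.-]", "_", p) for p in m.group(1).split("/")]
    return "Scripts." + ".".join(parts)


def rewrite_in_place(paths):
    # inplace=True sends print() output into the current file; backup keeps a .bak copy.
    with fileinput.input(paths, inplace=True, backup=".bak") as f:
        for line in f:
            print(SCRIPT.sub(to_t4mvc, line), end="")
```

As with the Ruby version, reviewing the result in a diff tool before deleting the .bak files is a sensible precaution.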

Do calculation on captured number in regex before using it in replacement

Using a regex, I am able to find a bunch of numbers that I want to replace. However, I want to replace the number with another number that is calculated using the original - captured - number.
Is that possible in notepad++ using a kind of expression in the replacement-part?
Edit: Maybe a strange thought, but could the calculation be done in the search part, generating a second captured number that would effectively be the result?
Even if it is possible, it will almost certainly be "messy" - why not do the replacements with a simple script instead? For example..
#!/usr/bin/env ruby
f = File.new("f1.txt", File::RDWR)
contents = f.read()
contents.gsub!(/\d+/){|m|
  m.to_i + 1 # convert the current match to an integer, and add one
}
f.truncate(0) # empty the existing file
f.seek(0) # seek to the start of the file, before writing again
f.write(contents) # write modified file
f.close()
..and the output:
$ cat f1.txt
This was one: 1
This was two: 2
$ ruby replacer.rb
$ cat f1.txt
This was one: 2
This was two: 3
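For comparison, the same increment in Python relies on re.sub accepting a function as the replacement: the function receives each match and returns the replacement text, which is where the arithmetic happens. increment_numbers is an illustrative name; reading and writing the file would work as in the Ruby script.

```python
import re


def increment_numbers(text, delta=1):
    # The callback gets each match object; its return value becomes the replacement.
    return re.sub(r"\d+", lambda m: str(int(m.group(0)) + delta), text)


print(increment_numbers("This was one: 1\nThis was two: 2"))
```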
In reply to jeroen's comment,
I was actually interested if the possibility existed in the regular expression itself as they are so widespread
A regular expression is really just a simple pattern-matching syntax. Doing anything more advanced than search/replace with the matches would be up to the text editor, but the usefulness of this is very limited, and it can be achieved via the scripting that most editors allow (Notepad++ has a plugin system, although I've no idea how easy it is to use).
Basically, if regex/search-and-replace will not achieve what you want, I would say either use your editors scripting ability or use an external script.
Is that possible in notepad++ using a kind of expression in the replacement-part?
Interpolated evaluation of regular-expression matches is a relatively advanced feature that I probably would not expect to find in a general-purpose text editing application. I played around with Notepad++ a bit but was unable to get this to work, nor could I find anything in the documentation that suggests this is possible.
Hmmm... I'd have to recommend AWK to do this.
http://en.wikipedia.org/wiki/AWK
Notepad++ has limited regular expression support built in. There are extensions that add a bit more to regular expression find and replace, but I've found those hard to use. I would recommend writing a little external program to do it for you. Ruby, Perl or Python would all be great for it, if you know those languages. I use Ruby and have had lots of success with it.

Regex to change to sentence case

I'm using Notepad++ to do some text replacement in a 5453-row language file. The format of the file's rows is:
variable.name = Variable Value Over Here, that''s for sure, Really
Double apostrophe is intentional.
I need to convert the value to sentence case, except for the words "Here" and "Really" which are proper and should remain capitalized. As you can see, the case within the value is typically mixed to begin with.
I've worked on this for a little while. All I've got so far is:
(. )([A-Z])(.+)
which seems to at least select the proper strings. The replacement piece is where I'm struggling.
Find: (. )([A-Z])(.+)
Replace: \1\U\2\L\3
In Notepad++ 6.0 or better (which comes with built-in PCRE support).
Regex replacement cannot execute functions (like capitalization) on matches. You'd have to script that, e.g. in PHP or JavaScript.
Update: See Jonas' answer.
I built myself a Web page called Text Utilities to do that sort of things:
paste your text
go in "Find, regexp & replace" (or press Ctrl+Shift+F)
enter your regex (mine would be ^(.*?\=\s*\w)(.*)$)
check the "^$ match line limits" option
choose "Apply JS function to matches"
add arguments (first is the match, then sub patterns), here s, start, rest
change the return statement to return start + rest.toLowerCase();
The final function in the text area looks like this:
return function (s, start, rest) {
return start + rest.toLowerCase();
};
Maybe add some code to capitalize some words like "Really" and "Here".
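A Python sketch of the whole transformation, including the exceptions for "Here" and "Really", assuming every row has the form name = value with spaces around the equals sign. The PROPER word list and the function names are hypothetical and would need extending for a real file.

```python
import re

# Words that should keep their capital (from the question); extend as needed.
PROPER = {"here": "Here", "really": "Really"}
# Assumes each row looks like "name = value"; group 2 is the value.
ROW = re.compile(r"^(\S+ = )(.*)$", re.MULTILINE)


def sentence_case(value):
    if not value:
        return value
    lowered = value[0].upper() + value[1:].lower()
    # Restore the proper words that lower() just flattened.
    return re.sub(r"\b[a-z]+\b", lambda m: PROPER.get(m.group(0), m.group(0)), lowered)


def convert_rows(text):
    return ROW.sub(lambda m: m.group(1) + sentence_case(m.group(2)), text)


print(convert_rows("variable.name = Variable Value Over Here, that''s for sure, Really"))
# → variable.name = Variable value over Here, that''s for sure, Really
```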
In Notepad++ you can use a plugin called PythonScript to do the job. After installing the plugin, create a new script (Plugins > Python Script > New Script).
Then you can use the following script, replacing the regex and function variables as you see fit:
import re
# change these
regex = r"[a-z]+sym"
function = str.upper
def perLine(line, num, total):
    # apply `function` to every match; re.sub keeps positions right even if the length changes
    newLine = re.sub(regex, lambda m: function(m.group(0)), line)
    if newLine != line:
        editor.replaceWholeLine(num, newLine)
editor.forEachLine(perLine)
This particular example works by finding all the matches in a particular line, then applying the function to each match. If you need multiline support, the Python Script "Context-Help" explains all the functions offered, including the pymlsearch/pymlreplace functions defined under the 'editor' object.
When you're ready to run your script, go to the file you want it to run on first, then go to "Scripts >" in the Python Script menu and run yours.
Note: while you will probably be able to use notepad++'s undo functionality if you mess up, it might be a good idea to put the text in another file first to verify it works.
P.S. You can 'find' and 'mark' every occurrence of a regular expression using notepad++'s built-in find dialog, and if you could select them all you could use TextFX's "Characters->UPPER CASE" functionality for this particular problem, but I'm not sure how to go from marked or found text to selected text. But, I thought I would post this in case anyone does...
Edit: In Notepad++ 6.0 or higher, you can use "PCRE (Perl Compatible Regular Expression) Search/Replace" (source: http://sourceforge.net/apps/mediawiki/notepad-plus/?title=Regular_Expressions). So this could have been solved using a regex like (. )([A-Za-z])(.+) with a replacement argument like \1\U\2\L\3.
The questioner had a very specific case in mind.
As a general "change to sentence case" in Notepad++, the first regexp suggestion did not work properly for me. While not perfect, here is a tweaked version, which was a big improvement on the original for my purposes:
find: ([\.\r\n][ ]*)([A-Za-z\r])([^\.^\r^\n]+)
replace: \1\U\2\L\3
You still have a problem with lower case nouns, names, dates, countries etc. but a good spellchecker can help with that.