Multiline Regular Expression search and replace! - regex

I've hit a wall. Does anybody know a good text editor that has search and replace like Notepad++ but can also do multi-line regex search and replace? Basically, I am trying to find something that can match a regex like:
search oldlog\(.*\n\s+([\r\n.]*)\);replace newlog\(\1\)
Any ideas?

Notepad++ can now handle multi line regular expressions (just update to the latest version - feature was introduced around March '12).
I needed to remove all onmouseout and onmouseover statements from an HTML document and I needed to create a non-greedy multi line match.
onmouseover=.?\s*".*?"
Make sure you check the: [ ] . matches newline checkbox if you want to use the multi line match capability.

EditPad Pro has better regex capabilities than any other editor I've ever used.
Also, I suspect you have an error in your regex — [\r\n.] will match only carriage returns, newlines, and full stops. If you're trying to match any character (i.e. "dot operator plus CR and LF), try [\s\S] instead.

My personal recommendation is IDM Computing's UltraEdit (www.ultraedit.com) - it can do regular expressions (both search and replace) with Perl, Unix and UltraEdit syntax. Multi-line matching is one of the capabilities in Perl regex mode in it.
It also has other nice search capabilities (e.g search in specific character column range, search in multiple files, search history, search favorites, etc...)
(source: ultraedit.com)

The Zeus editor can do multi-line search and replace.

I use Eclipse, which is free and that you may already have if you are a developer. '\R' acts as platform independent line delimiter. Here is an example of multi-line search:
search:
\bibitem.(\R.)?\R?{([^{])}$\R^([^\].[^}]$\R.$\R.)
and replace:
\defcitealias{$2}{$3}

I'm pretty sure Notepad++ can do that now via the TextFX plugin (which is included by default). Hit Control-R in Notepad++ and have a play.

TextPad has good Regex search and replace capabilities; I've used it for a while and am pretty happy with it.
From the Features:
Powerful search/replace engine using
UNIX-style regular expressions, with
the power of editor macros. Sets of
files in a directory tree can be
searched, and text can be replaced in
all open documents at once.
For more options than you could possibly need, check out "Notepad++ Alternatives" at AlternativeTo.net.

you can use Python Script plugin for Multiline Regular Expression search and replace!
- http://npppythonscript.sourceforge.net/docs/latest/scintilla.html?highlight=pymlreplace#Editor.pymlreplace
# This example replaces any <br/> that is followed by another on the next line (with optional spaces in between), with a single one
editor.pymlreplace(r"<br/>\s*\r\n\s*<br/>", "<br/>\r\n")

I use Notepad++ all the time but it's Regex has alway been a bit lacking.
Sublime Text is what you want.

EditPlus does a good job at search/replace using regex (including multiline)

You could use Visual Studio. Download Express for free if you don't have a copy.
VS's regex is non-standard, so you'd have to use \n:b+[\r\n] instead.

The latest version of UltraEdit has multiline find and replace w/ regex support.
Or if you're OK with using a more specialized regular expression tool for this, there's Regex Hero. It has the side benefit of being able to do everything on the fly. In other words, you don't have to click a button to test your regular expression because it's automatically tested after every keypress.
Personally, I'd use UltraEdit if I'm looking to replace text in multiple files. That way I can just select the files to replace as a batch and click Replace. But if I'm working with a single text file and I'm in need of writing a more complex regular expression then I'd paste it into Regex Hero and work with it there. That's because Regex Hero can save time when you see everything happen in real-time.

ED for windows has two versions of regex, three sorts of cut and paste (selection, lines or blocks, AND you can shift from one to the next (unlike ultra edit, which is clunky at best) with just a mouse click while you are highlighting -- no need to pull down a menu. The sheer speed of getting the job done is incredible, like reading on a Kindle, you don't have to think about it.

You can use a recent version of Notepad++ (Mine is 6.2.2).
No need to use the option ". match newline" as suggested in another answer. Instead, use the adequate regular expression with ^ for "begin of line" and $ for "end of line". Then use \r\n after the $ for a "new line" in a dos file (or just \n in a unix file as the carriage return is mainly used for dos/windows text file):
Ex.: to remove all lines starting with tags OBJE after a line starting with a tag UID (from a gedcom file - used in genealogy), I did use the following search regex:
^UID (.*)$\r\n^(OBJE (.*)$\r\n)+
And the following replace value:
UID \1\r\n
This is matching lines like this:
UID 4FBB852FB485B2A64DE276675D57A1BA
OBJE #M4#
OBJE #M3#
OBJE #M2#
OBJE #M1#
and the output of the replacement is
UID 4FBB852FB485B2A64DE276675D57A1BA
550 instances have been replaced in less than 1 sec. Notepad++ is really efficient!
Otherwise, to validate a Regular expression I like to use the .Net RegEx Tester (http://regexhero.net/tester/). It's really great to write and test on the fly a Reg Ex...
PS.: Also, you can use [\s\S] in your regex to match any character including new lines. So, if you look for any block of "multi-line" text starting with "xxx" and ending with "abc", the following Regex will be fine:^xxx[\s\S]*?abc$ where "*?" is to match as less as possible between xxx and abc !!!

Related

Regex in search & replace: avoid fixed length of lookaround

In a long corpus of text, I want to make some corrections in certain
environments. However, I am encountering problems when using regex with text
editors. I switched to gedit to have an editor which supports regex in
search & replace.
Crucially, I only want to make changes if the line starts with a certain
pattern (\nm or \mb). The problem is that the element that I want to
replace (o' -> o'o) is not at a fixed length from the beginning of the line
and I can't include the regex in the lookbehind (the lookbehind fails).
Is there any way to include what I am looking for in a simple text editor
regex? Or is this already a step where I have to learn how to script in, for
example, Python?
This is what the regex looks like so far.
(?<=\\(nm|mb)).*o'(?=(q|w|r|t|z|p|s|d|f|g|h|j|k|l|x|c|v|b|n|m|a|i|u|e))
Of course, I can't apply .* in the replace without losing its content.
Put a capture group around .* and a back-reference in the replacement.
Find: (?<=\\(nm|mb))(.*)o'(?=(q|w|r|t|z|p|s|d|f|g|h|j|k|l|x|c|v|b|n|m|a|i|u|e))
Replace: \1o'o

Regex and replace selected results only

I want to know whether there is any tool that does a regex search across huge text (xml or tagged or html) and replace only the cases which are selected across shown (should have 'select/deselect/select all' options while applying the replace regex).
Like the below example:
My content is:
"Visited xtreme.com, stupid.net, childish.com, happy.net and innocence.edu. There are some cross.network isssues that are to be fixed."
Now in this content, i want to replace all ".net" occurrences with ".com" and so a simple tool like notepad++ would replace it easily. But i want the tool to show the search results and give the option to replace only first two occurrences of ".net" and not the instance in "cross.network"
This is only example purposes only and don't suggest an alternate regex. I don't need it.
NP++ or sublime are all fine, as long as they can read all the text to the memory. They both support regexes for finding and replacing text.
If the text files are too big, i.e. NP++ crashes, then you can use sed. It's a command line tool that can replace text like this:
sed -i filename.txt 's/pattern/replacement/g'
On windows boxes you need mingw or cygwin to run it.
Use a text editor like sublime and apply a word boundary to the regular expression:
\.net\b
This will find .net in stupid.net but not in cross.network.
See a demo on regex101.com.

Notepad++ Regex inverse match

I'm new to Regex and trying to figure out how to remove all text from file open in Notepad++ that does not match #LCxxxx or #LAxxxx. Example below (text wanting to keep in bold):
1.In rare cases, reinstalling this MSP file can cause the Citrix Display Driver.....
[From ICAWS760WX86][#0528688]
30.This release includes an enhancement...
[From ICAWS760WX86022][#LA3014]
New Fixes in This Release
1.Windows Server 2008 R2 and Windows Server 2012 R2,...
[From ICAWS760WX86026][#LC2179]
Fixes from Replaced Hotfixes
1.If the Windows Remote Desktop Session Host....
[From ICAWS760WX86004][#LC1180]
I think this is what you're looking for:
(?:[\S\s]*?)(\#L[AC]\d{4})(?:.*)
Replace with:
$1\n
You could do a regular expression search and replace, searching for
(#L[AC]....)
where "dot matches newline" is NOT selected. Replace with
\r\n\1\r\n
That will put all the wanted pieces of text on a line on their own.
Next use the "Mark" tab in the find window. Select "Bookmark line", use the same search string as above (the capture brackets are not needed this time, but they are harmless and so can be left), and them click "Mark all". Now all the wanted lines are bookmarked. Use menu => Search => Bookmark => Remove unmarked lines.
There may be a way of doing it all in one go, but that would be a complex regular expression. The method above uses two simple steps.
remove all text from file open in Notepad++ that does not match #LCxxxx or #LAxxxx
^.*(\[#L[CA]\d+\])$|^.*$
DESCRIPTION
DEMO
https://regex101.com/r/hO1aL8/2
Notepad++
Do a search and replace like describe in the screenshot below:
Alternatively, if you want to get rid off the empty lines during the replace operation, use the regular expression below:
^[\S\s]+?(\[#L[CA]\d+\])$
\s : Whitespaces (\t,\r,\n ...)
\S : Any character except whitespaces.
Tested on Notepad 6.6.9

A regular expression that matches two long strings and ignores everything in between

I am searching through a 1.5 million line Premiere Pro project for any text that matches one of my audio filters and is set to mono.
Text that I am searching for begins with the <ChannelType> tag and ends with the <FilterMatchName>Tags. So it would looks like this
<ChannelType>0</ChannelType>
<FrameRate>5292000</FrameRate>
</AudioComponent>
<FilterPreset>0</FilterPreset>
<OpaqueData Encoding="base64" Checksum="53060659">AAAAAD8L8lo+AUr+Pac1NjwTmoUAAAAAP0uQDD37nIg9ui6MPjwU5j+AAAA+C/JaAAAAAD8qqqsAAAAAP4AAAD92L8w9py8FAAAAAHNvZnQgY29tcHJlc3Npb24AIiBkZWZhdWx0PSIwIiBzdGVwPSIxIiBtaW49IjAiIG1heD0iMSIvPgoJICA8Zmw=</OpaqueData>
<FilterIndex>-1</FilterIndex>
<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
If I were in a Word doc, I would just do a find as
<ChannelType>0</ChannelType>*<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
I am terrible with Regex. I was hoping someone could help me out. Everything I have tried either doesn't match anything, or matches EVERYTHING in the document. I am using Notepad++.
Since you are working in Notepad++, you have access to PCRE regular expressions. This one will get all the text between <ChannelType> and </FilterMatchName>
(?s)<ChannelType>.*?</FilterMatchName>
the (?s) allows the . to match newline characters
After matching <ChannelType>, the .*? lazily matches all characters up to...
the closing </FilterMatchName>, which we match.
Let me know if you have any questions. :)
What type of regular expressions are you using (which language/library)?
Basically you can use .* instead of * in regular expressions. IF your text is long though, it's better to use a Reluctant quantifier[1] if your re implementation allows it.
This is a good site with comparison of different re implementations and tutorials:
http://www.regular-expressions.info
[1] http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

Replacing char in a String with Regular Expression

I got a string like this:
PREFIX-('STRING WITH SPACES TO REPLACE')
and i need this:
PREFIX-('STRING_WITH_SPACES_TO_REPLACE')
I'm using Notepad++ for the Regex Search and Replace, but i'm shure every other Editor capable of regex replacements can do it to.
I'm using:
PREFIX-\('(.*)(\s)(.*)'\)
for search and
PREFIX-('\1_\3')
for replace
but that replaces only one space from the string.
The regex search feature in Notepad++ is very, very weak. The only way I can see to do this in NPP is to manually select the part of the text you want to work on, then do a standard find/replace with the In selection box checked.
Alternatively, you can run the document through an external script, or you can get a better editor. EditPad Pro has the best regex support I've ever seen in an editor. It's not free, but it's worth paying for. In EPP all I had to do was this:
search: ((?:PREFIX-\('|\G)[^\s']+)\s+
replace: $1_
EDIT: \G matches the position where the previous match ended, or the beginning of the input if there was no previous match. In other words, the first time you apply the regex, \G acts like \A. You can prevent that by adding a negative lookahead, like so:
((?:PREFIX-\('|(?!\A)\G)[^\s']+)\s+
If you want to prevent a match at the very beginning of the text no matter what it starts with, you can move the lookahead outside the group:
(?!\A)((?:PREFIX-\('|\G)[^\s']+)\s+
And, just in case you were wondering, a lookbehind will work just as well as a lookahead:
((?:PREFIX-\('|(?<!\A)\G)[^\s']+)\s+
You have to keep matching from the beggining of the string untill you can match no more.
find /(PREFIX-\('[^\s']*)\s([^']*'\))/
replace $1_$2
like: while (/(PREFIX-\('[^\s']*)\s([^']*'\))/$1_$2/) {}
How about using Replace all for about 20 times? Or until you're sure no string contains more spaces
Due to nature of regex, it's not possible to do this in one step by normal regular expression.
But if I be in your place, I do such replaces in several steps:
find such patterns and mark them with special character
(Like replacing STRING WITH SPACES TO REPLACE with #STRING WITH SPACES TO REPLACE#
Replace #([^#\s]*)\s to #\1_ server times.
Remove markers!
I studied a little the regex tool in Notepad++ because I didn't know their possibilities.
I conclude that they aren't powerful enough to do what you want.
Your are obliged to learn and use a programming language having a real regex capability. There are a number of them. Personnaly, I use Python. It would take 1 mn to do what you want with it
You'd have to run the replace several times for each space but this regex will work
/(?<=PREFIX-\(')([^\s]+)\s+/g
Replace with
\1_ or $1_
See it working at http://refiddle.com/10z