Regex-based matching and substitution with nano? - regex

I am aware of nano's search and replace functionality, but is it capable of using regular expressions for matching and substitution (particularly substitutions that use a part of the match)? If so, can you provide some examples of the syntax used (both for matching and replacing)?
I cut my teeth on Perl-style regular expressions, but I've found that text editors will sometimes come up with their own syntax.

My version of nano has an option to switch to regex search with Meta+R. In Cygwin on Windows, the meta key is Alt, so I hit Ctrl+\ to get into search-and-replace mode, and then Alt+R to switch to regex search.

You need to add, or un-comment, the following entry in your global nanorc file (on my machine, it was /etc/nanorc):
set regexp
Then fire up a new terminal, press Ctrl+\, and do your replacements, which should now be regex-aware.
EDIT
Search for: conf->(\S+)
Replace with: \1_conf
Press a to replace all occurrences.
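To see what this search and replace does without opening nano, here is a minimal sketch in Python. It assumes Python's re module behaves the same as nano's regex engine for this particular pattern; the sample text is made up for illustration.

```python
import re

# Hypothetical sample text; each line uses the "conf->name" form.
sample = "conf->host\nconf->port\n"

# Same pattern and replacement as in the nano example:
# capture the non-space run after "conf->" and append "_conf" to it.
converted = re.sub(r"conf->(\S+)", r"\1_conf", sample)

print(converted)
```

Note that \S+ is greedy and takes everything up to the next whitespace, so trailing punctuation such as a semicolon would end up inside the captured name.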

nano uses POSIX Extended Regular Expressions, the same notation used by egrep and sed -r. This includes the metacharacters ., [ and ], ^, $, (, ), \1 to \9, *, { and }, ?, +, |, and character classes like [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:].
For more complete documentation, see the manual page: man 7 regex on Linux or man 7 re_format on OS X. This page gives the same information as well: https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended
Unfortunately, in nano there seems to be no way to match anything that spans multiple lines.

This is a bit old, just updating the search index.
Nano 5.5 uses the ASCII column from this same table.
Thanks to @S P Arif Sahari Wibowo,
I found the answer here anyway (same wiki link):
https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended

I was recently faced with the problem of inserting text at the beginning of every line that started with a numerical digit. The only way to distinguish these lines from text I didn't want to change was the preceding newline.
Playing around with the information provided in this answer I was able to do it and decided to add it to the answer in case somebody else faces the same situation.
To search for the beginning of the line followed by a number and then insert "Text String" at the beginning of each line that starts with a number:
Press Ctrl+\, then enter "(^[0-9])" and press Enter; then enter "Text String \1" and press Enter. Select yes for the first match and, if it does what you want, press a to replace all. The \1 puts the matched digit back after the inserted text. Omit the " quotation marks.
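The same idea can be tried outside nano. This is a rough Python sketch, not nano itself: it assumes a backreference (\1) is used to keep the matched digit, and the function name and sample lines are made up for illustration.

```python
import re

def prefix_digit_lines(text: str) -> str:
    """Insert 'Text String ' before every line that starts with a digit."""
    # re.MULTILINE makes ^ match at the start of each line, not just the text.
    # \1 re-inserts the captured digit so it is not lost.
    return re.sub(r"(^[0-9])", r"Text String \1", text, flags=re.MULTILINE)

example = "1 apple\nbanana\n2 cherry"
print(prefix_digit_lines(example))
```

Lines that do not start with a digit (here "banana") are left untouched.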

Related

Remove Word smart quotes from a text file using vim

I have a large text file, originally generated in Microsoft Word, that contains these four character sequences, alongside regular text:
?~#~\
?~#~]
?~#~X
?~#~Y
From the content of what is written in the file, it appears that the sequences respectively correspond to open double quotes, close double quotes, open single quote, and close single quote. When displayed in Vim, everything in the sequences other than the question mark appears in blue.
I cannot remove them with a command such as
:.,$s/?~#~Y//
This command results in the following error from vim:
E33: No previous substitute regular expression
E476: Invalid command
Press ENTER or type command to continue
These commands also produce errors:
:.,$s/\?~#~Y//
:.,$s/\?\~\#\~Y//
Specifically,
E866: (NFA regexp) Misplaced ?
E476: Invalid command
Press ENTER or type command to continue
What would be the correct way to automatically remove or replace the sequences? Ideally, I'd like to remove the double quotes, and replace the open/close single quotes with a traditional single quote or apostrophe.
Since "everything in the sequences other than the question mark appears in blue", all characters except the question mark are probably binary characters. I'd suggest this approach:
go to the first sequence and yank it: press v to start marking, extend the mark to the end of the sequence, then press y
paste the sequence as the search pattern from the unnamed register: :%s/<Ctrl-r>"//g<Enter>
repeat for the remaining sequences.
If you’re using a unicode-compatible encoding (such as utf-8) and your font supports it, the smart quotes will show properly.
Additionally, the digraphs for them are 6', 6", 9', and 9". This makes it pretty easy to chain a couple of substitutes to swap them for straight variants:
%s/<C-k>6'\|<C-k>9'/'/g
Etc. Wrap it in a function or command to make it easier for later.
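If the file is valid UTF-8, the same swap can also be done outside Vim. The following is a minimal Python sketch; the function name is hypothetical, and it maps the four curly quotes to their straight variants (adjust the table if you would rather delete the double quotes).

```python
# Map the four "smart" quote characters to straight ASCII quotes.
SMART_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(SMART_TO_STRAIGHT)

print(straighten_quotes("\u201cIt\u2019s fine\u201d"))
```

Mapping a character to None in the table instead of a string would remove it entirely.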
Sorry to bump an old thread but I stumbled upon this late at night while trying to figure out how to remove the exact same characters from a bind9 configuration file that I had pasted in from a website. The aberrant characters were "~#~X", "~#~Y", " | ", and I believe another but I can't remember it at the moment. Anyway, regular expressions couldn't seem to find and replace using the above methods, but I was able to find a solution.
If you can set VIM to show the special characters in their binary representation, then you can use regex to find that. Here's how I did it:
Steps to fix
Open the file with the problem characters in VIM
(a) original method - :set encoding=latin1|set isprint=|set display+=uhex
(b) easier method - :set encoding=utf-8
NOTE: either of these should display the digraph characters in their binary form (e.g. <80>, <99>, ...)
Then search and replace with VIM regex like so
:%s:\%xNN:':g
(replace NN with the byte code, i.e. 80, 99, etc.)
Let's break that command down, shall we:
%s: - the substitute command: the % at the start applies it to every line in the file, and the 's' stands for substitute. The ':' (colon) has been used as the delimiter in this case, but you can use other symbols to delimit the command.
\%x - matches a character given by its hexadecimal code (i.e. the two hex digits shown between the angle brackets)
NN - replace with the two chars inside of the <> that you're looking to replace in your file. In my case, the byte codes were <e2>, <80>, <99>, which I had to search for separately.
:' - then, the colon delimiting the replacement, where I'm specifying a single quote to replace the byte code; you could put whatever text you want here.
:g - finally, the last colon delimiter and the letter 'g', which replaces every occurrence on a line rather than only the first.
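For comparison, the same byte-level cleanup can be sketched in Python. The bytes e2 80 99 mentioned above are the UTF-8 encoding of the right single quotation mark (U+2019), so they can be replaced in one step on the raw bytes. The function name is made up for illustration, and the other smart quotes would need their own byte sequences.

```python
def replace_right_single_quote(data: bytes) -> bytes:
    """Replace the UTF-8 bytes e2 80 99 (U+2019) with an ASCII apostrophe."""
    return data.replace(b"\xe2\x80\x99", b"'")

raw = "it\u2019s".encode("utf-8")   # contains the bytes e2 80 99
print(replace_right_single_quote(raw))
```

Working on bytes avoids any dependence on how the editor or terminal decodes the file.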
You can do more research in VIM's help with:
:help isprint
Anyway, I hope this helps someone else in the future.
References:
https://blog-en.openalfa.com/how-to-edit-non-printing-and-unicode-characters-in-vim-editor
https://unix.stackexchange.com/questions/108020/can-vim-display-ascii-characters-only-and-treat-other-bytes-as-binary-data
VIM How do I search for a <XX> single byte representation

Find and Replace with Regex in Microsoft Word 2013

I am editing an e-book document with a lot of unnecessary markup. I have a number of sections in the text with code similar to this:
<i>Some text here</i>
I am trying to run a regex find and replace that will find any phrase between the two i-tags, remove the i-tags, and apply a style to the text.
Here is what I'm using to search:
Find: (<i>)(*)(</i>)
Replace: \2
I'm also selecting Styles > i (for italic). This tells our conversion software to apply italics to the text. If I leave the i-tags, what ends up happening is ScribeNet's conversion process converts them to hex-values so that they show up as literal text in the e-book. Messy.
When I run this search, I get no results. I have "use wildcards" checked. What am I missing? According to Microsoft's help website, * is used to represent any number or type of characters, and individual strings are supposed to be enclosed in parentheses.
To search for a character that's defined as a wildcard, place a backslash (\) before that character. The * itself matches any string of characters, so add the range quantifier {1,} to match one or more times:
Find: \<i\>(*{1,})\</i\>
Replace: \1
Search for \<i\>(*{1,})\</i\> and replace with \1. Don't forget to check "Use wildcards".
There is a reference table for Word's "regular expressions" here: http://office.microsoft.com/en-ca/word-help/find-and-replace-text-by-using-regular-expressions-advanced-HA102350661.aspx
< and > are special characters that need to be escaped
* means any character
{1,} means one or more times
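For readers more used to standard regular expressions, here is a minimal Python sketch of the same tag-stripping step. It uses Python's re syntax, not Word's wildcard syntax, and it cannot apply a Word style; the function name is hypothetical.

```python
import re

def strip_italic_tags(text: str) -> str:
    """Remove <i>...</i> tags but keep the text between them."""
    # .*? is non-greedy so each pair of tags is handled separately.
    return re.sub(r"<i>(.*?)</i>", r"\1", text)

print(strip_italic_tags("a <i>Some text here</i> b <i>more</i>"))
```

In standard regex syntax < and > are not special, so they need no escaping here, unlike in Word's wildcard mode.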
There is a special tool for Microsoft Word called Multiple Find & Replace (see http://www.translatortools.net/products/transtoolsplus/word-multiplefindreplace) which lets you work around Word's wildcard limitations. This tool can use the standard regular expressions syntax to search and replace any text within a Word document. For example, to search for any HTML tags, you can just use <[^>]+> which will find opening, closing and standalone HTML tags. You can add any number of expressions to a list and then search the document for all of them, replace everything, see all matches for all the search expressions entered, replace only selected matches, and a few more things.
I created it for translators and editors, but it is great for any advanced search/replace operations in Word, and I am sure you will find it very useful.
Stanislav

How to replace . in patterned strings with / in Visual Studio

I have lot of code in our solution like this:
Localization.Current.GetString("abc.def.gih.klm");
I want to replace it with:
Localization.Current.GetString("/abc/def/gih/klm");
the number of dots (.) is variable.
How can I do this in Visual Studio (2010)?
Edit: I want to replace strings in code (in VS 2010 editor), not when I run my application
Thank you very much
Misread your request.
If you press Ctrl+Shift+H and put this as your find string
{Localization\.Current\.GetString\("[A-Za-z\/]+}(\.)
Then put this as your replace with:
\1/
And then in find options tick use regular expressions.
This will find the first dot and replace it. Clicking find next will get the second one etc. You will have to keep doing a replace all until they are all done. Someone can probably improve that!
Try this in the "Replace in Files" Dialogue with "Use Regular expressions"
Find what:
{[^"]*"[^"]*}\.
If you want to be a bit more strict on the allowed characters between the quotes then try this
{[^"]*"[A-Za-z.]*}\.
This would allow only ASCII letters and dots between the quotes.
Replace with
\1/
It will find the first " in a row and replace the last dot before the next " with /
The problem is, it replaces only the last occurrence of a dot within the first set of "" in each row. So you would have to call this a few times until you get the message "The text was not found"
And be careful if there is a wanted dot between the quotes: it will be replaced as well.
EDIT
you can't use this in visual studio as it has its own flavour of regex, not the one used in the .NET regex classes, and I don't think you can do lookbehind with it.
you can use this regex:
(?<=\("[\w.]+)\.
in the find and replace, replacing it with /
Breaking it down:
Match a dot (the . at the end)
Which is preceded by (positive lookbehind) a bracket ( followed by a " and then any number of characters which are letters or a dot (dots don't need to be escaped in a character class)
if you are sure that the text that you want to replace only ever has the Localization.Current.GetString bit then you could include that in the lookbehind of the regex:
(?<=Localization\.Current\.GetString\("[\w.]+)\.
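If running a script over the source files is an option, the whole conversion can be done in one pass. The following is a minimal Python sketch, not a Visual Studio feature: Python's re module does not allow variable-length lookbehind, so it uses a replacement function instead. The function name is hypothetical, and it assumes the keys contain only word characters and dots.

```python
import re

# Capture the call prefix, the dotted key, and the closing quote and bracket.
GETSTRING_CALL = re.compile(r'(Localization\.Current\.GetString\(")([\w.]+)("\))')

def dots_to_slashes(source: str) -> str:
    """Rewrite "abc.def" keys inside GetString(...) calls as "/abc/def"."""
    def fix(match: re.Match) -> str:
        key = match.group(2).replace(".", "/")
        return match.group(1) + "/" + key + match.group(3)
    return GETSTRING_CALL.sub(fix, source)

line = 'Localization.Current.GetString("abc.def.gih.klm");'
print(dots_to_slashes(line))
```

Because the replacement function handles the whole key at once, there is no need to repeat the replace until no dots remain, and dots outside GetString calls are not touched.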

Multiline Regular Expression search and replace!

I've hit a wall. Does anybody know a good text editor that has search and replace like Notepad++ but can also do multi-line regex search and replace? Basically, I am trying to find something that can match a regex like:
search oldlog\(.*\n\s+([\r\n.]*)\);replace newlog\(\1\)
Any ideas?
Notepad++ can now handle multi line regular expressions (just update to the latest version - feature was introduced around March '12).
I needed to remove all onmouseout and onmouseover statements from an HTML document and I needed to create a non-greedy multi line match.
onmouseover=.?\s*".*?"
Make sure you check the ". matches newline" checkbox if you want to use the multi line match capability.
EditPad Pro has better regex capabilities than any other editor I've ever used.
Also, I suspect you have an error in your regex: [\r\n.] will match only carriage returns, newlines, and full stops. If you're trying to match any character (i.e. "dot operator" plus CR and LF), try [\s\S] instead.
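The difference is easy to check. Here is a small Python sketch (Python's re module treats a dot inside a character class as a literal full stop, as most regex flavours do); the sample strings are made up for illustration.

```python
import re

sample = "first line\nsecond line"

# Inside [...] the dot is a literal full stop, so letters are not matched.
dot_in_class = re.fullmatch(r"[\r\n.]*", sample)

# [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
any_char = re.fullmatch(r"[\s\S]*", sample)

# The first class only matches text made of CR, LF and full stops.
literal_only = re.fullmatch(r"[\r\n.]*", "..\r\n.")

print(dot_in_class is None, any_char is not None, literal_only is not None)
```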
My personal recommendation is IDM Computing's UltraEdit (www.ultraedit.com) - it can do regular expressions (both search and replace) with Perl, Unix and UltraEdit syntax. Multi-line matching is one of the capabilities in Perl regex mode in it.
It also has other nice search capabilities (e.g search in specific character column range, search in multiple files, search history, search favorites, etc...)
The Zeus editor can do multi-line search and replace.
I use Eclipse, which is free and which you may already have if you are a developer. '\R' acts as a platform-independent line delimiter. Here is an example of multi-line search:
search:
\bibitem.(\R.)?\R?{([^{])}$\R^([^\].[^}]$\R.$\R.)
and replace:
\defcitealias{$2}{$3}
I'm pretty sure Notepad++ can do that now via the TextFX plugin (which is included by default). Hit Control-R in Notepad++ and have a play.
TextPad has good Regex search and replace capabilities; I've used it for a while and am pretty happy with it.
From the Features:
Powerful search/replace engine using UNIX-style regular expressions, with the power of editor macros. Sets of files in a directory tree can be searched, and text can be replaced in all open documents at once.
For more options than you could possibly need, check out "Notepad++ Alternatives" at AlternativeTo.net.
You can use the Python Script plugin for multiline regular expression search and replace:
- http://npppythonscript.sourceforge.net/docs/latest/scintilla.html?highlight=pymlreplace#Editor.pymlreplace
# This example replaces any <br/> that is followed by another on the next line (with optional spaces in between), with a single one
editor.pymlreplace(r"<br/>\s*\r\n\s*<br/>", "<br/>\r\n")
I use Notepad++ all the time but its regex has always been a bit lacking.
Sublime Text is what you want.
EditPlus does a good job at search/replace using regex (including multiline)
You could use Visual Studio. Download Express for free if you don't have a copy.
VS's regex is non-standard, so you'd have to use \n:b+[\r\n] instead.
The latest version of UltraEdit has multiline find and replace w/ regex support.
Or if you're OK with using a more specialized regular expression tool for this, there's Regex Hero. It has the side benefit of being able to do everything on the fly. In other words, you don't have to click a button to test your regular expression because it's automatically tested after every keypress.
Personally, I'd use UltraEdit if I'm looking to replace text in multiple files. That way I can just select the files to replace as a batch and click Replace. But if I'm working with a single text file and I'm in need of writing a more complex regular expression then I'd paste it into Regex Hero and work with it there. That's because Regex Hero can save time when you see everything happen in real-time.
ED for Windows has two versions of regex and three sorts of cut and paste (selection, lines or blocks), AND you can shift from one to the next (unlike UltraEdit, which is clunky at best) with just a mouse click while you are highlighting; no need to pull down a menu. The sheer speed of getting the job done is incredible: like reading on a Kindle, you don't have to think about it.
You can use a recent version of Notepad++ (Mine is 6.2.2).
No need to use the option ". matches newline" as suggested in another answer. Instead, use a suitable regular expression with ^ for "beginning of line" and $ for "end of line". Then use \r\n after the $ for a "new line" in a DOS file (or just \n in a Unix file, as the carriage return is mainly used in DOS/Windows text files):
Ex.: to remove all lines starting with the tag OBJE after a line starting with the tag UID (from a GEDCOM file, used in genealogy), I used the following search regex:
^UID (.*)$\r\n^(OBJE (.*)$\r\n)+
And the following replace value:
UID \1\r\n
This is matching lines like this:
UID 4FBB852FB485B2A64DE276675D57A1BA
OBJE #M4#
OBJE #M3#
OBJE #M2#
OBJE #M1#
and the output of the replacement is
UID 4FBB852FB485B2A64DE276675D57A1BA
550 instances have been replaced in less than 1 sec. Notepad++ is really efficient!
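The same line-anchored technique works in other regex engines. Below is a minimal Python sketch of the UID/OBJE cleanup; it is an adaptation rather than the Notepad++ expression itself, the function name is hypothetical, and it accepts either \r\n or \n line endings and keeps whichever one follows the UID line.

```python
import re

# ^ anchors at line starts because of re.MULTILINE.
# Group 1 is the UID value, group 2 is the line ending after it.
UID_WITH_OBJE = re.compile(
    r"^UID ([^\r\n]*)(\r?\n)(?:OBJE [^\r\n]*\r?\n)+",
    flags=re.MULTILINE,
)

def drop_obje_after_uid(text: str) -> str:
    """Remove the run of OBJE lines that directly follows a UID line."""
    return UID_WITH_OBJE.sub(r"UID \1\2", text)

sample = "UID 4FBB852F\nOBJE #M4#\nOBJE #M3#\nNAME x\n"
print(drop_obje_after_uid(sample))
```

OBJE lines that do not directly follow a UID line are left in place, matching the behaviour of the expression above.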
Otherwise, to validate a Regular expression I like to use the .Net RegEx Tester (http://regexhero.net/tester/). It's really great to write and test on the fly a Reg Ex...
PS.: Also, you can use [\s\S] in your regex to match any character including new lines. So, if you are looking for any block of "multi-line" text starting with "xxx" and ending with "abc", the following regex will be fine: ^xxx[\s\S]*?abc$ where "*?" matches as little as possible between xxx and abc.
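To see why the lazy "*?" matters, here is a short Python sketch comparing it with the greedy "*". The sample text is made up, and re.MULTILINE is used so that ^ and $ match at line boundaries.

```python
import re

sample = "xxx one\ntwo abc\nmiddle\nxxx three abc\n"

# Lazy: stops at the first line that ends in "abc", so each block is separate.
lazy = re.findall(r"^xxx[\s\S]*?abc$", sample, flags=re.MULTILINE)

# Greedy: runs on to the last line that ends in "abc", merging both blocks.
greedy = re.findall(r"^xxx[\s\S]*abc$", sample, flags=re.MULTILINE)

print(len(lazy), len(greedy))
```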

Interactive search/replace regex in Vim?

I know the regex for doing a global replace,
%s/old/new/g
How do you go about doing an interactive search-replace in Vim?
Add the flag c (in the vim command prompt):
:%s/old/new/gc
will give you a yes/no prompt at each occurrence of 'old'.
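The confirm-each-match behaviour can be sketched outside Vim as well. This is a rough Python illustration of the idea, not how Vim implements it; the function name and the confirm callback are hypothetical.

```python
import re
from typing import Callable

def substitute_with_confirm(
    text: str,
    pattern: str,
    replacement: str,
    confirm: Callable[[re.Match], bool],
) -> str:
    """Replace each match only if confirm(match) returns True."""
    def decide(match: re.Match) -> str:
        # expand() applies backreferences such as \1 in the replacement.
        return match.expand(replacement) if confirm(match) else match.group(0)
    return re.sub(pattern, decide, text)

# Scripted answers stand in for the user typing y / n / y.
answers = iter([True, False, True])
result = substitute_with_confirm("old old old", "old", "new", lambda m: next(answers))
print(result)
```

In a real tool the callback would show the match in context and read the answer from the keyboard.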
Vim's built-in help offers useful info on the options available once substitution with confirmation has been selected. Use:
:h :s
Then scroll to the section on confirm options.
For instance, to substitute this and all remaining matches, use a.
Mark Biek pointed out using:
%s/old/new/gc
for a global search replace with confirmation for each substitution. But, I also enjoy interactively verifying that the old text will match correctly. I first do a search with a regex, then I reuse that pattern:
/old.pattern.to.match
%s//replacement/gc
The s// will use the last search pattern.
I think you're looking for c, e.g. s/abc/123/gc; this will cause Vim to confirm the replacements. See :help :substitute for more information.
I usually use the find/substitute/next/repeat command :-)
/old<CR>3snew<ESC>n.n.n.n.n.n.n.
That's find "old", substitute 3 characters for "new", find next, repeat substitute, and so on.
It's a pain for massive substitutions but it lets you selectively ignore some occurrences of old (by just pressing n again to find the next one instead of . to repeat a substitution).
If you just want to count the number of occurrences of 'abc' then you can do %s/abc//gn. This doesn't replace anything but just reports the number of occurrences of 'abc'.
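Counting without replacing looks like this in Python, as a minimal sketch; the function name and sample text are made up for illustration.

```python
import re

def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping matches of pattern without changing the text."""
    return sum(1 for _ in re.finditer(pattern, text))

print(count_matches("abc", "abc xabcx\nno match here\nabc"))
```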
If your replacement text needs to change for each matched occurrence (i.e. not simply choosing Yes/No to apply a singular replacement) you can use a Vim plugin I made called interactive-replace.
Neovim now has a feature inccommand which allows you to preview the substitution:
inccommand has two options:
set inccommand=split previews substitutions in a split pane
set inccommand=nosplit previews substitution in the active buffer
Documentation of the feature: https://neovim.io/doc/user/options.html#'inccommand'