How to delete regex match text in emacs?

How can I delete text that matches a regex in Emacs?
I assumed I could use:
'(query-replace-regexp PATTERN EMPTY)
and:
'(replace-regexp PATTERN EMPTY)
but they throw:
perform-replace: Invalid regexp: "Premature end of regular expression".

In general, you can delete text that matches a given regexp by using the empty string "" as the replacement in the two functions you mention. However, as others mentioned in the comments above, your regular expression is faulty.
For instance, if your buffer contains the following text:
1. My todo list
1.1. Brush teeth
1.2. Floss
2. My favorite movies
2.1. Star Wars episodes 4-6
and you would like to get rid of the numbers at the beginning of each line, you could place the cursor at the beginning of the buffer and then type M-C-% (that is, press ALT, CTRL, Shift and 5 together) to invoke the command query-replace-regexp. You'll be asked for two parameters in the minibuffer: first the regexp to match, then the replacement string.
So, in our example, you could use the following regexp:
\([0-9]\.\)+\s-
as the first parameter, and simply hit ENTER for the second parameter, i.e., don't specify anything as the replacement. That way, the replacement is the empty string: you replace whatever matches the regexp with nothing.
query-replace-regexp will ask you interactively for every match if you want to replace it or if you want to skip it. This is the "query"-part in query-replace-regexp and it is helpful to see if the regexp you came up with actually matches what you thought it does. If you're sure it does, you can type ! to make Emacs replace the remaining matches without asking every time.
If you use M-x replace-regexp instead of M-C-%, Emacs will replace every match without asking for confirmation.
For the special case that you'd like to delete whole lines when a certain part of the line matches a regexp, there's also delete-matching-lines and its evil, goatee-wearing twin brother from a parallel universe delete-non-matching-lines.
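For readers who want to try the same idea outside Emacs, here is a minimal Python sketch of the example above. It is an illustration, not Emacs code: Python's re syntax leaves grouping parentheses unescaped and has no \s- syntax class, so \s stands in for it, and delete_matching_lines is a hypothetical helper that only approximates the Emacs command of the same name.

```python
import re

text = """1. My todo list
1.1. Brush teeth
1.2. Floss
2. My favorite movies
2.1. Star Wars episodes 4-6"""

# Python counterpart of the Emacs regexp \([0-9]\.\)+\s-
# (assumption: \s is close enough to Emacs' whitespace syntax class here).
pattern = re.compile(r"([0-9]\.)+\s")

# Replacing every match with the empty string deletes the matched text.
stripped = pattern.sub("", text)
print(stripped)


def delete_matching_lines(regexp, s):
    """Hypothetical helper: drop every line that contains a match."""
    kept = [line for line in s.splitlines() if not re.search(regexp, line)]
    return "\n".join(kept)


without_floss = delete_matching_lines(r"Floss", text)
print(without_floss)
```

The first call removes only the matched numbering; the second removes whole lines, which is the distinction between replace-regexp with an empty replacement and delete-matching-lines.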

Related

Regular expression to delete all words between two specific words

I'm normally ok with regex but I'm struggling with this.
I have a simple file with two words that start and end a set of data. The data between the words changes but - start and status are always in the same place.
Example :
start
Everything in between
status
I'm trying to work out how to delete (replace) everything between and including start and status
I'm sure I had it working with this at one time
(?i)^start.+?status
set(#replaceAll,$replace regular expression(#textTest,"(?i)^start.+?status"," "),"Global")
but it's just not working anymore.
You could use the regular expression
\bstart\b.+?\bstatus\b
which does not require "status" to be on the same line as "start". Two flags should be set:
case insensitivity (/i)
single-line mode, which allows . to match a newline (/s)
The regex reads, "match 'start' with a word break fore and aft (to avoid matching 'starting' or 'jumpstart', for example), then match one or more characters lazily, then match 'status' with word breaks on both sides". The middle match must be lazy so that the regex engine will stop at the next (rather than last) instance of 'status'.
If the regex engine being used does not support single-line mode, or something comparable, one can replace .+? with [\s\S]+?, since [\s\S] matches any character including a newline.
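As a rough check of the behaviour described above, here is a small Python sketch. The sample text is made up for illustration, and Python's re.IGNORECASE and re.DOTALL flags are used as the equivalents of /i and /s; the asker's own tool may spell these options differently.

```python
import re

# Made-up sample: two start...status blocks, one in mixed case.
text = "header\nstart\nEverything in between\nstatus\nfooter\nStart again\nSTATUS done"

# Case-insensitive, and DOTALL lets . match newlines (single-line mode).
pattern = re.compile(r"\bstart\b.+?\bstatus\b", re.IGNORECASE | re.DOTALL)
result = pattern.sub("", text)
print(repr(result))

# Same idea without single-line mode: [\s\S] matches any character.
portable = re.sub(r"\bstart\b[\s\S]+?\bstatus\b", "", text, flags=re.IGNORECASE)
print(repr(portable))
```

Because the quantifier is lazy, each match ends at the nearest following "status", so the two blocks are removed separately rather than as one long span.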
So my original expression works, and so does Cary's.
The files have changed since I last used the expression. They contain some white-space in the form of newlines that needed to be removed first
set(#cleanup,$replace(#text2,$new line," "),"Global")
set(#text2,$replace regular expression(#cleanup,"\\bstart\\b.*?\\bstatus\\b",""),"Global")
set(#cleanup,$replace regular expression(#cleanup,"(?i)^start.+?status:",""),"Global")
Sorry about that but thanks to all who looked and helped :)

Making query-replace-regexp more responsive

C-h h!
It happens quite often to me that I try to C-M-% a text. Thus I use query-replace-regexp interactively. So I enter the search regex and Emacs asks meekly for the replacement text, when in fact my search regex does not match any text!
Ideally, I would like to be signaled as soon as possible that my regex does not match.
Is there a way out of this?
One way to do this is to start with C-M-s, for isearch-forward-regexp, and interactively enter the regexp. That way, you'll see that it reaches the first match, and any further matches will be highlighted. Then, still in isearch mode, type C-M-%. The regexp from the isearch will automatically become the search regexp for the replacement command.

Vim S&R to remove number from end of InstallShield file

I've got a practical application for a vim regex where I'd like to remove numbers from the end of file location links. For example, if the developer is sloppy and just adds files and doesn't reuse file locations, you'll end up with something awful like this:
PATH_TO_MY_FILES&gt
PATH_TO_MY_FILES1&gt
...
PATH_TO_MY_FILES22&gt
PATH_TO_MY_FILES_ELSEWHERE&gt
PATH_TO_MY_FILES_ELSEWHERE1&gt
...
So all I want to do is to S&R and replace PATH_TO_MY_FILES*\d+ with PATH_TO_MY_FILES* using regex. Obviously I am not doing it quite right, so I was hoping someone here could not spoon feed the answer necessarily, but throw a regex buzzword my way to get me on track.
Here's what I have tried:
:%s\(PATH_TO_MY_FILES\w*\)\(\d+\)&gt:gc
But this doesn't work, i.e. if I just do a vim search on that, it doesn't find anything. However, if I use this:
:%s\(PATH_TO_MY_FILES\w*\)\(\d\)&gt:gc
It will match the string, but the grouping is off, as expected. For example, the string PATH_TO_MY_FILES22 will be grouped as (PATH_TO_MY_FILES2)(2), presumably because the \d only matches the 2, and the \w match includes the first 2.
Question 1: Why doesn't \d+ work?
If I go ahead and use the second string (which is wrong), Vim appears to find a match (even though the grouping is wrong), but then does the replacement incorrectly.
For example, given that we know the \d will only match the last number in the string, I would expect PATH_TO_MY_FILES22&gt to get replaced with PATH_TO_MY_FILES2&gt. However, instead it replaces it with this:
PATH_TO_MY_FILES2PATH_TO_MY_FILES22&gtgt
So basically, it looks like it finds PATH_TO_MY_FILES22&gt, but then replaces only the & with group 1, which is PATH_TO_MY_FILES2.
I tried another regex at Regexr.com to see how it would interpret my grouping, and it looked correct, but maybe a hack around my lack of regex understanding:
(PATH_TO_\D*)(\d*)&gt
This correctly broke my target string into the PATH part and the entire number, so I was happy. But then when I used this in Vim, it found the match, but still replaced only the &.
Question 2: Why is Vim only replacing the &?
Answer 1:
You need to escape the + or it will be taken literally. For example \d\+ works correctly.
Answer 2:
An unescaped & in the replacement portion of a substitution means "the entire matched text". You need to escape it if you want a literal ampersand.
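The grouping problem from the question can be reproduced in Python, as a sketch rather than a Vim recipe. Two differences from Vim are assumed here: in Python's re module + is a quantifier without a backslash, and & has no special meaning in the replacement string. The lazy \w*? used for the fix is one possible approach, not something taken from the answers above.

```python
import re

lines = [
    "PATH_TO_MY_FILES&gt",
    "PATH_TO_MY_FILES1&gt",
    "PATH_TO_MY_FILES22&gt",
    "PATH_TO_MY_FILES_ELSEWHERE1&gt",
]

# Greedy \w* also matches digits, so it keeps all but the last one.
greedy = re.match(r"(PATH_TO_MY_FILES\w*)(\d+)&gt", "PATH_TO_MY_FILES22&gt")
print(greedy.groups())

# A lazy \w*? gives up the trailing digits to \d*.
lazy = re.compile(r"(PATH_TO_MY_FILES\w*?)(\d*)&gt")
print(lazy.match("PATH_TO_MY_FILES22&gt").groups())

# Keep group 1, drop the digits; & is a literal character in Python.
cleaned = [lazy.sub(r"\1&gt", line) for line in lines]
print(cleaned)
```

In Vim the replacement would need the ampersand escaped (\&) to get the same literal text.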

Regex for finding substrings using Grep Console in Eclipse

I am using Grep Console in Eclipse to highlight lines in the console output that contain characters, e.g. cancel, based on a regex. The characters may have a symbol preceding and/or following it, may be surrounded by spaces, or may be substrings. In other words, I want to match the following lines (regardless of case):
The flight was cancelled.
[Cancelled] Flight 101
Are they going to cancel it?
What is the regex that I need to use to highlight these lines?
As acdcjunior already explained, you basically just need a case insensitive regular expression to match "cancel".
If you already have your output in the console, the easiest way to create this expression is to just select the word "cancel" in the output, then right click and select "Add Expression" from the context menu. A submenu will let you select a group to which the new expression will be added, or create a new one. The expression item will then be created, using the following expression:
(\Qcancel\E)
Be sure to uncheck the "Case sensitive" checkbox, which is enabled by default for performance reasons and would prevent the expression from matching your second line with the capital 'C'.
This is basically the same expression acdcjunior provided, with a few differences:
The .* matchers at the beginning and end of the expression are not included, as they are not necessary. Expressions will always match substrings anywhere in a line unless the $ or ^ matchers are used to specifically refer to the beginning or end of a line.
The expression is also wrapped in parentheses to create a capture group, allowing you to assign a style not only to the entire line containing the string cancel, but also to that string itself. You can leave out the parentheses if you don't want to style that string.
\Q and \E are always included when creating an expression from a selected text string to make sure that no characters from the selected string are interpreted as special expression characters. In this case, this is not necessary, as cancel only contains word characters.
This means that in your case, the simplest sufficient expression is just:
cancel
This expression also works if you use it as a "quick expression", as suggested by acdcjunior, though there is no real need for this. The idea behind quick expressions is that very long lines in the console can considerably slow down pattern matching. Grep Console therefore has a configurable limit to how many characters in each line will be matched with the configured expressions. Any characters after this limit in long lines are ignored, which means that lines which contain keywords only after the limit will not be recognised and therefore not styled.
If you configure a quick expression, every line is first matched with this expression, and only if the match is positive will the "normal" expression be used. In this case, the expressions are matched against the entire line. The quick expression should therefore be as simple as possible, so as not to slow down the matching too much.
In your case, using cancel as a quick expression and leaving the normal expression blank works because first the quick expression is positively matched against your line, and then the blank expression matches as well. If you have very long lines, it may cost you some performance though, as the quick expression will ignore the length limits explained above. Also, quick expressions don't use capture groups, so you can't highlight the cancel string with a separate style in this case.
Use:
.*(\Qcancel\E).*
And do not check "Case sensitive".
Or just cancel in the "Quick expression" text box.
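The matching behaviour described in the answers above can be checked with a short Python sketch. This is not how Grep Console itself is configured: Python's re module has no \Q...\E, so re.escape is used as a stand-in, re.IGNORECASE plays the role of the unchecked "Case sensitive" box, and the fourth sample line is invented to show a non-match.

```python
import re

lines = [
    "The flight was cancelled.",
    "[Cancelled] Flight 101",
    "Are they going to cancel it?",
    "Flight 202 departed on time.",  # invented line that should not match
]

# re.escape quotes special characters, much like wrapping text in \Q...\E.
# The parentheses form a capture group around the keyword itself.
pattern = re.compile("(" + re.escape("cancel") + ")", re.IGNORECASE)

# search() finds the pattern anywhere in the line, so no .* is needed.
highlighted = [line for line in lines if pattern.search(line)]
print(highlighted)

# Group 1 holds the keyword as it appears in the line.
keywords = [pattern.search(line).group(1) for line in highlighted]
print(keywords)
```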

Explain this Regular Expression please

Regular Expressions are a complete void for me.
I'm dealing with one right now in TextMate that does what I want it to do...but I don't know WHY it does what I want it to do.
/[[:alpha:]]+|( )/(?1::$0)/g
This is used in a TextMate snippet and what it does is takes a Label and outputs it as an id name. So if I type "First Name" in the first spot, this outputs "FirstName".
Previously it looked like this:
/[[:alpha:]]+|( )/(?1:_:/L$0)/g (it might have been \L instead)
This would turn "First Name" into "first_name".
So I get that the underscore adds an underscore for a space, and that the /L lowercases everything...but I can't figure out what the rest of it does or why.
Someone care to explain it piece by piece?
EDIT
Here is the actual snippet in question:
<column header="$1"><xmod:field name="${2:${1/[[:alpha:]]+|( )/(?1::$0)/g}}"/></column>
This regular expression (regex) format is basically:
/matchthis/replacewiththis/settings
The "g" setting at the end means do a global replace: every match in the input is replaced, rather than just the first one.
Breaking it down further...
[[:alpha:]]+|( )
That matches either a run of one or more alphabetic characters, or a single space. The whole match is held in $0; the space, when that alternative matches, is also captured in group $1.
(?1::$0)
As Roger says, the ? indicates this part is a conditional. If a match was found in parameter $1 then it is replaced with the stuff between the colons :: - in this case nothing. If nothing is in $1 then the match is replaced with the contents of $0, i.e. any run of alphabetic characters is output unchanged.
This explains why the spaces are removed in the first example, and the spaces get replaced with underscores in your second example.
In the second expression the \L is used to lowercase the text.
The extra question in the comment was how to run this expression outside of TextMate. Using vi as an example, I would break it into multiple steps:
:0,$s/ //g
:0,$s/\u/\L\0/g
The first part of the above commands tells vi to run a substitution starting on line 0 and ending at the end of the file (that's what $ means).
The rest of the expression uses the same sorts of rules as explained above, although some of the notation in vi is a bit custom.
I find RegexBuddy a good tool for dealing with regexes. I pasted your first regex into RegexBuddy and it gave me a piece-by-piece explanation.
I use it for helping to understand existing regexes, building my own, testing regexes against strings, etc. I've become better at regexes because of it. FYI I'm running it under Wine on Ubuntu.
It's searching for one or more alphabetic characters in a row, [[:alpha:]]+, or a single space, ( ).
/[[:alpha:]]+|( )/(?1::$0)/g
The (?1 is a conditional and used to strip the match if group 1 (a single space) was matched, or replace the match with $0 if group 1 wasn't matched. As $0 is the entire match, it gets replaced with itself in that case. This regex is the same as:
/ //g
I.e. remove all spaces.
/[[:alpha:]]+|( )/(?1:_:/\L$0)/g
This regex still uses the same condition, except that now, if group 1 was matched, it's replaced with an underscore; otherwise the full match ($0) is used, modified by \L. \L lowercases all text that comes after it, so \LABC would result in abc; think of it as a special control code.
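To see the conditional replacement at work outside TextMate, here is a minimal Python sketch. Python's re module has neither the (?1:...:...) replacement syntax nor POSIX classes like [[:alpha:]], so a replacement function and the ASCII class [A-Za-z] are used as approximations; label_to_id and label_to_snake are hypothetical names chosen for this example.

```python
import re

# Approximation of [[:alpha:]]+|( ) using an ASCII-only letter class.
PATTERN = re.compile(r"[A-Za-z]+|( )")


def label_to_id(label):
    # Like (?1::$0): if group 1 (the space) matched, emit nothing,
    # otherwise emit the whole match unchanged.
    return PATTERN.sub(lambda m: "" if m.group(1) else m.group(0), label)


def label_to_snake(label):
    # Like (?1:_:\L$0): a space becomes "_", anything else is lowercased.
    return PATTERN.sub(lambda m: "_" if m.group(1) else m.group(0).lower(), label)


print(label_to_id("First Name"))
print(label_to_snake("First Name"))
```

When the space alternative does not take part in a match, m.group(1) is None, which is what makes the "if group 1 matched" test work.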