Regex / Vim: Matching everything except a pattern, where pattern is multi-line? - regex

Is there a way in Vim to perform a similar operation to
:v/PATTERN/d
where PATTERN is a multi-line regex pattern? I'm sure there is a way to do this in script, but I am mainly curious as to if it is possible to do using only standard regex substitution or Vim commands, because at this point it is more academic than an actual need.
My example is the following:
asdf
begin
blah
end
asdf
alsdfjasf
begin
random stuff
end
...
I want to get the blocks of begin/end with the lines between them, but ignore everything outside of the blocks, ultimately ending up with
begin
blah
end
begin
random stuff
end
...
My thoughts were to do
:v/begin\_.\{-}end/d
where everything that didn't match would be deleted or even copied to a register, but obviously :v and :g only work on single lines.
Then, I started going down the path of running a substitute and substitute everything with empty string that DIDN'T match the begin\_.\{-}end pattern, but I cannot grasp how to achieve such using look-behinds or anything. The regex works perfectly fine when just searching, but I can't figure out how to tell the regex engine to find everything BUT that pattern. Any ideas?

First, clear register a:
qaq
Append each begin...end block to register a:
:g/begin/,/end/y A
Open a new tab:
:tabnew
Put the contents of register a:
"ap

Flip it inside out, and delete everything delimited by the pattern:
:%s/\%(^end\n*\|\%^\)\zs\_.\{-}\ze\%(^begin\|\%$\)//
\%( ... ) - non capturing group
^end\n* - the end of your pattern
\%^ - the beginning of the file
\zs - don't include anything matched before this point in the string to be replaced
\_.\{-} - non-greedy matching of anything (including newlines)
\ze - don't include anything matched after this point in the string to be replaced
^begin - the beginning of your pattern
\%$ - the end of the file
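Outside Vim, the same inside-out idea can be sketched in Python. This is only an illustration, not part of the answer above: the sample text and the names text, pattern, and result are assumptions, and since Python's re module has no \zs or \ze, a capture group and a lookahead stand in for them.

```python
import re

# Hypothetical sample input modelled on the question's example.
text = (
    "asdf\n"
    "begin\nblah\nend\n"
    "asdf\nalsdfjasf\n"
    "begin\nrandom stuff\nend\n"
    "trailing\n"
)

# Group 1 plays the role of what precedes \zs (start of text or an "end" line).
# The lookahead plays the role of what follows \ze (a "begin" line or end of text).
pattern = re.compile(r"(\A|^end\n)(.*?)(?=^begin$|\Z)", re.MULTILINE | re.DOTALL)

# Put group 1 back and drop whatever sat between the delimiters.
result = pattern.sub(lambda m: m.group(1), text)
print(result)
```

Only the text between an "end" line and the next "begin" line (or the file boundaries) is removed, so the blocks themselves pass through untouched.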

The commands executed by g and v can themselves take ranges, so you can act on everything from "begin" to "end" with :g /begin/ .,/end/xxx where xxx is a command to execute on that range. You can't really use this for :v, but there are several ways you could do it in multiple passes, e.g.
"mark all the lines we want to keep by putting '#' at the start of the line"
:g/^begin/ .,/^end/ s/^/#/
"delete all unmarked lines"
:v/^#/ d
"remove the markers"
:%s/^#//
Of course this assumes you do not have any lines in the file starting with #.
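The effect of the three passes can also be expressed as a single pass with a flag. A minimal Python sketch, assuming blocks are delimited by lines that start with "begin" and "end" as in the question; keep_blocks is a hypothetical helper name, not something from Vim.

```python
def keep_blocks(lines, start="begin", stop="end"):
    """Return only the lines that lie inside start...stop blocks (inclusive)."""
    kept = []
    inside = False
    for line in lines:
        # A line starting with `start` opens a block (like :g/^begin/).
        if not inside and line.startswith(start):
            inside = True
        if inside:
            kept.append(line)
            # A line starting with `stop` closes the block (like /^end/).
            if line.startswith(stop):
                inside = False
    return kept


sample = ["asdf", "begin", "blah", "end", "asdf", "alsdfjasf",
          "begin", "random stuff", "end", "..."]
print(keep_blocks(sample))
```

Because nothing is written into the text, this variant does not depend on the file being free of lines that start with #.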
Alternatively you could delete everything between each "end" and the next "begin"
:g/^end/ +1,/^begin/-1 d
then delete the cruft left at the start and end of the file:
:1,/^begin/-1 d
:$?^end?+1,$ d
Note the use of $?^end?+1 to search backwards from the end of the file to find the last line starting with "end".
N.B. the last two will delete too much if the file starts with "begin" or ends with "end" at that point, so check before you use them.

Related

Using vim regex to delete an entire line if it contains a period AND doesn't contain certain characters

I'm using vim and I'd like to delete an entire line that has a period (.) BUT doesn't have the following characters :, ö, ä, ë
good. bad # gets deleted
göod. # does not get deleted
bäd. goëd # does not get deleted
go:od. # does not get deleted
Below is the regex statement I'm using. I'm using a substitute statement because I'd like to confirm each deletion, but I'm open to any solution (ie %g//d).
%s/\.\n//c
This is about as simple as I can get it. Just a basic check for a fullstop, and make sure that everything before and after it (on that line) isn't an alt character. Note, you may need to replace the leading ^ and trailing $ with \n or perhaps add another flag to the commandline to make this find it per-row.
^[^:öäë]*\.[^:öäë]*$
Or translated for use in VIM (kudos to Sundeep):
:g/^[^:öäë]*\.[^:öäë]*$/d
Example: https://regex101.com/r/98tOtz/4
@Addison's answer is fine for this particular case; here's a more general solution in case the positive match isn't as trivial as .:
/^\%(.*[:öäë]\)\@!.*\./
This asserts, from the beginning of the line ^, that none of the bad characters occurs anywhere on the line (the negative lookahead \@! applied to .*[:öäë]), then matches (again from the beginning, as the assertion did not consume any characters) everything up to a literal period .*\.
You can use this regular expression both in :global as well as :substitute:
:%s/^\%(.*[:öäë]\)\@!.*\..*\n//c
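For readers more used to Perl-style syntax, the same assertion can be sketched in Python, where Vim's \%(...\)\@! is written (?!...). The list of sample lines comes from the question; the names delete_me and kept are assumptions made for this sketch.

```python
import re

# Same shape as the Vim pattern: a negative lookahead anchored at line start,
# followed by "anything, then a literal period".
delete_me = re.compile(r"^(?!.*[:öäë]).*\.")

lines = ["good. bad", "göod.", "bäd. goëd", "go:od."]

# Keep every line the pattern does NOT match.
kept = [line for line in lines if not delete_me.search(line)]
print(kept)
```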

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
With \1\2<replacement>: see demo here.
Explanation
( ^\h*mySqlQueryToArray # beginning of line: check the keyword is here
| (?!^)\G ) # if not at the BOL, check we did match something on this line
( [^\n\r']* ) ' # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any leading indentation, as suggested by Xælias (\h being a shortcut for any kind of horizontal whitespace).
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.* ) button selected (bottom left) and the wrap selector (all others should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever is behind
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active.
Then search for '([^']*)' and replace with "\1".
' are your single quotes
(...) is the capturing group, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex than the other answer, but this one tackles cases where your line contains several single-quoted strings. Like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
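The two steps above (select the matching lines, then replace inside the selection) can be sketched outside Sublime Text as a small Python function. This is an illustration under assumptions: requote is a hypothetical name, and the sketch only rewrites the quotes; it does not check that the result is still a valid string in the surrounding language.

```python
import re


def requote(source):
    """Turn '...' into "..." only on lines that start with mySqlQueryToArray."""
    out = []
    for line in source.splitlines(keepends=True):
        # Step 1: is this a line that begins (after indentation) with the call?
        if re.match(r"\s*mySqlQueryToArray", line):
            # Step 2: replace every single-quoted run on that line.
            line = re.sub(r"'([^']*)'", r'"\1"', line)
        out.append(line)
    return "".join(out)


sample = (
    'mySqlQueryToArray($con, "SELECT * FROM Template '
    "WHERE Name='$name' and Value='1234'\");\n"
    "echo 'untouched';\n"
)
print(requote(sample))
```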
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
Working demo
Check the substitution section.
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'
Here is a regex demo.

Is there a better way to do a regular expression search and delete in Emacs?

In reading the emacs help and emacs wiki I haven't found an obvious way to search for a regular expression and then simply delete all the matching text it finds. I originally thought I could use the regular expression search and replace feature M-x query-replace-regexp, but could not figure out how to replace my matches with an empty string. Hitting return on an empty string would simply exit. The field highlighted by the incremental reg expr search (C-M-s) doesn't obey the same rules as a marked block of text. Otherwise I would simply cut it (C-w).
Consider the following scenario. I wanted to strip the trailing zeros from a list of numbers that have 3 or more zeros.
0.075000
0.050000
0.10000
0.075000
So this is the roundabout way I solved it.
F3 - begin keyboard macro
C-M-s for 000* - forward regexpr search
match the trailing zeros find the first match
C-<SPC> - mark the position at the end of the match (after the last 0)
C-M-r for [1-9] - reverse regexpr search
match the reverse non-zero digit, mark is now on the non-zero digit
C-f - move mark forward one space
C-w - cut/kill the text
F4 - end keyboard macro, run until list is processed
Surely there is a better way to do this. Any ideas?
Use replace-regexp to do that. To strip trailing 0's from numbers that have 3 or more 0s:
M-x replace-regexp <RET>000+<RET><RET>
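To see what that replacement does to the sample list, here is the same substitution sketched in Python. One assumption differs from the Emacs command above: the sketch adds a $ anchor so that only zeros at the end of a line are removed; the names numbers and stripped are made up for the example.

```python
import re

numbers = "0.075000\n0.050000\n0.10000\n0.075000\n"

# "000+" means two zeros followed by one or more zeros, i.e. three or more.
# The "$" anchor (an addition in this sketch) restricts it to line ends.
stripped = re.sub(r"000+$", "", numbers, flags=re.MULTILINE)
print(stripped)
```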
Like replace-regexp?
M-x replace-regexp
Replace regexp: $.*foo
Replace regexp $.*foo with:
You can even make your own function, eg kill-regexp
In your scratch buffer (or some other buffer) write
(defun kill-regexp (regexp)
  (interactive "sRegular expression to kill: ")
  (replace-regexp regexp ""))
Make sure the cursor is somewhere on the function and then evaluate the defun:
`M-x eval-defun`
(interactive ...) means you can call it interactively. Leading s means the regexp argument is a string. Text following it is what will be displayed at the prompt (minibuffer)

Regex Match That doesn't contain some text

I am trying to create a regex that finds a Start Prefix and an End Prefix that have paragraph tags between them. But the one I have created is not working to my expectations.
%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%
Correctly Matches
<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>
This also matches, but I don't want it to match because the </p><p> is not in between the Start and End Prefix
<p>%%%HL_START%%%One%%%HL_END%%% Some More Text %%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text %%%HL_START%%%Here%%%HL_END%%%</p>
I'm not entirely comfortable that regex is the right solution here; if you are getting into nested start and stop markers, you might not have a regular language...
For this specific example, try changing the regex to use [^%] instead of . so that the .*? matching can't go past the %%%HL_END%%%
%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%
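A quick way to check the difference is to run both patterns against the two sample strings from the question. A minimal Python sketch; the names loose, strict, wanted, and unwanted are assumptions for this example.

```python
import re

loose = re.compile(r"%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%")
strict = re.compile(r"%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%")

wanted = "<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>"
unwanted = (
    "<p>%%%HL_START%%%One%%%HL_END%%% Some More Text "
    "%%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text "
    "%%%HL_START%%%Here%%%HL_END%%%</p>"
)

# The original pattern matches both strings; the [^%] version only the first.
print(bool(loose.search(unwanted)), bool(strict.search(unwanted)))
print(strict.search(wanted).groups())
```

The trade-off of [^%] is that highlighted text containing a literal % will no longer match at all.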

auto indent in vim string replacement new line?

I'm using the following command to auto replace some code (adding a new code segment after an existing segment)
%s/my_pattern/\0, \r some_other_text_i_want_to_insert/
The problem is that with the \r, some_other_text_i_want_to_insert gets inserted right after the new line:
mycode(
    some_random_text my_pattern
)
would become
mycode(
    some_random_text my_pattern
some_other_text_i_want_to_insert <--- this line is NOT indented
)
instead of
mycode(
    some_random_text my_pattern
    some_other_text_i_want_to_insert <--- this line is now indented
)
i.e. the new inserted line is not indented.
Is there any option in vim or trick that I can use to indent the newly inserted line?
Thanks.
Try this:
:let @x="some_other_text_i_want_to_insert\n"
:g/my_pattern/normal "x]p
Here it is, step by step:
First, place the text you want to insert in a register...
:let @x="some_other_text_i_want_to_insert\n"
(Note the newline at the end of the string -- it's important.)
Next, use the :global command to put the text after each matching line...
:g/my_pattern/normal "x]p
The ]p normal-mode command works just like the regular p command (that is, it puts the contents of a register after the current line), but also adjusts the indentation to match.
More info:
:help ]p
:help :global
:help :normal
%s/my_pattern/\=submatch(0).", \n".matchstr(getline('.'), '^\s*').'some_other_text'/g
Note that you will have to use submatch and concatenation instead of & and \N. This answer is based on the fact that substitute command puts the cursor on the line where it does the substitution.
How about normal =``?
:%s/my_pattern/\0, \r some_other_text_i_want_to_insert/ | normal =``
<equal><backtick><backtick>: re-indent up to the position before the latest jump
(Sorry about the strange formatting, escaping backtick is really hard to use here)
To keep them as separate command you could do one of these mappings:
" Equalize and move cursor to end of change - more intuitive for me"
nnoremap =. :normal! =````<CR>
" Equalize and keeps cursor at beginning of change"
nnoremap =. :keepjumps normal! =``<CR>
I read the mapping as "equalize last change" since dot already means "repeat last change".
Or skip the mapping altogether since =`` is only 3 keys with 2 of them being repeats. Easy peasy, lemon squeezy!
References
:help =
:help mark-motions
Kind of a roundabout way of achieving the same thing: You could record a macro which finds the next occurrence of my_pattern and inserts after it a newline and your replacement string. If auto-indent is turned on, the indent level will be maintained regardless of where the occurrence of my_pattern is found.
Something like this key sequence:
q 1 # begin recording
/my_pattern/e # find my_pattern, set cursor to end of match
a # append
\nsome_other_text... # the text to append
<esc> # exit insert mode
q # stop recording
Repeated by pressing @1
You can do it in two steps. This is similar to Bill's answer but simpler and slightly more flexible, since you can use part of the original string in the replacement.
First substitute and then indent.
:%s/my_pattern/\0, \r some_other_text_i_want_to_insert/
:%g/some_other_text_i_want_to_insert/normal ==
If you use part of the original string with \0,\1, etc. just use the common part of the replacement string for the :global (second) command.
I achieved this by using \s* at the beginning of my pattern to capture the preceding whitespace.
I'm using the vim addon for VSCode, which doesn't seem to match standard vim completely, but for me,
:%s/(\s*)(existing line)/$1$2\n$1added line/g
turns this
mycode{
    existing line
}
into this
mycode{
    existing line
    added line
}
The parentheses in the search pattern define groups which are referenced by $1 and $2. In this case $1 is the white space captured by (\s*). I'm not an expert on different implementations of vim or regex, but as far as I can tell, this way of referencing regex groups is specific to VSCode (or at least not general). More explanation of that here. Using \s* to capture a group of whitespace should be general, though, or at least have a close analog in your environment.
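The capture-the-indentation trick is easy to try outside an editor. A minimal Python sketch, assuming four-space indentation in the sample and using [ \t]* instead of \s* so the group cannot swallow a newline; source, pattern, and result are names made up for the example.

```python
import re

source = "mycode{\n    existing line\n}\n"

# Group 1 captures the leading indentation, group 2 the line itself.
pattern = re.compile(r"^([ \t]*)(existing line)$", re.MULTILINE)

# Re-emit the line, then a newline, the same indentation, and the new text.
result = pattern.sub(r"\1\2\n\1added line", source)
print(result)
```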