Regexp in Vim visual selection, up to the last character selected

Given this line in vim:
The cat is sleepy.
I want to replace "sleepy" with "black" (this is simplified, of course; I'm actually editing C code).
With the cursor at the beginning of the line, I type "/sle" to jump to the beginning of "sleepy".
Then v, then e.
The word "sleepy" is now fully selected.
Then I type ":s/\%V.*\%V/black/"
And instead of having:
The cat is black.
It gives:
The cat is blacky.
Either the last visually selected character is not considered part of the selection, or I'm not using \%V correctly.
How can I make the pattern match exactly my visual selection?

\%V is a zero-width match, so the final \%V only asserts that the next character is still inside the selection without consuming it. You have to add one character after the last \%V:
/\%V.*\%V.
If the selection contains more than one line, you will have to use \%V\_.*\%V\_. instead.
There is also another trick:
/\%V\_.\{-}\%V\@!
(multiline version). This makes use of a non-greedy match and a negative look-ahead: the first (\{-}) makes sure the match does not run past the selection, the second (\@!, specifically \%V\@!) makes sure no visually selected character follows the last matched character.
If you have a more complex pattern and cannot make it non-greedy or put one character after the last \%V, there is another solution, which is way too slow though:
/\%V\_.*\%(\%V\_.\)\@<=
(multiline version again). It uses a positive look-behind to make sure that the last matched character is still inside the visual selection.
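To see why the trailing character gets left behind, here is a toy model in Python. It is not Vim's engine; it only assumes that \%V is a zero-width test that succeeds when the character at the current position lies inside the selection. The function name substitute_in_selection and the index arguments are made up for this sketch.

```python
# Toy model of :s/\%V.*\%V/black/ versus :s/\%V.*\%V./black/ on one line.
# Assumption: \%V succeeds at position p when the character at index p
# is inside the visual selection (indices sel_start..sel_end, inclusive).
def substitute_in_selection(line, sel_start, sel_end, consume_last):
    def in_visual(pos):
        return sel_start <= pos <= sel_end

    # First \%V: the match begins at the first selected character.
    begin = sel_start
    # Greedy .* followed by \%V: stop at the furthest position that is
    # still inside the selection. The character there is NOT consumed.
    end = max(p for p in range(begin, len(line)) if in_visual(p))
    # The extra "." after the last \%V consumes that final character.
    if consume_last:
        end += 1
    return line[:begin] + "black" + line[end:]


line = "The cat is sleepy."
start = line.index("sleepy")          # 11
stop = start + len("sleepy") - 1      # 16, the "y"

without_dot = substitute_in_selection(line, start, stop, consume_last=False)
with_dot = substitute_in_selection(line, start, stop, consume_last=True)
print(without_dot)
print(with_dot)
```

Under that assumption the first call reproduces the "blacky" result from the question and the second one gives the intended line.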

Related

How do I regex search in x and y for a, and only include the replacement of y if a was found in x?

I need to search through a larger text file.
This is an example of what I'm searching through.
https://pastebin.com/JFVy2TEt
recipes.addShaped("basemetals:adamantine_arrow", <basemetals:adamantine_arrow> * 4, [[<ore:nuggetAdamantine>], [<basemetals:adamantine_rod>], [<minecraft:feather>]]);
I need to look for lines that match a specific part in the first argument.
For example the "_arrow" part in the above line.
And erase every line that doesn't have "_arrow" in the first argument.
The arguments differ across all of the lines.
Different names also appear in the place where "basemetals:adamantine" is in the line above.
Since the further arguments are all different, I can't wrap my head around how to include the end of the line only when the first argument matches.
Edit: The end goal is to make it easier to sort my 3k+ line text file.
basic, blacksmith, carpenter, chef, chemist, engineer, farmer, jeweler, mage, mason, scribe, tailor
I think what you're trying to do is filter your text file by removing lines that don't fit a set of criteria. I've chosen the Atom text editor for this solution (because I'm running Windows and can't install gedit, and I want to ensure you have a working example).
To remove only lines that don't have a first argument ending in _arrow, one could do (?!recipes\.addShaped\("[^"]+_arrow")recipes.+\r?\n? and replace with nothing.
As a note: this task is made more difficult by Atom's limited regex support. In an environment with better support, my answer would probably be ^recipes\.addShaped\("[^"]+(?<!_arrow)".+\r?\n? (with multiline mode).
Also, please read "What should I do when someone answers my question?".
Regex explained:
(?! ) is a negative lookahead, which peeks at the succeeding text to ensure it doesn't contain "_arrow" at end of the first argument.
\. is an escaped literal period
[^"] is a character class that signifies a character that is not a ".
+ is a quantifier which tells the regex to match the preceding character or subexpression as many times as possible, with a minimum of one time.
. is a wildcard, representing any character
\r?\n? is used to match any kind of newline, with the ? quantifier making each character optional.
Everything else is literal characters; each one matches exactly itself.
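If the file can be processed outside the editor, the same filter can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name keep_matching is made up, and the second sample line (the "_axe" recipe) is invented for contrast and does not come from the pastebin.

```python
import re


def keep_matching(text, suffix="_arrow"):
    """Return only the lines whose first quoted argument ends in `suffix`."""
    pattern = re.compile(
        r'^recipes\.addShaped\("[^"]+' + re.escape(suffix) + r'"'
    )
    return [line for line in text.splitlines() if pattern.match(line)]


# First line is from the question; the second is a hypothetical non-match.
sample = (
    'recipes.addShaped("basemetals:adamantine_arrow", '
    '<basemetals:adamantine_arrow> * 4, [[<ore:nuggetAdamantine>], '
    '[<basemetals:adamantine_rod>], [<minecraft:feather>]]);\n'
    'recipes.addShaped("basemetals:adamantine_axe", '
    '<basemetals:adamantine_axe>, [[<ore:ingotAdamantine>]]);'
)

kept = keep_matching(sample)
print(len(kept))
```

Keeping the matching lines, rather than deleting the non-matching ones, avoids the negative lookahead entirely.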

Regex taking too many characters

I need some help with building up my regex.
What I am trying to do is match a specific part of text with unpredictable parts in between the fixed words. An example is the sentence one gets when replying to an email:
On date at time person name has written:
The cursive parts are variable: they might contain spaces, or a new line might start at that point.
To handle this, I built my regex as follows: On[\s\S]+?at[\s\S]+?person[\s\S]+?has written:
Basically, [\s\S]+? is supposed to stand in for any letter, number, space or line break, since I cannot predict what will appear between the fixed words that I am sure will always be there.
Now comes the hard part: when I add the word "On" somewhere in the text above the sentence that I want to match, the regex matches a much bigger piece of text than I want. This is due to the use of [\s\S]+.
How can I make my regex match as few characters as possible? Adding "?" after the "+" to make it lazy does not help.
Example is here with words "From - This - Point - Everything:". Cases are ignored.
Correct: https://regexr.com/3jdek.
Wrong because of added "From": https://regexr.com/3jdfc
The regex is to be used in VB.NET
A more real-life example, with HTML tags, can be found here. In it, I avoided using [\s\S]+? or (.+)?(\r)?(\n)?(.+?)
Correct: https://regexr.com/3jdd1
Wrong: https://regexr.com/3jdfu after adding certain parts of the regex to the text above. In HTML this is barely possible, as the user would never write the matching tag himself, but I do want to make sure my regex is correct, just in case.
These things are certain: I know what the part of the text starts with, wherever it sits in the entire text; I know what it ends with; and there are specific fixed words in between that might make the regex more reliable, but they can be omitted. Any text below the searched part is also allowed to be matched, but no text above it may be matched at all.
Another example where it goes wrong: https://regexr.com/3jdli. Basically, there is less to go on in this text, so the regex has fewer tokens to work with. Adding just the first < already makes the regex take too much.
From my own experience, most problems are avoided by making sure I do not use any [\s\S]+? before a (\r)?(\n)? has come first.
[\s\S] matches any character because it is the union of two complementary sets; it behaves like . with the /s option (dot matches newlines). Quantifiers are greedy by default, so the longest possible match is returned. A lazy quantifier only shortens the match from the right: the match still begins at the first position where it can succeed, which is why an earlier "On" or "From" gets pulled in.
Following the "correct" link, the token right after the shortest match must be geschreven. Another way to write this without lazy expansion, and one that is more flexible, is to put a negative lookahead in front of the repeated character set, inside the loop,
so
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft(.+?(?=geschreven))geschreven:
becomes
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft((?:(?!geschreven).)+)geschreven:
(?: ) is a non-capturing group, which just wraps the negative lookahead and the . (which can be replaced by [\s\S]).
(?! ) inside it is the negative lookahead, which ensures that the current position, before the next character is consumed, is not the beginning of the end token.
Following the comments, what must not appear in the repeated sequence can be stated explicitly:
From(?:(?!this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!this|point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
These three variants illustrate what the technique (?:(?!tokens)[\s\S])+ does:
in the first, this can't appear between From and this;
in the second, neither From nor this can appear between From and this;
in the third, in addition, neither this nor point can appear between this and point;
etc.
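A quick way to compare the lazy version with the lookahead version is to run both on a text that has an extra "From" above the wanted sentence. The sketch below uses Python's re module rather than VB.NET, and the sample text is made up; the two patterns are the ones discussed above.

```python
import re

# Hypothetical input: a stray "From" appears before the wanted sentence.
text = "From the start. From this point on, everything: works"

lazy = re.compile(
    r"From[\s\S]+?this[\s\S]+?point[\s\S]+?everything:",
    re.IGNORECASE,
)
tempered = re.compile(
    r"From(?:(?!From|this)[\s\S])+this"
    r"(?:(?!point)[\s\S])+point"
    r"(?:(?!everything)[\s\S])+everything:",
    re.IGNORECASE,
)

# The lazy pattern still starts at the first "From" it can use.
lazy_match = lazy.search(text).group()
# The lookahead forbids crossing another "From", so the first one fails
# and the match starts at the second.
tempered_match = tempered.search(text).group()

print(lazy_match)
print(tempered_match)
```

The lazy pattern returns everything from the first "From", while the lookahead version returns only "From this point on, everything:".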

Why does a regular expression with a positive lookbehind in Visual Studio cause every second match to be substituted?

Given the following regular expression containing a positive lookbehind (simplified from the one I'm actually trying to use):
(?<=\s|\n)(".*?")
and the following substitution expression:
_T($1)
Visual Studio 2013 will find every matching string but when replacing, will replace the string corresponding to the subsequent match, so will replace every second string.
Furthermore, Replace All does not work and says it cannot find any matching text (even though a Find All will find the relevant strings).
Is this a bug in Visual Studio or am I doing something wrong?
TL;DR: Visual Studio (VS) search/replace with regular expressions has to work within Visual Studio's own find and replace operations, so a pattern that looks like a valid regex may not work because of all the moving parts.
Explanation: There are actually multiple things working against that lookbehind pattern in Visual Studio. Each of them contributes separately to what you are seeing; they are individual behaviors, not one single failure. Let me list them as #1, #2 and #3:
#1: When using any type of lookbehind/ahead in regular expression patterns, one must note that it doesn't capture what it specifies in the lookbehind. The capture happens on what comes after it. So your "Find Next" item doesn't capture the space or linefeed behind it. (It is what you want and this is logical) but see below how the space before it is not captured and each set is highlighted and how that interferes with the whole process.
Stand alone this works and is what is intended, as a search/highlight, but then #2 comes into play.
#2: Search/replace in the Visual Studio editor is not a true regex replace operation. It performs a replace as two separate steps, and these steps are not integrated the way a regex replace in code is. Let that sink in.
Step one is a find, step two is a replace. Replace All is a series of two-step (find/replace) operations from the current location until the end of the file.
As for the single-replace skip issue: on the first press, Replace Next first has to find the next item, so it doesn't replace anything; by design it just moves the highlight to the next "XXXXXX" string.
(Press 2) The user thinks Visual Studio is going to replace what is highlighted, but that doesn't happen in this case, because the pattern states that the match must be preceded by \s|\n; curses, the lookbehind!
Because the current selection doesn't include the \s|\n required by the lookbehind, the editor must move the text pointer to the next location after the current highlight and, if it finds a match there, does the replace there.
To be clear, because the replace operation is sitting on a quote and not on a \s|\n (as directed by the pattern), it must move the current pointer to the next \s|\n, where it finds a match and replaces the text. Note the two clicks in blue that happen to do the
#3: What is interesting is that if one doesn't use the match replacement $1 but just some plain text, Replace All works. Ugh, confusing.
Because the replacement match $1 is not viable in any individual search/replace step, the Replace All operation subsequently locks up.
Summary
What you want to do is logical, but because a regex replace with a lookbehind interferes with the editor pointer and with the two-step find/replace operation, a combination of individual scenarios causes the whole operation to fail.
One has to design a Visual Studio regex pattern to work with the #1/#2/#3 editor idiosyncrasies pointed out above. Keep in mind that VS regex is not a true .NET regex parser, just a close approximation.
Is it a bug? Maybe. But IMHO a fix would require a whole redesign of the search/replace feature to make it more regex-centric rather than plain-text-search-centric (with regex patterns), as it is now.
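For contrast, this is roughly what an integrated "code regex replace" looks like, where finding and replacing happen in one pass. The sketch uses Python's re module, not Visual Studio's engine, and the source line is made up. Since \s already covers \n, the lookbehind is written as (?<=\s).

```python
import re

# Hypothetical source line; every string literal is preceded by a space.
source = 'x = "one" + "two" + "three";'

# (?<=\s) stands in for (?<=\s|\n): the quote must follow whitespace,
# but the whitespace itself is not part of the match.
pattern = re.compile(r'(?<=\s)(".*?")')

# sub() finds each match and substitutes it in a single pass, so no
# match is skipped between a separate "find" and "replace" step.
wrapped = pattern.sub(r'_T(\1)', source)
print(wrapped)
```

Here every string is wrapped, not every second one, which suggests the pattern itself is fine and the skipping comes from the editor's two-step handling.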

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.
Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using Notepad++, I'd recommend selecting your target, then inverting the selection.
The following pattern will select each instance, allowing for optional whitespace on either side of the '=' sign:
mc_gross\s*=\s*\d+
The following answer over on Super User explains how to use bookmarks in Notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.
You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13 then delete the \bs from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
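If a script is an option instead of the editor, collecting the matches directly sidesteps the "remove everything else" problem. A minimal Python sketch, assuming the same pattern with word boundaries; the sample text is invented for illustration:

```python
import re

# Hypothetical input with two wanted declarations and some noise.
text = (
    "txn_id=abc&mc_gross=25&payer=x\n"
    "foo mc_gross = 100 bar\n"
    "no match here"
)

# Same pattern as above, with \b on both sides and optional whitespace
# around the '=' sign.
amounts = re.findall(r"\bmc_gross\s*=\s*\d+\b", text)
print("\n".join(amounts))
```

Unlike the line-based replace, this keeps every occurrence even when a line contains more than one.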

How to delete regex match text in emacs?

How can I delete some text that match with a regex in emacs?
I suppose that using:
'(query-replace-regexp PATTERN EMPTY)
and:
'(replace-regexp PATTERN EMPTY)
but they throw:
perform-replace: Invalid regexp: "Premature end of regular expression".
In general, you can delete text that matches a given regexp by using the empty string "" as the replacement in the two functions you mention. However, as others mentioned in the comments above, your regular expression is faulty.
For instance, if your buffer contains the following text:
1. My todo list
1.1. Brush teeth
1.2. Floss
2. My favorite movies
2.1. Star Wars episodes 4-6
and you would like to get rid of the numbers at the beginning of each line, you could place the cursor at the beginning of the buffer and then type M-C-% (that is, press ALT, CTRL, Shift and 5 at the same time) to invoke the command query-replace-regexp. You'll be asked for two parameters in the minibuffer: first the regexp to match, then the replacement string.
So, in our example, you could use the following regexp:
\([0-9]\.\)+\s-
as the first parameter, and simply hit ENTER for the second parameter, i.e., don't specify anything as the replacement. That way, the replacement is the empty string: you replace whatever matches the regexp with nothing.
query-replace-regexp will ask you interactively for every match if you want to replace it or if you want to skip it. This is the "query"-part in query-replace-regexp and it is helpful to see if the regexp you came up with actually matches what you thought it does. If you're sure it does, you can type ! to make Emacs replace the remaining matches without asking every time.
If you use M-x replace-regexp instead of M-C-% Emacs will replace every match without asking for input at every match.
For the special case that you'd like to delete whole lines when a certain part of the line matches a regexp, there's also delete-matching-lines and its evil, goatee-wearing twin brother from a parallel universe delete-non-matching-lines.
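To check what the regexp from the example removes before running it in Emacs, the same substitution can be tried in Python. This is only a sketch of the idea, not Emacs Lisp: the Emacs pattern \([0-9]\.\)+\s- is translated by hand to Python syntax, where the group needs no backslashes and \s plays the role of the whitespace class \s-.

```python
import re

buffer_text = """1. My todo list
1.1. Brush teeth
1.2. Floss
2. My favorite movies
2.1. Star Wars episodes 4-6"""

# One or more "digit + dot" groups followed by one whitespace character,
# replaced with the empty string (i.e. deleted).
cleaned = re.sub(r"(?:[0-9]\.)+\s", "", buffer_text)
print(cleaned)
```

The "4-6" in the last line is left alone because no dot follows the digits.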