Why does a regular expression with a positive lookbehind in Visual Studio cause every second match to be substituted? - regex

Given the following regular expression containing a positive lookbehind (simplified from the one I'm actually trying to use):
(?<=\s|\n)(".*?")
and the following substitution expression:
_T($1)
Visual Studio 2013 will find every matching string but when replacing, will replace the string corresponding to the subsequent match, so will replace every second string.
Furthermore, Replace All does not work and says it cannot find any matching text (even though a Find All will find the relevant strings).
Is this a bug in Visual Studio or am I doing something wrong?

TL;DR: Visual Studio (VS) search/replace with regular expressions has to work within the editor's own find and replace operations, so a pattern that looks like a valid regex may still fail because of all the moving parts.
Explanation: There are actually several things working against that lookbehind pattern in Visual Studio. Each of them contributes separately to what you are seeing; they are individual behaviors, not a single coordinated failure. Let me list them as #1, #2 and #3:
#1: When using any type of lookbehind or lookahead in a regular expression pattern, note that the match does not include what the lookaround specifies. The match covers only what comes after the lookbehind. So your "Find Next" item does not include the space or linefeed before it. (That is what you want, and it is logical.) But see below how each match is highlighted without the space before it, and how that interferes with the whole process.
On its own this works as intended as a search/highlight, but then #2 comes into play.
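To make point #1 concrete, here is a minimal sketch using Python's re module as a stand-in (an assumption: it is not the Visual Studio engine, and the helper name first_match_span is made up for illustration). It only shows that the reported match starts at the quote, not at the whitespace the lookbehind checks.

```python
import re

# Same pattern as in the question. Python accepts this lookbehind because
# both alternatives (\s and \n) are exactly one character wide.
PATTERN = re.compile(r'(?<=\s|\n)(".*?")')

def first_match_span(text):
    """Return (start, end, matched_text) of the first match, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return (m.start(), m.end(), m.group(0))

sample = 'x = "abc";'
# The space at index 3 is required by the lookbehind but is not part of the match.
print(first_match_span(sample))
```

The match begins at the opening quote, so a highlight based on the match never covers the preceding whitespace.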
#2: Replace in the Visual Studio editor is not a true regex replace operation. It performs a replace in two steps, and these steps are not integrated the way a regex replace in code is. Let that sink in.
Step one is a find, step two is a replace. Replace All is a series of these two-step (find/replace) operations from the current location to the end of the file.
As for the skipped single replace: on the first press, because Replace Next first has to find the next item, it doesn't replace anything; by design it just moves the highlight to the next "XXXXXX" string.
(Press 2) The user thinks Visual Studio is going to replace what is highlighted, but that doesn't happen in this case, because the match pattern states that the current match position must have \s|\n within it; curses, the lookbehind!
Because the current selection does not contain the \s|\n of the lookbehind, the editor must move the text pointer to the next location after the current highlight and, if it finds a match, does the replace there.
To be clear, because the replace operation is sitting on a quote and not on a \s|\n (as directed by the pattern), it must move the current pointer to the next \s|\n, which it finds, and it replaces the text there. Note the two clicks in blue that happen to do the
#3: What is interesting is that if the replacement is just some plain text rather than the match reference $1, Replace All works. Ugh, confusing.
Because the replacement reference $1 is not viable in any individual find/replace step, the Replace All operation subsequently locks up.
Summary
What you want to do is logical, but because the regex replace with a lookbehind interferes with the editor pointer and the two-step find/replace operation, a combination of individual scenarios causes the whole operation to fail.
One has to design a Visual Studio regex pattern to work with the #1/#2/#3 editor idiosyncrasies pointed out above. Keep in mind that the VS regex is not a true .NET regex parser... just a close one-off.
Is it a bug? Maybe. But IMHO a fix would require a whole redesign of the search/replace feature to be regex-centric rather than plain-text-search-centric (with regex patterns), as it is now.
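For contrast with the two-step editor behavior described above, this is roughly what an integrated "code regex replace" looks like. It is a sketch in Python (an assumption: Python's re stands in for a code-level regex engine, and wrap_strings is a hypothetical helper name); it says nothing about how Visual Studio is implemented.

```python
import re

# Pattern and replacement from the question; \1 is Python's spelling of $1.
PATTERN = re.compile(r'(?<=\s|\n)(".*?")')

def wrap_strings(text):
    """Wrap every matching string literal in _T(...) in a single pass."""
    return PATTERN.sub(r"_T(\1)", text)

print(wrap_strings('foo( "one", "two" );'))
```

Here finding and substituting happen in one operation over the whole input, so every match is replaced and none is skipped.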

Related

RegEx for underlining text

How can I match one line of text with a regex and follow it with a line of dashes, exactly as many as there are characters in the match, to achieve text-only underlining? I intend to use this with the search and replace function (likely in the scope of a macro) inside an editor. Probably, but not necessarily, Visual Studio Code.
This is a heading
should turn into
This is a heading
-----------------
I believe I have read an example of this years ago but can't find it, and I don't seem to be able to formulate a search query that gets anything useful out of Google (including variations of the question's title). If you can find it, I'd be interested in that, too.
The best I can come up with is this:
^(.)(?=(.*\n?))|.
Substitution
$1$2-
syntax          note
^(.)            match the first character of a line, capture it in group 1
(?=(.*\n?))     then look ahead for the rest of this line and capture it in group 2, including a line break if there is one
|.              or a normal character
But the text must have a line break after it; otherwise the underline stays on the same line.
Not sure if it is of any use, but here are the test cases.
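The pattern and substitution above can be tried outside an editor. The following is a minimal sketch in Python (an assumption: Python 3.5 or later, where an unmatched group in the replacement expands to an empty string; the function name underline is made up for illustration). Editors may differ in how they treat unmatched groups.

```python
import re

# First character of each line is captured together with a lookahead copy of
# the rest of the line; every other character matches the plain "." branch.
UNDERLINE = re.compile(r"^(.)(?=(.*\n?))|.", re.MULTILINE)

def underline(text):
    # \1\2- re-emits the whole line once (first branch) plus one dash, and
    # emits a single dash for each remaining character (second branch, where
    # groups 1 and 2 are empty).
    return UNDERLINE.sub(r"\1\2-", text)

print(underline("This is a heading\n"))
```

If the input has no trailing line break, the dashes end up on the same line as the heading, which is the limitation noted above.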

Visual Studio hangs searching for regex ^.*$

(To clarify, I'm talking about Ctrl-Shift-F search. Current Document.)
I want to search for lines that don't contain a certain character, like '(', so I figure I need to include ^ and $ indicators to get the entire line. But this just crashes the GUI. Is there a way forward?
Lines without ( should be handled by this:
^[^\(]*$
But this hangs, as does the simpler "^.*$".
Maybe there's another way to find these lines?
EDIT: the proposed "duplicate" question is about C# RegEx class, completely utterly different from Visual Studio 2010 interactive regular expressions.
If you want to match lines, use ^[^\r\n(]*$, where \r and \n are excluded from [^(] – Wiktor Stribiżew Jun 27 at 19:36
This comment is indeed the answer. Apparently old Visual Studio searches could span multiple lines, and ^.*$ thus means start at the beginning of the line, go forward any number of characters up to the end of the document, then stop at the end of a line. These characters can include any number of newlines.
VS apparently didn't efficiently implement this query and hangs interminably.
W.S.'s proposal explicitly excludes newlines from the search parameters, thereby forcing only a single line.
Visual Studio 2012 changed to a more conventional regular expression search that limits results to single lines, so ^.*$ just fetches the lines of a document one by one.
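The effect of excluding \r and \n from the character class can be sketched in Python (an assumption: Python's re is only a stand-in, since the old Visual Studio 2010 engine has its own syntax and behavior; lines_without_paren is a hypothetical helper name).

```python
import re

# \r and \n are excluded from the class, so a match can never span two lines.
LINES_WITHOUT_PAREN = re.compile(r"^[^\r\n(]*$", re.MULTILINE)

def lines_without_paren(text):
    """Return every line of text that contains no '(' character."""
    return [m.group(0) for m in LINES_WITHOUT_PAREN.finditer(text)]

sample = "foo(bar)\nplain line\nbaz"
print(lines_without_paren(sample))
```

Note that empty lines also satisfy this pattern, because the class is allowed to match zero characters.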

Visual Studio 2010 regular expressions for 'Find In Files'

I have looked at the many Stack Overflow posts concerning VS regular expressions and read the Microsoft page on regular expressions, but I still cannot determine where I am going wrong.
Microsoft VS regex
I want to find all lines which include the word, attribute, but which are not comment lines (do not contain the // symbol).
I have tried using the regular expression
~(^ *//).*attribute.*
meaning:
~(^ *//) --> exclude lines which begin with '//' preceded by zero or more spaces
.* --> match any character zero or more times
attribute --> match the word attribute
.* --> match any character that comes after the word attribute
I have tried several other regular expressions with about the same amount of failure. I am wondering if anyone can spot something obvious that I am not doing.
I also gave the below a try:
~( *//).*attribute.* (thinking maybe the caret was being taken as a literal instead of special)
~(//).*attribute.* (thinking maybe the * was being taken as a literal instead of special)
~(//)attribute (imminent failure but will try anything)
\s*~(//).*attributes.*
I saw quite a few posts suggesting to use the find command in batch. This can be done, but I would prefer to have the ability to double click on the findings so that the file will be opened and already scrolled to the correct location.
How about this one?
^(?=.*attribute.*\n)(?!.*//).*
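As a rough check of what this pattern selects, here is a sketch in Python (an assumption: Python's re syntax, not the Visual Studio 2010 Find in Files syntax, which uses different operators such as ~(...); attribute_lines is a made-up helper name).

```python
import re

# The first lookahead requires "attribute" on the line, followed by a line break.
# The second lookahead rejects any line that contains "//" anywhere.
NON_COMMENT_ATTRIBUTE = re.compile(r"^(?=.*attribute.*\n)(?!.*//).*", re.MULTILINE)

def attribute_lines(text):
    """Return lines that mention 'attribute' and contain no '//'."""
    return [m.group(0) for m in NON_COMMENT_ATTRIBUTE.finditer(text)]

sample = (
    "int attribute = 1;\n"
    "  // attribute in a comment\n"
    "foo();\n"
    "x.attribute(); // trailing comment\n"
)
print(attribute_lines(sample))
```

Two consequences of the pattern as written: a line with a trailing // comment is excluded too, and a final line without a line break is never matched because the lookahead requires \n.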

regexp in vim visual selection, up to the last character selected

Given this line in vim:
The cat is sleepy.
I want to replace "sleepy" with "black" (of course this is for the sake of simplicity; I'm actually editing C code).
When the cursor is at the beginning of the line, I type "/sle" to make it go to the beginning of "sleepy".
Then v, then e
The word "sleepy" is now fully selected.
Then I type ":s/\%V.*\%V/black/"
And instead of having:
The cat is black.
It gives:
The cat is blacky.
The last visually selected character is not considered part of the selection, or maybe I'm not using \%V correctly.
How can I make the pattern match exactly my visual selection?
You have to add one character after \%V:
/\%V.*\%V.
If the selection contains more than one line, you will have to use \%V\_.*\%V\_. instead.
There is also another trick:
/\%V\_.\{-}\%V\#!
(multiline version). This makes use of non-greedy matches and negative look-ahead: first (\{-}) to make sure it operates only on selection, second (\#!, specifically \%V\#!) to make sure there is no visual selection after the last character.
If you have a more complex pattern and cannot make it non-greedy or put one character after the last \%V, there is another solution, which is way too slow though:
/\%V\_.*\%(\%V\_.\)\#<=
(multiline version again). It uses positive look-behind to make sure that last character is still inside visual selection.

Replacing char in a String with Regular Expression

I got a string like this:
PREFIX-('STRING WITH SPACES TO REPLACE')
and i need this:
PREFIX-('STRING_WITH_SPACES_TO_REPLACE')
I'm using Notepad++ for the regex search and replace, but I'm sure every other editor capable of regex replacements can do it too.
I'm using:
PREFIX-\('(.*)(\s)(.*)'\)
for search and
PREFIX-('\1_\3')
for replace
but that replaces only one space from the string.
The regex search feature in Notepad++ is very, very weak. The only way I can see to do this in NPP is to manually select the part of the text you want to work on, then do a standard find/replace with the In selection box checked.
Alternatively, you can run the document through an external script, or you can get a better editor. EditPad Pro has the best regex support I've ever seen in an editor. It's not free, but it's worth paying for. In EPP all I had to do was this:
search: ((?:PREFIX-\('|\G)[^\s']+)\s+
replace: $1_
EDIT: \G matches the position where the previous match ended, or the beginning of the input if there was no previous match. In other words, the first time you apply the regex, \G acts like \A. You can prevent that by adding a negative lookahead, like so:
((?:PREFIX-\('|(?!\A)\G)[^\s']+)\s+
If you want to prevent a match at the very beginning of the text no matter what it starts with, you can move the lookahead outside the group:
(?!\A)((?:PREFIX-\('|\G)[^\s']+)\s+
And, just in case you were wondering, a lookbehind will work just as well as a lookahead:
((?:PREFIX-\('|(?<!\A)\G)[^\s']+)\s+
You have to keep matching from the beginning of the string until you can match no more.
find /(PREFIX-\('[^\s']*)\s([^']*'\))/
replace $1_$2
like: while (s/(PREFIX-\('[^\s']*)\s([^']*'\))/$1_$2/) {}
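The same repeat-until-nothing-changes idea can be sketched in Python (an assumption: Python is used only for illustration, and underscore_spaces is a hypothetical helper name). Each pass replaces the first remaining whitespace character inside each PREFIX-('...') string.

```python
import re

# Group 1: the prefix plus leading non-space text; group 2: the rest up to ').
PATTERN = re.compile(r"(PREFIX-\('[^\s']*)\s([^']*'\))")

def underscore_spaces(text):
    """Repeat the single-space replacement until no match is left."""
    while True:
        text, count = PATTERN.subn(r"\g<1>_\g<2>", text)
        if count == 0:
            return text

print(underscore_spaces("PREFIX-('STRING WITH SPACES TO REPLACE')"))
```

The loop terminates because every successful pass removes one whitespace character from a matched string.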
How about using Replace all for about 20 times? Or until you're sure no string contains more spaces
Due to the nature of regex, it's not possible to do this in one step with a normal regular expression.
But if I were in your place, I would do the replacement in several steps:
1. Find such patterns and mark them with a special character (like replacing STRING WITH SPACES TO REPLACE with #STRING WITH SPACES TO REPLACE#).
2. Replace #([^#\s]*)\s with #\1_ several times.
3. Remove the markers!
I studied the regex tool in Notepad++ a little because I didn't know its capabilities.
I conclude that it isn't powerful enough to do what you want.
You will need to learn and use a programming language with real regex capability. There are a number of them. Personally, I use Python. It would take one minute to do what you want with it.
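The answer gives no code, so the following is only one possible way to do it in Python, not the author's script: match each PREFIX-('...') string once and fix its contents in a replacement callback. The function names are made up for illustration.

```python
import re

# Capture everything between the quotes of a PREFIX-('...') string.
PREFIXED = re.compile(r"PREFIX-\('([^']*)'\)")

def _fix(match):
    # Replace every whitespace character inside the captured string.
    inner = re.sub(r"\s", "_", match.group(1))
    return "PREFIX-('" + inner + "')"

def underscore_prefixed_strings(text):
    """Replace spaces with underscores inside every PREFIX-('...') string."""
    return PREFIXED.sub(_fix, text)

print(underscore_prefixed_strings("PREFIX-('STRING WITH SPACES TO REPLACE')"))
```

Using a callback keeps the outer pattern simple and handles any number of spaces in a single pass, while text outside the PREFIX-('...') strings is left untouched.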
You'd have to run the replace several times for each space but this regex will work
/(?<=PREFIX-\(')([^\s]+)\s+/g
Replace with
\1_ or $1_
See it working at http://refiddle.com/10z