Why does the replacement fail in Notepad++ when the regex uses a lookbehind?

In Notepad++, if the search regex is
(?<latex>\$[^\$]*\$)(?=[\x{4e00}-\x{9fa5}])
and the replacement is
~\g{latex}~
then the replacement works properly.
But if the search regex contains a lookbehind expression, like
(?<=[\x{4e00}-\x{9fa5}])(?<latex>\$[^\$]*\$)(?=[\x{4e00}-\x{9fa5}])
then the replacement
~\g{latex}~
doesn't work in Notepad++. Why?

Actually, I found out that the named backreference is the real problem. According to the documentation, a named capture is referenced in the replacement with the syntax $+{name}. So this should work:
(?<=[\x{4e00}-\x{9fa5}])(?<latex>\$[^\$]*\$)(?=[\x{4e00}-\x{9fa5}])
And replace with:
~$+{latex}~
So the first regex you had should not have been working properly either; it should have been replacing with the literal ~g{latex}~. Even so, I can't really be sure here, since I'm using an older version of N++ and the docs could be out of date.
Though I think the simplest option is not to use the capture group at all. The following should work fine:
(?<=[\x{4e00}-\x{9fa5}])\$[^$]*\$(?=[\x{4e00}-\x{9fa5}])
And replace with ~$0~.
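For anyone who wants to check the pattern outside Notepad++, here is a minimal sketch in Python. It is only an analogy: Python's re module spells the named group (?P<latex>...) and the named reference \g<latex>, not $+{latex}, and the sample text is made up. It suggests that the lookbehind itself is harmless and that the capture-free variant gives the same result.

```python
import re

# CJK range from the question, written with Python's \uXXXX escapes.
CJK = r"[\u4e00-\u9fa5]"

# Named group between a lookbehind and a lookahead (Python syntax: (?P<name>...)).
named = re.compile(rf"(?<={CJK})(?P<latex>\$[^$]*\$)(?={CJK})")
# Same pattern without a capture group; the whole match is referenced instead.
plain = re.compile(rf"(?<={CJK})\$[^$]*\$(?={CJK})")

text = "中文$x^2$中文 and abc$y$def"  # hypothetical sample input

with_named = named.sub(r"~\g<latex>~", text)  # named reference
with_whole = plain.sub(r"~\g<0>~", text)      # whole match, like $0

print(with_named)
print(with_whole)
```

Only the formula that sits between CJK characters is wrapped; the one between Latin letters is left alone.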

Related

Regex search and replace except for specific word

I'm trying to search and replace some strings in my code with RegEx as follows:
replacing self.word with self.indices['word']
So for example, replace:
self.example
with
self.indices['example']
But it should work for all words, not just 'example'.
So the RegEx needed for this probably doesn't need the actual word in it.
It should actually just replace everything but the word itself.
What I tried is the following:
self.(.*?)
It finds all the strings that start with 'self.', but when I replace them I lose the word itself (such as 'example'), so I can't turn it into self.indices['example'].
What you need to do is a substitution using capturing groups. Depending on the language used, some of the syntax might be slightly different, but in general you would want something like this:
self\.([^\s]+)
replace with:
self.indices['\1']
https://regex101.com/r/R3xFZB/1
We are using the parentheses to capture what comes after self. and, in the replacement, inserting that captured text with \1, which refers to the first (and in this case only) capture group. For some languages you might need $1 or \\1 for the substitution variable.
As with most regexes, this can be done in many different ways, using lookarounds and so on, but I think that for somebody new this is easy to read, understand, and maintain. The comment #wnull made has a more proper regex leveraging a lookbehind that should also do the trick :)
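As a concrete illustration, here is a minimal sketch of the same substitution in Python, which uses \1 in the replacement. The input line is invented for the example.

```python
import re

source = "total = self.example + self.other"  # hypothetical input

# Group 1 captures the run of non-whitespace characters after "self.".
pattern = re.compile(r"self\.([^\s]+)")

# \1 in the replacement inserts whatever group 1 captured.
result = pattern.sub(r"self.indices['\1']", source)

print(result)
```

Note that [^\s]+ also swallows punctuation that directly follows the word (for example the closing parenthesis in foo(self.example)), so \w+ may be the safer choice for real code.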

VS Code can't replace using regex group references under PCRE2 mode

I'm trying to replace my std STL usage with EASTL, and since I have a lot of cpp/h files, I'm relying on the 'Search in Files' option of VS Code, with the following pattern:
((?<=#include \<)([^\/(.h)]+?)(?=\>))
This matches completely fine on regexr.com, for both match and replace, and in VS Code as well, but there it needs the PCRE2 engine option enabled due to the use of backreferences.
Trying to reference matching group #1 using $1 in the Search sidebar view simply doesn't work; it just inserts the literal "$1".
But if I search and replace with the same input in each file manually, it works as intended.
Thanks.
EDIT: The bug which prevented replace from working properly with lookarounds has been fixed, see capture group in regex not working. It is working in the Insider's Build and will presumably be included in v1.39.
However, your regex
((?<=#include \<)([^\/(.h)]+?)(?=\>))
should be changed to
((?<=#include <)([^\/(.h)]+?)(?=>))
Note the removal of the escapes before < and >; with that change it works in the Insider's Build as of this date.
[And the PCRE2 mode has been deprecated since the original question. So you do not need that option anymore, PCRE2 will be used automatically if needed.]
There is a similar bug when using search/replace with newlines and the replace just literally inserts a $1 instead of the capture group's value. This bug has been fixed in the latest Insider's Build, see multiline replace issue and issue: newlines and replace with capture groups.
But I tried your regex in the Insider's Build and it has the same result as you had before - it inserts the literal $1 instead of its value. It appears to be a similar bug but due to the regex lookarounds.
So I tried a simpler, but I think still correct, regex without the lookarounds:
^(#include\s+<)([^\.\/]+?)(>)
and replace with $1EASTL/$2.h$3 and it works as expected.
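To see what the three groups do, here is a minimal sketch that runs the same pattern in Python. This is only an analogy to the VS Code search: Python writes the references as \1, \2, \3 instead of $1, $2, $3, needs re.MULTILINE for ^ to match at every line start, and the file content is made up.

```python
import re

# Hypothetical file content: one standard header, one path-style header.
source = "#include <vector>\n#include <sys/types.h>\nint main() {}\n"

# Three groups: the "#include <" prefix, the header name (no dot or slash), and ">".
pattern = re.compile(r"^(#include\s+<)([^\.\/]+?)(>)", re.MULTILINE)

# Rebuild the line around the captured header name.
result = pattern.sub(r"\1EASTL/\2.h\3", source)

print(result)
```

Headers that already contain a dot or a slash are not matched, so only the extensionless standard headers are rewritten.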

Regex substitution with Notepad++

I have a text file with several lines like these ones:
cd_cod_bus
nm_number_ex
cd_goal
And I want to get rid of the _ and uppercase the following character using Notepad++ (I can also use other tool but if it doesn't get the problem more troublesome).
So I tried to match the characters with the regex (?<=_)\w and replace them using \U\1\E\2 for the uppercasing trick, but this is where my problems began. I think the regex is OK, but once I click Replace All I get this result:
cd_od_us
nm_umber_x
cd_oal
As you can see, it only deletes the match.
Do you know where the problem is?
Thanks.
The search regex has no capture groups, i.e. the \1 and \2 references in the replacement do not refer to anything.
Try this instead:
Search: _(\w)
Replace: \U\1\E
There you have a capture group in the search part (the parentheses around the \w), and the \1 in the replacement refers back to what was captured.
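The difference between the two searches can be sketched in Python. This is an approximation: Python's re has no \U in replacement strings, so a callback does the uppercasing, and an empty replacement stands in for the \1 and \2 references that point at nothing.

```python
import re

names = ["cd_cod_bus", "nm_number_ex", "cd_goal"]

# No capture group: the match (one character after "_") is replaced by nothing.
broken = [re.sub(r"(?<=_)\w", "", n) for n in names]

def to_camel(name: str) -> str:
    # Group 1 is the character after each underscore; the callback uppercases it.
    # The underscore is part of the match but not of the group, so it is dropped.
    return re.sub(r"_(\w)", lambda m: m.group(1).upper(), name)

fixed = [to_camel(n) for n in names]

print(broken)
print(fixed)
```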
replace
_(.)
with
\U$1
will give you:
cdCodBus
nmNumberEx
cdGoal
And as for your remark
"I can also use other tool but if it doesn't get the problem more troublesome"
I suggest you try vim.
Try this,
_(\w)
and replace with
\U\1

capture with if-then-else in php regex

I'm very lost with a regular expression. It's just black magic to me. Here's what I need:
there is a filename: some_file.jpg
it might be in the following format: some_file_p250.jpg
the regex to match the file in simple format: /^([a-zA-Z_\-0-9]+)\.(jpg|jpeg|png)$/
the regex to match the file in advanced format: /^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/
My question is as follows: how do I make the "(_[a-z]?[0-9]{3,4})" part optional? I've tried adding a question mark to the second group like this:
/^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
Even though the pattern works, it always captures the contents of the second group in the first group and leaves the second empty.
How can I make this work to capture the filename, the advanced part (_p250), and the extension separately? I'm thinking it has something to do with the greediness of the first group, but I might be completely wrong, and even if I'm right, I still don't know how to solve it.
Thanks for your thoughts
Adding a question mark after the first plus will make the first capturing expression non-greedy. This worked for me using your test case:
/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
I tested in JavaScript, not PHP, but here's my test:
"some_file_p250.jpg".match(/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/)
and my results:
["some_file_p250.jpg", "some_file", "_p250", "jpg"]
In my experience, making a capturing expression non-greedy makes regular expressions a lot more intuitive and will often make them work the way I expect them to work. In your case, it was doing what you suspected; the first expression was capturing everything and never gave the second expression a chance to capture anything.
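To make the greedy versus non-greedy difference visible, here is a minimal sketch in Python (not PHP, so treat it as an illustration of the regex behaviour only). The two patterns differ in a single question mark after the first plus.

```python
import re

# Greedy first group: it consumes "_p250" before the optional group gets a chance.
greedy = re.compile(r"^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")
# Non-greedy first group: it stops as soon as the rest of the pattern can match.
lazy = re.compile(r"^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")

name = "some_file_p250.jpg"
greedy_groups = greedy.match(name).groups()
lazy_groups = lazy.match(name).groups()

# The simple format still matches; the optional group is just unset.
simple_groups = lazy.match("some_file.jpg").groups()

print(greedy_groups)
print(lazy_groups)
print(simple_groups)
```

In Python an optional group that did not participate shows up as None; PHP's preg_match reports it differently, so check the result array there.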
I think this is what you want:
/^([a-zA-Z_\-0-9]+)(|_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
or
/^([\d\w\-]+)(|_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/

How to search (using regex) for a regex literal in text?

I just stumbled on a case where I had to remove quotes surrounding a specific regex pattern in a file, and the immediate conclusion I came to was to use vim's search and replace util and just escape each special character in the original and replacement patterns.
This worked (after a little tinkering), but it left me wondering if there is a better way to do these sorts of things.
The original regex (quoted): '/^\//' to be replaced with /^\//
And the search/replace pattern I used:
s/'\/\^\\\/\/'/\/\^\\\/\//g
Thanks!
You can use almost any character as the regex delimiter. This will save you from having to escape forward slashes. You can also use groups to extract the regex and avoid re-typing it. For example, try this:
:s#'\(\\^\\//\)'#\1#
I do not know if this will work for your case, because the example you listed and the regex you gave do not match up. (The regex you listed will match '/^\//', not '\^\//'. Mine will match the latter. Adjust as necessary.)
Could you avoid using regex entirely by using a nice simple string search and replace?
Please check whether this works for you. Either put the line number in front of the substitute expression or place the cursor on the line first:
:s:'\(.*\)':\1:
I used vim 7.1 for this. Of course, you can also visually select the area on which the expression should be executed beforehand (use "v" or "V" and move the cursor accordingly).
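Outside vim, the hand-escaping can be avoided altogether. As a minimal sketch, assuming the file can be processed with a script: Python's re.escape() builds the escaped pattern from the literal text, and a plain string replace needs no regex at all. The sample line is made up.

```python
import re

literal = r"/^\//"            # the regex literal from the question, as plain text
line = r"pattern = '/^\//';"  # hypothetical line containing the quoted literal

# re.escape() backslash-escapes every regex metacharacter in the literal,
# so nothing has to be escaped by hand.
quoted = re.compile("'(" + re.escape(literal) + ")'")

# \1 puts the captured literal back without the surrounding quotes.
unquoted = quoted.sub(r"\1", line)

# No regex at all: plain string replacement does the same thing.
unquoted_plain = line.replace("'" + literal + "'", literal)

print(unquoted)
```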