How to change this regex without lookbehind check - regex

It should match the substring between zero or more leading and trailing spaces. C++11 does not have lookbehind. Is it possible to rewrite this regex? Or do I need to install Boost and use its more powerful regex engine?
The regex: ^\s*(.*(?<! ))\s*$
UPDATE: the trimmed text must end up in the capture group (backreference)!

You can make the inner * lazy by using .*? instead, which makes it match as few characters as possible while still giving you a match. This allows the last \s* to consume all the spaces:
>>> re.match(r'^\s*(.*?)\s*$', ' asdf asdf ').group(1)
'asdf asdf'
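To check that the lazy rewrite captures the same text as the original lookbehind pattern, here is a small Python sketch (Python's re supports lookbehind, so both patterns can run side by side). The sample strings are made up for illustration. The two patterns are only guaranteed to agree for space padding: the lookbehind rejects a trailing space specifically, while the lazy version lets \s* swallow tabs and newlines as well.

```python
import re

# Original pattern (needs lookbehind) and the lazy rewrite (no lookbehind).
WITH_LOOKBEHIND = re.compile(r'^\s*(.*(?<! ))\s*$')
LAZY = re.compile(r'^\s*(.*?)\s*$')

# Hypothetical sample inputs, padded with spaces only.
samples = [' asdf asdf ', 'asdf', '   x  y   ', '   ', '']


def capture(pattern, text):
    """Return group 1 of the match for a single-line input."""
    return pattern.match(text).group(1)


# Pair up the capture from each pattern for every sample.
results = [(capture(WITH_LOOKBEHIND, s), capture(LAZY, s)) for s in samples]
for s, (old, new) in zip(samples, results):
    print(repr(s), '->', repr(old), repr(new))
```

The same lazy pattern should carry over to std::regex, since it uses no lookbehind.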

Related

Append End of Line with Substring from Current Line [duplicate]

I've scoured Stack Overflow for something just like this and can't seem to come up with a solution. I've got some text that looks like this:
command.Parameters.Add("#Id
command.Parameters.Add("#IsDeleted
command.Parameters.Add("#MasterRecordId
command.Parameters.Add("#Name
...
And I would like the text to end up like this:
command.Parameters.Add("#Id", acct.Id);
command.Parameters.Add("#IsDeleted", acct.IsDeleted);
command.Parameters.Add("#MasterRecordId", acct.MasterRecordId);
command.Parameters.Add("#Name", acct.Name);
...
As you can see, I essentially want to append the end of the line with: ", acct.<word between # and second ">);
I'm trying this:
Find What: (?<=#).+?(?=\r) - This works, it finds the appropriate word.
Replace: \1", acct.\1); - This doesn't. It changes the line to (for Id):
command.Parameters.Add("#", acct.
Not sure what I'm doing wrong. I thought that \1 is supposed to be the "capture" from the "Find what" box, but it's not I guess?
The \1 backreference will only work if you have a capturing group in your pattern:
(?<=#)(.+?)(?=\r)
If you're not using a capturing group, you should use $& instead of \1 as a backreference for the entire match. Additionally, parentheses in the replacement string need to be escaped. So, the replacement string should be:
$&", acct.$&\);
You might also want to use $ instead of the Lookahead (?=\r) in case the last line isn't followed by an EOL character.
Having said all that, I personally prefer to be more explicit/strict when doing regex substitution to avoid messing up other lines (i.e., false positives). So I would go with something like this:
Find: (\bcommand\.Parameters\.Add\("#)(\w+)$
Replace: \1\2", acct.\2\);
Note that \w will only match word characters, which is likely the desired behavior here. Feel free to replace it with a character class if you think your identifiers might have other characters.
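For readers who want to try the stricter pattern outside Notepad++, here is a rough Python equivalent. It is only a sketch: Python's replacement syntax differs from Notepad++ (no need to escape the closing parenthesis), and re.MULTILINE is needed so that $ matches at the end of each line.

```python
import re

# Sample input from the question (each line stops after the parameter name).
text = '\n'.join([
    'command.Parameters.Add("#Id',
    'command.Parameters.Add("#IsDeleted',
    'command.Parameters.Add("#MasterRecordId',
    'command.Parameters.Add("#Name',
])

# Group 1: the fixed prefix up to '#'; group 2: the parameter name.
# re.MULTILINE makes $ match at the end of every line.
pattern = re.compile(r'(\bcommand\.Parameters\.Add\("#)(\w+)$', re.MULTILINE)

# In Python the ')' in the replacement string is literal and needs no escape.
result = pattern.sub(r'\1\2", acct.\2);', text)
print(result)
```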
You could also omit the lookbehind, and match the # and then use \K to clear the current match buffer.
Then you can match the rest of the line using .+
Note that you don't have to make the quantifier non greedy .*? as you are matching the rest of the line.
In the replacement, refer to the full match with $0:
Find what:
#\K.+
Replace with:
$0", acct.$0\);
If there must be a newline to the right, you might also write the pattern as one of:
#\K.+(?=\r)
#\K.+(?=\R)
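The whole-match idea can be sketched in Python too, with one caveat: Python's built-in re module has no \K, so this sketch falls back on the lookbehind, and the whole match is written \g<0> rather than $0. The two-line input is an abbreviated version of the question's sample.

```python
import re

text = 'command.Parameters.Add("#Id\ncommand.Parameters.Add("#Name'

# Python's built-in re module has no \K, so a lookbehind stands in for it.
# '.' does not match a newline, so .+ stops at the end of each line.
result = re.sub(r'(?<=#).+', r'\g<0>", acct.\g<0>);', text)
print(result)
```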

How can I use regex to convert Uppercase text to lowercase text in combination with a look-ahead and look-behind

In the context of an XML file, I want to use the XML tags in a positive look-behind and positive look-ahead to convert a value to lowercase.
BEFORE:
<CONDITION NAME="ABC-DEF-GHI" DATE="DATE">
AFTER:
<CONDITION NAME="abc-def-ghi" DATE="DATE">
Patterns tried from other questions and the regex wiki that don't work:
1.
FIND:
(?<=(<CONDITION NAME="))(.+)(?=(" DATE="DATE"))
REPLACE:
\L0
2.
FIND:
(?<=(<CONDITION NAME=".*))(\w)(?=(.*" DATE="DATE"))
REPLACE:
\L$1
Using VS Code 1.62.1
MAC OS Darwin x64 19.6.0
You don't need any capture groups if you want to use lookarounds on the left and right side.
Instead of using .+ which is a broad match and can match too much, you can use a negated character class [^"]+ to match any character except a double quote, or you can use [\w-]+ to match 1 or more word characters or a hyphen:
(?<=<CONDITION NAME=")[^"]+(?=" DATE="DATE")
Replace with the full match using $0
\L$0
Another option is to use 2 capture groups with a single lookahead as lookarounds can be expensive, and replace with $1\L$2
(<CONDITION NAME=")([\w-]+)(?=" DATE="DATE")
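As a way to test the lookaround pattern outside VS Code, here is a minimal Python sketch. Python's re does not support \L in replacement strings, so a callable does the lowercasing instead; the pattern itself is the one from this answer.

```python
import re

line = '<CONDITION NAME="ABC-DEF-GHI" DATE="DATE">'

# Lookarounds pin the context; only the attribute value is part of the match.
pattern = re.compile(r'(?<=<CONDITION NAME=")[^"]+(?=" DATE="DATE")')

# No \L in Python replacement strings, so a callable lowercases the match.
result = pattern.sub(lambda m: m.group(0).lower(), line)
print(result)
```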
Pattern 2 works. The replace value just needs to change from
\L$1 -> \L$2
Pattern 1 could also be used with \L$2 as the replace value.
This pattern works:
FIND:
(?<=(<CONDITION NAME=".*))(\w)(?=(.*" DATE="DATE"))
REPLACE:
\L$2
Make sure you make the other groups non-capturing:
(?<=(?:<CONDITION NAME="))(.+)(?=(?:" DATE="DATE"))
Or leave out the inner () altogether:
(?<=<CONDITION NAME=")(.+)(?=" DATE="DATE")
Or use $2 as replacement. Everything between standard () becomes a captured group, no matter where in the expression they are.
And be careful with .+, in this case [^"]+ is a much safer choice.
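The group-numbering point can be seen by printing the captured groups. This Python sketch runs the asker's first pattern as written, then the variant without the inner parentheses; the replacement step is left out because only the numbering is of interest here.

```python
import re

line = '<CONDITION NAME="ABC-DEF-GHI" DATE="DATE">'

# The asker's first pattern: parentheses inside the lookarounds also capture,
# so the attribute value lands in group 2, not group 1.
m = re.search(r'(?<=(<CONDITION NAME="))(.+)(?=(" DATE="DATE"))', line)
groups = m.groups()
print(groups)

# With the inner parentheses removed, the value becomes group 1.
m2 = re.search(r'(?<=<CONDITION NAME=")(.+)(?=" DATE="DATE")', line)
only_group = m2.groups()
print(only_group)
```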

Using a positive lookahead to remove the middle of a string

I'm currently attempting to remove text in the middle of this string:
RenameMe_12345_12365_130706T234502.txt
using the following regex:
^[a-zA-Z]+(?=_[0-9]+_[0-9]+).+$
in an attempt to return:
RenameMe_130706T234502.txt
but the regex returns the entire string without excluding the middle:
RenameMe_12345_12365_130706T234502.txt
Am I using the positive lookahead incorrectly, or am I approaching the problem incorrectly? Can positive lookaheads not be used this way?
Replace this regex:
_.*_
with
_
Example with sed:
kent$ echo RenameMe_12345_12365_130706T234502.txt|sed 's/_.*_/_/'
RenameMe_130706T234502.txt
You can do the same in your own tool or programming language.
EDIT for OP's comment:
@CodingUnderDuress _.*_ is a single regex (BRE). It relies on the greedy .* to achieve your goal.
If you don't want to substitute and only want a regex that matches the parts you need, you could use:
(^[^_]*|_[^_]*$)
Test with grep (-E means ERE):
kent$ echo "RenameMe_12345_12365_130706T234502.txt"|grep -Eo '(^[^_]*|_[^_]*$)'
RenameMe
_130706T234502.txt
You can of course use look-behind/look-ahead if you really love them, but then you need PCRE. I don't see why look-around is needed for this requirement.
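Both approaches from this answer translate directly to other regex engines. As a sketch, here they are in Python, with re.sub standing in for sed and re.findall standing in for grep -o:

```python
import re

name = 'RenameMe_12345_12365_130706T234502.txt'

# Greedy .* stretches from the first '_' to the last '_', so one substitution
# collapses the whole middle section into a single underscore.
renamed = re.sub(r'_.*_', '_', name)
print(renamed)

# Match-only variant: collect the leading and trailing parts and join them.
parts = re.findall(r'(^[^_]*|_[^_]*$)', name)
print(parts, ''.join(parts))
```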
You can replace whatever this pattern matches with an empty string:
_(\w+(?=_))*
Working
[1] Match the character `_`
[2] followed by a run of word characters
[3] I have used positive look-ahead `?=_` to make sure the last `_` is not missed out
[4] Match the above 0 or more times
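One thing to watch with this pattern, shown in the Python sketch below: because \w also matches an underscore and the group may repeat zero times, a lone `_` matches too. Replacing only the first match gives the desired name, while a replace-all also strips the last underscore.

```python
import re

name = 'RenameMe_12345_12365_130706T234502.txt'
pattern = re.compile(r'_(\w+(?=_))*')

# First match only: '_12345_12365' is removed, the final '_' is kept.
first_only = pattern.sub('', name, count=1)
print(first_only)

# Replace-all: the remaining lone '_' also matches (zero repetitions of the group).
replace_all = pattern.sub('', name)
print(replace_all)
```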
Use this
(?<=[^_])_\w+_(?=[^_]+)
to match the part you want to remove.

NOTEPAD++ REGEX - I can't get what's in between two strings, I don't get it

I'm so close to understanding regex. I'm a bit stumped; I thought I understood lazy and greedy.
Here is my current regex: <g_n><!\[CDATA\[([^]]+)(?=]]><\/g_n>)
My current regex makes:
<g_n><![CDATA[xxxxxxxxxx]]></g_n>
match to:
<g_n><![CDATA[xxxxxxxxxx
But I want to make it match like this:
xxxxxxxxxx
You want
<g_n><!\[CDATA\[(.*?)]]></g_n>
then if you want to replace it use
\1
in the replacement box
You're matching the whole string; the parentheses around .*? capture the inner text and store it in \1.
So the match covers the entire string, with \1 referring to the part you want.
To change the xxxxx
Regex :
(<g_n><!\[CDATA\[)(?:.*?)(\]\]></g_n>)
Replacement
\1WHAT YOU WANT TO CHANGE TO\2
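The capture-then-reference idea can be tried in Python as a quick sketch. The literal square brackets are escaped, and NEW is just a placeholder replacement value chosen for illustration.

```python
import re

line = '<g_n><![CDATA[xxxxxxxxxx]]></g_n>'

# Literal square brackets are escaped; (.*?) lazily captures the CDATA payload.
m = re.search(r'<g_n><!\[CDATA\[(.*?)\]\]></g_n>', line)
payload = m.group(1)
print(payload)

# Keep the wrapper (groups 1 and 2) and swap in a placeholder payload.
replaced = re.sub(r'(<g_n><!\[CDATA\[).*?(\]\]></g_n>)', r'\g<1>NEW\g<2>', line)
print(replaced)
```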
It looks like you need to add escape slashes to the two closing square brackets, as they are literals from the string you're parsing.
<g_n><!\[CDATA\[.*+?\]\]><\/g_n>
^ ^
Any square brackets not being escaped by backslashes will be treated as regex operational brackets, which in this case won't catch the input string.
EDIT: I think the +? is redundant.
\[.*\]\]> ...
should suffice, since .* means any character, any amount of times.
Tested with notepad++ 6.3.2:
find: (<g_n><!\[CDATA\[)([^]]+)(?=]]></g_n>)
replace: $1WhatYouWant
You can replace + with * in the pattern to also match an empty CDATA section:
<g_n><![CDATA[]]></g_n>

vim regex find and replace

I want to use a regex to replace some strings in my file. I search for:
%s/^  [a-z]*/    /
What I want to do is replace every [a-z]* that is preceded by 2 whitespaces with the same [a-z]* preceded by 4 whitespaces. Is there any "in-place" replacement, or how would I achieve that with vim?
With best regards
:%s/  \([a-z]*\)/    \1/g
should do the job; beware of running this multiple times, though because the result of the replace will match the input pattern :)
I find it more straightforward to use the \ze atom to define the end of the match:
:%s/  \ze[a-z]*/    /g
so the [a-z]* is not included in the replace, but just used to match the relevant spaces.
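For comparison, the same "match the spaces, only look at what follows" idea can be sketched in Python, where a lookahead plays the role of \ze. This sketch assumes the two spaces sit at the start of a line and requires at least one lowercase letter after them, which is slightly stricter than [a-z]*.

```python
import re

text = '  foo\nbar\n  baz'

# (?=[a-z]) plays the role of Vim's \ze: the letter must follow the two spaces
# but is not part of the match, so only the spaces are replaced.
result = re.sub(r'^  (?=[a-z])', '    ', text, flags=re.MULTILINE)
print(result)
```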