Quite confused about `\?` in vim's regex [duplicate] - regex

This question already has answers here:
How can I make my match non greedy in vim?
(8 answers)
Closed 4 years ago.
I've been trying to do a simple substitution in vim and found that \? does not work after * or \+; vim reports (NFA regexp) Can't have a multi follow a multi. In vim:
i want it to stop here, not here
~
~
~
[NORMAL] ...
:%s/^\(.*\?\)here//
If I remove \? it works, but then the regex matches up to the second here.
With a normal regex it works: https://regex101.com/r/iHdxxl/1
Why isn't it possible to use \? with * or \+ in vim?

As stated in the linked question, vim does not accept a ? after the asterisk.
To make the match non-greedy, use .\{-} instead of .*:
:%s/\(.\{-}\)here//

Another option is to use a negative lookahead (written @! in very magic mode):
:%s/\v^((here)@!.)* here//
\v turns on very magic mode, which avoids having to escape most metacharacters in the pattern.
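
For comparison, here is a rough Python sketch of the same three behaviours (greedy, non-greedy, and negative lookahead). Python's re module is not Vim's regex engine, so the syntax differs; the variable names are only for illustration.

```python
import re

text = "i want it to stop here, not here"

# Greedy: .* runs to the last "here", so the whole line is removed.
greedy = re.sub(r"^(.*)here", "", text)

# Non-greedy: .*? stops at the first "here" (Vim spells this .\{-}).
lazy = re.sub(r"^(.*?)here", "", text)

# Negative lookahead: consume characters only while "here" does not start.
lookahead = re.sub(r"^((?!here).)*here", "", text)

print(repr(greedy))     # ''
print(repr(lazy))       # ', not here'
print(repr(lookahead))  # ', not here'
```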

Related

How to exclude a substring in a regular expression? [duplicate]

This question already has answers here:
What is the difference between .*? and .* regular expressions?
(3 answers)
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 5 months ago.
There is a line of text:
Lorem ~Ipsum~ is simply ~dummy~ text ~of~ the printing...
To find the words enclosed in ~ I use
re.search(r'~([^~]*)~', text)
Suppose the delimiter has to become ~~ instead of ~.
([^\~]+) excludes the single character ~ from the text between the delimiters.
How do I write a regular expression that excludes a string of characters instead of just one?
That is, ~~Lor~em~~ should return Lor~em
Newline characters must not be excluded, and the matched string cannot be empty.
Use a non-greedy quantifier instead of a negated character set.
re.search(r'~~(.*?)~~', text, flags=re.DOTALL)
re.DOTALL makes . match newline characters.
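
A small sketch of how this might look with re.findall. The sample text is made up, and .+? is used instead of .*? on the assumption that the "cannot be empty" requirement from the question should be enforced by the pattern itself.

```python
import re

# Made-up sample: one field contains a single ~, another spans a newline.
text = "~~Lor~em~~ is simply ~~dummy\ntext~~"

# .+? is non-greedy and needs at least one character between the delimiters;
# re.DOTALL lets . match the newline as well.
found = re.findall(r"~~(.+?)~~", text, flags=re.DOTALL)

print(found)  # ['Lor~em', 'dummy\ntext']
```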

Parsing regex with escaped pipe delimiter [duplicate]

This question already has answers here:
regular expression to match pipe separated strings with pipe escaping
(4 answers)
Closed 3 years ago.
I'm trying to parse
|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|
With regex (https://regex101.com/r/KLzIOa/1/)
I currently have
[^|]++
Which is parsing everything correctly except \|Team\| vs \|Team\|
I would expect this to be parsed as |Team| vs |Team|
If I change the regex to
[^\\|]++
It parses the teams separately instead of keeping them together with the escaped pipes.
Basically, I want to parse the fields between the pipes, but any escaped pipes should be captured as part of the field. So for my example I would expect
["123", "create", "item", "1497359166334", "Sport", "Some League", "|Team| vs |Team|", "1497359216693"]
You can alternate between:
\\. - A literal backslash followed by anything, or
[^|\\]+ - Anything but a pipe or backslash
(?:\\.|[^|\\]+)+
https://regex101.com/r/KLzIOa/2
Note that there's no need for the possessive quantifier, because no backtracking will occur.
If you also want to replace \|s with |s, then do that afterwards: match \\\| and replace with |.
To handle escaping, you should match a backslash and the character after it as a single "item".
(?:\\.|[^|])++
This conveniently also works for escaping the backslashes themselves!
To then remove the backslashes from the results, use a simple replacement:
Replace: \\(.)
With: $1
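
As a rough illustration of the match-then-unescape approach, here is a Python sketch. The names FIELD and parse_fields are hypothetical, and it assumes empty fields can be dropped, since findall only returns non-empty matches.

```python
import re

line = r"|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|"

# A field is a run of "backslash + any character" or "anything but | and \".
FIELD = re.compile(r"(?:\\.|[^|\\])+")


def parse_fields(record: str) -> list[str]:
    """Split on unescaped pipes, then drop the escaping backslashes."""
    raw = FIELD.findall(record)
    # \\(.) -> \1 turns \| into | (and \\ into \).
    return [re.sub(r"\\(.)", r"\1", field) for field in raw]


fields = parse_fields(line)
print(fields)
```

The possessive ++ from the question is left out here; Python only accepts it from version 3.11 on, and the pattern should not need it.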
Use:
(?:\\\||[^|])+

Use RegEx to find and transform characters to capital case [duplicate]

This question already has answers here:
Notepad++ and regex: how to UPPERCASE specific part of a string / find / replace
(2 answers)
Closed 4 years ago.
In Notepad++ I need to use a regex to transform all
phone1_id, phone2_id, phone3_id
into
PHONE1_ID, PHONE2_ID, PHONE3_ID
This RegEx helps me find all those strings: phone\d+_id
but how can I transform them to capital case?
Ctrl+H
Find what: phone\d+_id
Replace with: \U$0
check Wrap around
check Regular expression
Replace all
Replacement:
\U : Change to uppercase
$0 : contains the whole match
Result for given example:
PHONE1_ID, PHONE2_ID, PHONE3_ID
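
Outside Notepad++, the same transformation can be sketched in Python. Python's re module has no \U in replacement strings, so this version assumes a replacement function is acceptable instead.

```python
import re

text = "phone1_id, phone2_id, phone3_id"

# The callable receives each match object; group(0) is the whole match,
# which plays the role of $0 in the Notepad++ replacement.
result = re.sub(r"phone\d+_id", lambda m: m.group(0).upper(), text)

print(result)  # PHONE1_ID, PHONE2_ID, PHONE3_ID
```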

\w doesn't work in vim search replace but a-zA-Z does? [duplicate]

This question already has answers here:
Vim regex with metacharacters inside bracket
(3 answers)
Closed 4 years ago.
tldr
[a-zA-Z\.-] works in Vim regex search replace, but [\w\.-] does not.
The text I'm searching:
1 string.here blah blah
24 another-string.here blah.
1523 another-string.goes.here. blah123
Desired output
string.here
another-string.here
another-string.goes.here
My Question
Why does this work:
:%s/\v^\d+\s+([a-zA-Z\.-]+)\s+.*/\1/g
But this does not:
:%s/\v^\d+\s+([\w\.-]+)\s+.*/\1/g
E486: Pattern not found :%s/\v^\d+\s+([\w\.-]+)\s+.*/\1/g
The only difference between the two is a-zA-Z vs \w inside square brackets. But doesn't \w equal a-zA-Z (plus some other non-whitespace characters not in this example text)?
I'm using default vim. Unmodified. Whatever comes with Ubuntu.
Non-vim platforms
When I try with the atom text editor instead of vim, both expressions work.
Search: ^\d+\s+([a-zA-Z\.-]+)\s+.*
Replace: $1
When I try with RegExr both expressions work. (Although I have to add the multiline tag)
Other things I've tried
My understanding is that \v is necessary for avoiding escaping hell. I've tried without it:
:%s/^\d\+\s\+\([a-zA-Z\.-]\+\)\s\+.*/\1/g
works
:%s/^\d\+\s\+\([\w\.-]\+\)\s\+.*/\1/g
does not work. ("Pattern not found")
I've also tried adding the m flag (so the command ends in /gm), but that failed with:
E488: Trailing characters
I've also tried without the ^.
:%s/\d\+\s\+\([\w\.-]\+\)\s\+.*/\1/g
E486: Pattern not found: \d\+\s\+\([\w\.-]\+\)\s\+.*
I've also tried using \\w instead of \w.
:%s/\d\+\s\+\([\\w\.-]\+\)\s\+.*/\1/g
E486: Pattern not found: \d\+\s\+\([\\w\.-]\+\)\s\+.*
I've also tried using \[ \] instead of [ ].
:%s/\d\+\s\+\(\[\\w\.-\]\+\)\s\+.*/\1/g
E486: Pattern not found: \d\+\s\+\(\[\\w\.-\]\+\)\s\+.*
[a-zA-Z\.-] works in Vim regex search replace, but [\w\.-] does not.
[a-zA-Z\.-] is a collection of characters containing:
every character from a to z,
every character from A to Z,
the character .,
and the character -.
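Character classes like \w are not recognized inside a collection. :help /collection lists the few backslash codes that do work inside [] (such as \e, \t, \r, \b and \n) and notes that the others do not. A backslash followed by any other character is taken literally, so [\w\.-] is really a collection containing:
the character \,
the character w,
the character .,
and the character -.
That is not what you want. To match word characters plus . and -, spell the ranges out as in the working command, or use a bracket class such as [[:alnum:]_.-].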
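
To make the difference concrete, here is a Python sketch. Python's re does expand \w inside brackets, which is presumably why the other tools mentioned in the question accept the pattern. The second pattern imitates Vim's literal reading of the collection, on the assumption that the backslash itself is kept as a literal member, which is how I read :help /collection.

```python
import re

line = "1 string.here blah blah"

# Python (like PCRE-style engines) expands \w inside a character class.
word_class = re.compile(r"^\d+\s+([\w.-]+)\s+.*")

# Imitation of Vim's reading: only the literal characters \ w . -
literal_only = re.compile(r"^\d+\s+([\\w.-]+)\s+.*")

expanded = word_class.match(line)
literal = literal_only.match(line)

print(expanded.group(1))  # string.here
print(literal)            # None
```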

Prettier auto "correct" regex escaping backslash `\` [duplicate]

This question already has answers here:
Why do regex constructors need to be double escaped?
(5 answers)
Closed 5 years ago.
pattern: '^131\.[0-9]{6}$',
Prettier changes it to pattern: '^131.[0-9]{6}$', instead. Is there a way to make it ignore a line or a file?
Assuming JavaScript (as you're using Prettier), '^131\.[0-9]{6}$' is just a string, not a regex. Prettier removes unnecessary escape characters when reformatting. As \. isn't a meaningful escape in a string literal, it's the same as having . on its own.
Your aim is to get \. into a regex, which I assume you're going to create using the new RegExp() constructor; in that case you want to escape the backslash:
pattern: '^131\\.[0-9]{6}$'
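
The effect of the doubled backslash can be sketched in Python, whose ordinary (non-raw) string literals also treat \\ as a single backslash. This is only an analogue of the JavaScript case, and the variable names are illustrative.

```python
import re

escaped = "^131\\.[0-9]{6}$"   # string content: ^131\.[0-9]{6}$
unescaped = "^131.[0-9]{6}$"   # the . now matches any character

# With the backslash preserved, only a literal dot is accepted.
dot_ok = re.search(escaped, "131.123456") is not None
x_rejected = re.search(escaped, "131x123456") is None

# Without it, the pattern silently becomes more permissive.
x_accepted = re.search(unescaped, "131x123456") is not None

print(dot_ok, x_rejected, x_accepted)  # True True True
```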