Find multiple occurrences of a character after another character - regex

I need to find and replace multiple occurrences of a character after another character.
My file looks like this:
b
a
b
b
And I need to replace all b after a with c:
b
a
c
c
I came up with this: a((\n|.)*)b as the find expression and a$1c as the replace option. However, it only replaces the last match instead of all of them.
I am using VSCode's global search and replace option.
I found a dirty way to achieve what I want: I add a lazy quantifier ? after the *, so the expression matches only up to the first b, and I apply the replacement. Then I can do it again and it will replace the next match. I repeat this until all occurrences are replaced.
However, this would not be usable if there were thousands of matches, and it would be very interesting to know if there is a proper way to do it with only one find.
How can I match all b after a?

You can use
(?<=a[\w\W]*?)b
Replace with c. Details:
(?<=a[\w\W]*?) - a positive lookbehind that matches a location that is immediately preceded with a and then any zero or more chars (as few as possible)
b - a b.
Also, see Multi-line regular expressions in Visual Studio Code for more ways to match any char across lines.
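For anyone who needs the same replacement in a script rather than in the editor, here is a minimal Python sketch. Python's built-in re module rejects unbounded lookbehinds such as (?<=a[\w\W]*?), so this swaps in a plain string split at the first a instead of the regex. The function name and its default arguments are made up for illustration.

```python
def replace_after_marker(text: str, marker: str = "a", old: str = "b", new: str = "c") -> str:
    """Replace every `old` that occurs anywhere after the first `marker`."""
    # partition() splits at the first occurrence of the marker only.
    head, sep, tail = text.partition(marker)
    if not sep:
        # No marker in the text: nothing qualifies for replacement.
        return text
    # Everything in `tail` is preceded by the marker, so replace freely there.
    return head + sep + tail.replace(old, new)


sample = "b\na\nb\nb\n"
result = replace_after_marker(sample)
print(result)
```

The b on the first line is left alone because it sits before the first a, which is the same condition the lookbehind expresses.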
If you need to use something like this to replace in multiple files, you need to know that the Rust regex used in the file search and replace VSCode feature is much less powerful and supports neither \K, nor \G, nor infinite-width lookbehinds. I suggest using the Notepad++ Replace in Files feature instead.
The (?:\G(?!\A(?<!(?s:.)))|a)[^b]*\Kb pattern matches
(?:\G(?!\A(?<!(?s:.)))|a) - either of the two options:
\G(?!\A(?<!(?s:.))) - the end of the previous successful match ((?!\A(?<!(?s:.))) is necessary to exclude the start of file position from \G)
| - or
a - an a
[^b]* - any zero or more occurrences of chars other than b
\K - omit the matched text
b - a b char.

It's probably not the prettiest, but when tried and tested the following worked for me:
(?:^a\n|\G(?<!\A))\n*\Kb$
See the online demo. I don't know VSCode, but a quick search led me to believe it should follow Perl-based PCRE2 syntax, as in the linked demo.
(?: - Open non-capture group:
^a\n - Start line anchor followed by "a" and a newline character.
| - Or:
\G(?<!\A) - Meta escape, assert position at end of previous match or start of string. The negative lookbehind prevents the start of string position from being matched.
) - Close non-capture group.
\n* - 0+ new-line characters.
\K - Meta escape, reset starting point of reported match.
b$ - Match a literal "b", followed by an end-line anchor.

Related

Regex: delete everything between String and replace with other

So I've been scratching my head over this one. I have over a thousand files that have different values between the strings
<lodDistances content="float_array">
15.000000
25.000000
70.000000
140.000000
500.000000
500.000000
</lodDistances>
I need to replace those values with these
<lodDistances content="float_array">
120.000000
200.000000
300.000000
400.000000
500.000000
550.000000
</lodDistances>
I tried the following without any success
\ (?<=\<lodDistances content\=\"float_array\"\>)(.*)(?=\<\/lodDistances\>)
It seems to find it in regexr but not in Sublime Text: when I try to find it in files, I constantly get 0 results. Any idea why this is happening?
There are a couple of things that are wrong in your pattern:
\< matches a leading word boundary position (as \b(?=\w)) and \> matches the trailing word boundary position (same as \b(?<=\w)). You wanted to match literal < and > chars, thus, you must NOT escape them
There is no need matching a space before the first <
Since your text is multiline, use either (?s) inline modifier or (?s:...) modifier group to make . match across line breaks, or use a [\s\S] / [\w\W] / [\d\D] workaround
Use a lazy dot pattern to stop matching at first occurrence of the trailing delimiter.
You may use
(?s)(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)
And replace with ${1}<new values>. The curly braces are necessary as the new values are most likely numbers and without the braces, $1n (n stands for a digit here) will be parsed incorrectly (see this YT video for a demo of what it is fraught with).
Regex details:
(?s) - now, . matches line break chars, too
(<lodDistances content="float_array">\s*) - Group 1 capturing <lodDistances content="float_array"> text and then zero or more whitespaces
.*? - any zero or more chars, but as few as possible
(?=\s*</lodDistances>) - a positive lookahead that matches the location that is immediately followed with zero or more whitespaces and </lodDistances> text.
Note that / is not a special regex metacharacter, and since regex delimiter notation is not supported in Sublime Text, you do not have to ever escape it here.
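If the batch edit has to run outside Sublime Text, the same pattern can be sketched in Python. This is only an illustration: the function name and the way the new values are passed in are assumptions, and re.DOTALL stands in for the inline (?s) modifier. Python writes the unambiguous group reference as \g<1> where Sublime Text uses ${1}.

```python
import re

# Same pattern as above; re.DOTALL lets "." match line breaks like (?s) does.
LOD_RE = re.compile(
    r'(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)',
    re.DOTALL,
)

NEW_VALUES = "\n".join(
    ["120.000000", "200.000000", "300.000000", "400.000000", "500.000000", "550.000000"]
)


def replace_lod_distances(xml_text: str, new_values: str = NEW_VALUES) -> str:
    # \g<1> keeps the group number separate from the digits that follow it;
    # a bare \1 followed by "120..." would be read as part of the group number.
    # Assumes new_values contains no backslashes.
    return LOD_RE.sub(r"\g<1>" + new_values, xml_text)


sample = (
    '<lodDistances content="float_array">\n'
    "15.000000\n"
    "25.000000\n"
    "</lodDistances>"
)
updated = replace_lod_distances(sample)
print(updated)
```

Reading each file, calling the function, and writing the result back is left out here, since the file layout is not described in the question.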

Improve regex for capturing files in a directory, excluding dotfiles

I am looking to get all non dot-files in a folder with a particular extension. So far my regex is:
(?<=\/|^)(?<!\.)(\w+(?:\.mov|\.py|))$
Is there a way to improve the above regex? What might be some examples where this regex might not work?
The \w+ will only match one or more letters, digits or _. It will not match the rest of the chars that may constitute a valid file name. Also, your (?<!\.) lookbehind is redundant because the previous lookbehind already excludes a dot at that position.
Besides, you do not have to repeat the dot pattern, you may use grouping for extensions only.
You may use
(?<=\/|^)([^\/]+)(\.(?:mov|py))$
See this regex demo
(?<=\/|^) - / or start of string allowed immediately on the left
([^\/]+) - Group 1: any one or more chars other than /
(\.(?:mov|py)) - Group 2: a . char and then either mov or py
$ - end of string.
Note you may also replace (?<=\/|^) with (?<![^\/]) in real code since it will work the same with standalone strings. It will mess the demo results at regex101.com because there, you test against a single multiline string (that is why I added \n to the negated character class there, too).
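As a quick way to probe where the pattern does and does not match, here is a small Python sketch. It uses the (?<![^/]) form, because Python's re module rejects (?<=\/|^) (the two alternatives have different widths). The helper name and the sample paths are invented for the example. One thing this makes visible: since [^/]+ accepts a leading dot, a name such as .hidden.py still matches, so the second pattern adds a (?!\.) lookahead. That extra lookahead is my assumption about what "excluding dotfiles" should mean, not part of the pattern above.

```python
import re

# Pattern from the answer, with the (?<![^/]) lookbehind variant.
FILE_RE = re.compile(r"(?<![^/])([^/]+)(\.(?:mov|py))$")

# Hypothetical variant: (?!\.) refuses names that start with a dot.
NO_DOTFILES_RE = re.compile(r"(?<![^/])(?!\.)([^/]+)(\.(?:mov|py))$")


def split_name(path: str, pattern: re.Pattern = FILE_RE):
    """Return (base name, extension) for a matching path, else None."""
    m = pattern.search(path)
    return (m.group(1), m.group(2)) if m else None


for path in ["videos/clip.mov", "my-script.v2.py", "notes.txt", ".hidden.py"]:
    print(path, split_name(path), split_name(path, NO_DOTFILES_RE))
```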
Here's how I would do it:
(?<=\/|^)[^\/\\:*?"<>|\n]+\.(?:mov|py)$
(?<=\/|^) Lookbehind just like you had it
[^\/\\:*?"<>|\n]+ One or more of any character that is not disallowed in filenames
\. A literal dot
(?:mov|py) Either "mov" or "py" in a non-capturing group (similar to yours, but I moved the dot out and excluded the redundant "|")
$ Anchors the search to the end of the line, so only files will match, no folders

match but don't select with regex

I'm trying to capture the term ISomething only when it isn't immediately preceded by a full stop (.); however, my search is also capturing the preceding letter or space. I'm using JavaScript-style regex (VSCode search, to be exact).
My aim is to replace ISomething with Namespace.ISomething without touching existing namespaces.
My search sample
Api.Resources.Things.Bits.ISomething //doesn't match
something : ISomething
List<ISomething>
something:ISomething
isomething //doesn't match
My regex
[^\.](ISomething)
My matches: the first captures the whitespace, the second the angle bracket, the third the colon.
ISomething
<ISomething
:ISomething
How (and why) can I just get the word ISomething in all of the above?
UPDATE
You can use infinite-width lookahead and lookbehind without any constraint beginning with Visual Studio Code v.1.31.0 release, and you do not need to set any options for that now.
So, the solution can look like
Find what: \b(?<!\.)ISomething\b
Replace with: Namespace.$&
The (?<!\.) must be after \b for better performance (in order not to perform lookbehind check at each position in a string) and is a negative lookbehind that matches a position that is not immediately preceded with a literal .. The $& in the replacement is a whole match value backreference/placeholder.
With older versions you may use
Find what: (^|[^.])\b(ISomething)\b
Replace with: $1Namespace.$2
NOTE that the Aa (case sensitivity) and .* (regex mode) options must be ON.
Regex details
(^|[^.]) - Group 1: either the start of the line/string or any char other than .
\b - a word boundary
(ISomething) - Group 2: the word ISomething
\b - a word boundary
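To check the lookbehind version against the sample from the question outside VSCode, a minimal Python sketch could look like this. The function name and the default namespace argument are placeholders; \g<0> is Python's whole-match reference, playing the role of $& in the VSCode replacement.

```python
import re

# Same find pattern as above: ISomething as a whole word, not preceded by a dot.
ISOMETHING_RE = re.compile(r"\b(?<!\.)ISomething\b")


def add_namespace(source: str, namespace: str = "Namespace") -> str:
    # \g<0> inserts the whole match, so only the prefix is added.
    return ISOMETHING_RE.sub(namespace + r".\g<0>", source)


sample = "\n".join(
    [
        "Api.Resources.Things.Bits.ISomething",
        "something : ISomething",
        "List<ISomething>",
        "something:ISomething",
        "isomething",
    ]
)
converted = add_namespace(sample)
print(converted)
```

Because the lookbehind consumes nothing, the space, angle bracket, or colon before ISomething is never part of the match, which is why it does not need to be captured and put back.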
If supported, \K may be what you are looking for:
[^\.]\KISomething
You could use a negative lookbehind:
(?<!\.)ISomething
Basically, this will match any ISomething that is not preceded by a . character.

Use regex to search for a substring in source code files

I've got to rename our application and would like to search all strings in the source code for the use of it. Naturally the app name can appear anywhere within the strings and the strings can span multiple lines which complicates things.
I was using (["'])APP_NAME to find instances at the start of strings but now I need a more complete solution.
Essentially what I'd like to say is "find instances of APP_NAME enclosed by quotes" in regex speak.
I'm searching in Xcode in case anyone has any Xcode-specific alternatives...
You may use
"[^"]*APP_NAME[^"]*"|'[^']*APP_NAME[^']*'
See the regex demo.
Note that this regex is based on alternation (| means OR) and negated character classes ([^"]* matches any 0+ chars other than ").
Or, alternatively:
(["'])(?:(?!\1).)*APP_NAME.*?\1
See this regex demo. The pattern is a bit trickier:
(["']) - captures " or ' into Group 1
(?:(?!\1).)* - any 0+ occurrences of a char that is not equal to the one captured into Group 1
APP_NAME - literal char sequence
.*? - any 0+ chars other than line break chars, but as few as possible, up to the first occurrence of...
\1 - the value captured into Group 1.
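Here is a small Python sketch of the first (alternation) pattern run over a made-up source snippet, in case a scripted search is easier than Xcode's find panel. The sample text and variable names are invented. Note that the pattern only pairs up quote characters and knows nothing about the language's string syntax, so it is a rough filter rather than a parser.

```python
import re

# The two alternatives from the first pattern, built separately for readability.
DOUBLE_QUOTED = r'"[^"]*APP_NAME[^"]*"'
SINGLE_QUOTED = r"'[^']*APP_NAME[^']*'"
QUOTED_APP_NAME = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)

source = (
    'title = "Welcome to APP_NAME!"\n'
    "key = 'about_APP_NAME'\n"
    "// APP_NAME outside quotes\n"
    'msg = "spans\ntwo lines with APP_NAME"\n'
)

# Negated character classes match line breaks, so multi-line strings are found too.
found = QUOTED_APP_NAME.findall(source)
for hit in found:
    print(repr(hit))
```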

How to combine lines in regular expressions?

So I am new to regular expressions and I am learning them using a simple text editor only. I have the following file
84544484N
32343545M
32334546E
34456434M
I am trying to combine each pair of lines into one tab delimited line
The result should be :
84544484N 32343545M
32334546E 34456434M
I wrote the following :
Search: (.*?)\n(.*?)
Replace: \1\t\2
This did not work. Can someone please explain why and give me the correct solution? Thank you!
The (.*?)\n(.*?) pattern will never work well because the (.*?) at the end of the pattern will always return an empty string: *? is a lazy matching quantifier, and if it can return zero characters (and it can), it will. Use greedy matching and adjust the pattern like:
(.+)\r?\n *(.*)
or - since SublimeText uses Boost regex - you can match any newline sequence with \R:
(.+)\R *(.*)
and replace with \1\t\2. Note I replaced *? with + in the first capturing group because you need to match non-empty lines.
Regex breakdown:
(.+) - Group 1: one or more characters other than a newline (as many as possible), up to
\R - a newline sequence (\r\n, \r or just \n)
* - a literal space, zero or more occurrences
(.*) - Group 2: zero or more characters other than a newline (as many as possible)
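Both points can be checked with a short Python sketch: the trailing lazy group really does come back empty, and the greedy pattern joins the pairs. Python's re module has no \R, so the \r?\n form is used here; the function name is just for illustration.

```python
import re

text = "84544484N\n32343545M\n32334546E\n34456434M"

# Original attempt: the trailing lazy group is satisfied by zero characters.
lazy = re.search(r"(.*?)\n(.*?)", text)
print(repr(lazy.group(1)), repr(lazy.group(2)))


def join_line_pairs(s: str) -> str:
    # Greedy groups: a non-empty line, a line break, optional spaces, the next line.
    return re.sub(r"(.+)\r?\n *(.*)", r"\1\t\2", s)


joined = join_line_pairs(text)
print(joined)
```

Each match consumes two lines but not the line break after the second one, so the pairs stay on separate lines in the output.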