Use regex to search for a substring in source code files

I've got to rename our application and would like to search all strings in the source code for the use of it. Naturally the app name can appear anywhere within the strings and the strings can span multiple lines which complicates things.
I was using (["'])APP_NAME to find instances at the start of strings but now I need a more complete solution.
Essentially what I'd like to say is "find instances of APP_NAME enclosed by quotes" in regex speak.
I'm searching in Xcode in case anyone has any Xcode-specific alternatives...

You may use
"[^"]*APP_NAME[^"]*"|'[^']*APP_NAME[^']*'
See the regex demo.
Note that this regex is based on alternation (| means OR) and negated character classes ([^"]* matches any 0+ chars other than ").
Or, alternatively:
(["'])(?:(?!\1).)*APP_NAME.*?\1
See this regex demo. The pattern is a bit trickier:
(["']) - captures " or ' into Group 1
(?:(?!\1).)* - any 0+ occurrences of a char that is not equal to the one captured into Group 1
APP_NAME - literal char sequence
.*? - any 0+ chars other than line break chars, as few as possible, up to the first occurrence of...
\1 - the value captured into Group 1.
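As a quick check outside Xcode, both patterns can be tried with Python's re module. This is only a sketch: the sample text is made up for illustration, and re.DOTALL is an assumption added so that . in the second pattern can cross line breaks (the negated character classes in the first pattern already do).

```python
import re

# Pattern 1: alternation with negated character classes.
ALTERNATION = re.compile(
    r'"[^"]*APP_NAME[^"]*"' + "|" + r"'[^']*APP_NAME[^']*'"
)

# Pattern 2: capture the opening quote and reuse it via the \1 backreference.
# re.DOTALL is an assumption so that "." also matches line breaks.
BACKREF = re.compile(r"""(["'])(?:(?!\1).)*APP_NAME.*?\1""", re.DOTALL)

# Hypothetical source text: two one-line strings, one string spanning
# two lines, and one bare identifier that must not be reported.
sample = (
    'title = "Welcome to APP_NAME!"\n'
    "label = 'about APP_NAME'\n"
    'help = "line one\nAPP_NAME on line two"\n'
    "ident = APP_NAME\n"
)

alternation_hits = [m.group(0) for m in ALTERNATION.finditer(sample)]
backref_hits = [m.group(0) for m in BACKREF.finditer(sample)]

for hit in alternation_hits:
    print(repr(hit))
```

Neither pattern knows about escaped quotes or comments, so hits in real source files still deserve a manual look before renaming.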

Related

How to match characters between two occurrences of the same but random string

Base string looks like:
repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc
The things I know about this base string are:
ABCXYZ is constant and always present.
repeatedRandomStr is random, but its first occurrence is always at the beginning and before ABCXYZ
So far I looked at regex context matching, recursion and subroutines but couldn't come up with a solution myself.
My currently working solution is to first determine what repeatedRandomStr is with:
^(.*)\sABCXYZ
and then use:
repeatedRandomStr\sABCXYZ\s(.*)\srepeatedRandomStr
to match what I want in $1. But this requires two separate regex queries. I want to know if this can be done in a single execution.
In Go, whose regexp package follows the RE2 syntax, there is no way other than yours: first extract the value before ABCXYZ, then use a second regex to match the text between the two occurrences of that value, since RE2 does not and will not support backreferences.
In case the regex flavor can be switched to PCRE or compatible, you can use
^(.*?)\s+ABCXYZ\s(.*)\1
^(.*?)\s+ABCXYZ\s(.*?)\1
See the regex demo.
Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than line break chars as few as possible
\s+ - one or more whitespaces
ABCXYZ - some constant string
\s - a whitespace
(.*) - Group 2: zero or more chars other than line break chars as many as possible
\1 - the same value as in Group 1.
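To see the backreference version next to the two-step approach from the question, here is a small sketch in Python, whose re module supports backreferences. The variable names are illustrative, and the second step drops the \s before the repeated string because the sample has a / there.

```python
import re

base = (
    "repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/"
    "of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc"
)

# Single pass: Group 1 is the random prefix, \1 requires it to appear again.
single = re.search(r"^(.*?)\s+ABCXYZ\s(.*)\1", base)
token, wanted = single.group(1), single.group(2)

# Two steps, as an engine without backreferences would need:
# 1) find the prefix, 2) build a second pattern from it.
first = re.search(r"^(.*)\sABCXYZ", base)
prefix = re.escape(first.group(1))  # escape in case the prefix has metachars
second = re.search(prefix + r"\sABCXYZ\s(.*)" + prefix, base)
wanted_two_step = second.group(1)

print(token)
print(wanted)
```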

How to exclude a specific string with REGEX? (Perl)

For example, I have these strings
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that have TEA or WINE1C in them.
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated as I'm also kinda new to this.
If you indeed have multiple strings as you claim, there's no need to jam all that into one regex pattern.
/^APPLE/ && !/TEA|WINE1C/
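If you have a single string, the best approach is probably to split it into lines (split /\n/), but you could also use a single regex match. Note that the alternation must be grouped inside the lookahead, otherwise WINE1C is only rejected when it directly follows APPLE:
/^APPLE(?!.*(?:TEA|WINE1C)).*/mg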
You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
See the regex demo.
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
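The question is tagged Perl, but the lookahead pattern behaves the same way in Python's re module, so a short sketch there can confirm the expected output. It also checks the simpler two-condition filter (starts with APPLE, contains neither substring); the list and variable names are illustrative.

```python
import re

names = [
    "APPLEJUCE1A",
    "APPLETREE2B",
    "APPLECAKE3C",
    "APPLETEA1B",
    "APPLEWINE3B",
    "APPLEWINE1C",
]

# One pattern with two independent negative lookaheads.
pattern = re.compile(r"^APPLE(?!.*TEA)(?!.*WINE1C).*")
kept_regex = [n for n in names if pattern.match(n)]

# Same result with two separate, simpler conditions.
kept_two_checks = [
    n for n in names
    if n.startswith("APPLE") and not re.search(r"TEA|WINE1C", n)
]

print(kept_regex)
```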
If you only want to reject a string that contains both of them (a case that is not in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
Regex demo
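A small sketch of how this "reject only when both are present" pattern behaves, again tried in Python's re module. The test strings that contain both values are made up, since the example data has none.

```python
import re

# Rejects a string only if it contains both TEA and WINE1C, in either order.
both = re.compile(r"^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*")

# Hypothetical inputs: the last two contain both values.
samples = [
    "APPLECAKE3C",
    "APPLETEA1B",
    "APPLEWINE1C",
    "APPLETEAWINE1C",
    "APPLEWINE1CTEA",
]

results = {s: bool(both.match(s)) for s in samples}

for name, matched in results.items():
    print(name, matched)
```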

Find multiple occurrences of a character after another character

I need to find and replace multiple occurrences of a character after another character.
My file looks like this:
b
a
b
b
And I need to replace all b after a with c:
b
a
c
c
I came up with this: a((\n|.)*)b as the find expression and a$1c as the replace option, however it only replaces the last match instead of all of them.
I am using VSCode's global search and replace option.
I found a dirty way to achieve what I want: I add a lazy ? quantifier after .* so that it matches only once, then apply the replacement. I can then run it again to replace the next match, and repeat until all occurrences are replaced.
However, this would not be usable if there were thousands of matches, and I would like to know whether there is a proper way to do it with a single find.
How can I match all b after a?
You can use
(?<=a[\w\W]*?)b
Replace with c. Details:
(?<=a[\w\W]*?) - a positive lookbehind that matches a location that is immediately preceded with a and then any zero or more chars (as few as possible)
b - a b.
Also, see Multi-line regular expressions in Visual Studio Code for more ways to match any char across lines.
If you need to do this kind of replacement across multiple files, note that the Rust regex engine used by the VSCode file search and replace feature is much less powerful: it supports neither \K, nor \G, nor infinite-width lookbehinds. I suggest using the Notepad++ Replace in Files feature:
The (?:\G(?!\A(?<!(?s:.)))|a)[^b]*\Kb pattern matches
(?:\G(?!\A(?<!(?s:.)))|a) - either of the two options:
\G(?!\A(?<!(?s:.))) - the end of the previous successful match ((?!\A(?<!(?s:.))) is necessary to exclude the start of file position from \G)
| - or
a - an a
[^b]* - any zero or more occurrences of chars other than b
\K - omit the matched text
b - a b char.
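Outside an editor, the same result (every b after the first a becomes c) does not need \K, \G or a variable-width lookbehind, none of which Python's built-in re module supports. A minimal sketch that swaps the regex for plain string methods; the function name and its defaults are made up for illustration.

```python
def replace_after_marker(text, marker="a", old="b", new="c"):
    """Replace every `old` that occurs after the first `marker`."""
    # partition() splits at the first marker; if the marker is absent,
    # sep and tail are empty and the text comes back unchanged.
    head, sep, tail = text.partition(marker)
    return head + sep + tail.replace(old, new)


original = "b\na\nb\nb\n"
updated = replace_after_marker(original)
print(updated)
```

The b before the a is left alone because only the part after the first marker is rewritten.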
It's probably not the prettiest, but when tried and tested the following worked for me:
(?:^a\n|\G(?<!\A))\n*\Kb$
See the online demo. I don't know VSCode, but a quick search led me to believe it should follow Perl-based PCRE2 syntax, as per the linked demo.
(?: - Open non-capture group:
^a\n - Start line anchor followed by "a" and a newline character.
| - Or:
\G(?<!\A) - Meta escape, assert position at end of previous match or start of string. The negative lookbehind prevents the start-of-string position from being matched.
) - Close non-capture group.
\n* - 0+ new-line characters.
\K - Meta escape, reset starting point of reported match.
b$ - Match a literal "b", followed by an end-line anchor.

Use regular expressions in Visual Studio to match (non-consecutive) and replace recurring string in an expression

I am tasked to refactor namespaces in vs2015 Solution, removing duplicate/repeating words.
I need a FIND regex that returns these namespaces and everywhere that may have been used or referenced.
I need replace regex to remove the second occurrence of the word from namespace.
EXAMPLE
TestApp.SA.TestApp => TestApp.SA
TestApp.TestApp.SA => TestApp.SA
Here is my regex to find (which I know can be better): TestApp.*?(TestApp)
Could somebody please help with an expression for the replace? I think it needs to set the second occurrence of TestApp to whitespace.
The patterns I suggest are not 100% safe, but they show a way to use regex for both searching and search-and-replace in your files.
The basic expressions you may use for the task are
(\w+)\.(\w+\.)*\1
and
Find: (\w+)((?:\.\w+)*)\.\1
Replace: $1$2
See the regex demo
The patterns mean:
(\w+) - match and capture 1+ alphanumeric/underscore chars into Group 1
\. - matches a literal dot
(\w+\.)* - zero or more sequences ((...)*) of 1+ word chars followed with a dot (each subsequent submatch will erase the Group 2 buffer, but it is not important when just searching)
\1 - a backreference to the contents captured in Group 1
The second pattern is almost the same, just the capturing groups are a bit adjusted for the replacement numbered backreferences to replace text correctly.
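To try the find/replace pair outside Visual Studio, here is a sketch in Python. Note that Python writes the replacement backreferences as \1\2 instead of $1$2; the function name and the using line are illustrative only.

```python
import re

# Group 1: the repeated word, Group 2: the dotted segments in between.
DUPLICATE = re.compile(r"(\w+)((?:\.\w+)*)\.\1")


def dedupe(namespace):
    """Drop the second occurrence of a repeated namespace segment."""
    return DUPLICATE.sub(r"\1\2", namespace)


examples = [
    "TestApp.SA.TestApp",
    "TestApp.TestApp.SA",
    "using TestApp.SA.TestApp.Models;",
    "TestApp.SA",
]
cleaned = [dedupe(e) for e in examples]

for before, after in zip(examples, cleaned):
    print(before, "=>", after)
```

As the answer warns, this is not fully safe: \1 is not anchored to a word boundary, so a segment that merely starts with the captured word can also be matched.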

Regex with exclusion chars and another regex

How do I write a regex that finds a word (a run without whitespace) that contains neither certain chars (like * or #) nor a given sub-pattern (like level10 or level2, which is itself a regex: level[0-9]+)? Excluding only the chars would be simple ([^\\s^#^*]+), but how do I exclude the 'level' pattern too?
I want to exclude chars AND level with number.
Examples:
weesdlevel3fv - shouldn't match because of 'level3'
we3rlevelw4erw - should match - there is level without number
dfs3leveldfvws#3vd - shouldn't match - level is good, but '#' char appeared
level4#level levelw4_level - treated as two words because of the whitespace; only the second one should match (no level with a number and no restricted chars like '#' or '*')
See this regex:
/(?<=\s)(?!\S*[#*])(?!\S*level[0-9])\S+/
Regex explanation:
(?<=\s) Asserts position after a whitespace sequence.
(?!\S*[#*]) Asserts that "#" or "*" is absent in the sequence.
(?!\S*level[0-9]) Asserts that level[0-9] is not matched in the sequence.
\S+ Now that our conditions pass, this sequence is valid. Go ahead and use \S+ or \S++ to match the entire sequence.
To exclude further patterns, you can add another (?!\S*<false_to_assert>) group.
View a regex demo!
For this specific case you can use a double negation trick:
/(?<=\s)(?!\S*level[0-9])[^\s#*]+(?=\s)/
Another regex demo.
Read more:
Regex for existence of some words whose order doesn't matter
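The lookahead approach can be checked against the examples from the question with a short Python sketch. One deliberate change, labelled as an assumption: (?<=\s) is swapped for (?<!\S) so that a word at the very start of the text can also match.

```python
import re

# (?<!\S) is an adaptation of the answer's (?<=\s): it also accepts
# the start of the text, not only a position after whitespace.
WORD = re.compile(r"(?<!\S)(?!\S*[#*])(?!\S*level[0-9])\S+")

text = (
    "weesdlevel3fv we3rlevelw4erw dfs3leveldfvws#3vd "
    "level4#level levelw4_level"
)

matches = WORD.findall(text)
print(matches)
```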
you can simply OR the searches with the pipe character
[^\s#*]+|level[0-9]+