RegEx match sequence of three strings along with text inbetween - regex

I'm trying to get a regular expression to match something inbetween two strings that includes a third. I'm having trouble getting the lazy quantifier to cooperate, as there are multiple instances of these strings in the input and the RegEx matches something that is not useful, i.e.:
Start...End...Start...End...Start...Middle...End
Whet I'm actually looking for (only one instance of Start and End for each match):
Start...Middle...End or Start...Center...End
I'm pretty sure I need to use lookahead/lookbehind, but while I do conceptually understand them, putting them into practice is really difficult. Here's where I'm at:
/<Start[\s\S]*?(Middle|Center)[\s\S]*?End>/gm

Make use of the tempered greedy token:
Start(?:(?!Start|End)[\s\S])*?(Middle|Center)[\s\S]*?End
^^^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo
Details
Start - a literal string
(?:(?!Start|End)[\s\S])*? - any char, 0+ repetitions, as few as possible, that is not a starting point of Start or End sequence
(Middle|Center) - Group 1: Middle or Center
[\s\S]*? - any 0+ chars, as few as possible
End - a literal string

Related

How to match characters between two occurrences of the same but random string

Base string looks like:
repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc
The things I know about this base string are:
ABCXYZ is constant and always present.
repeatedRandomStr is random, but its first occurrence is always at the beginning and before ABCXYZ
So far I looked at regex context matching, recursion and subroutines but couldn't come up with a solution myself.
My currently working solution is to first determine what repeatedRandomStr is with:
^(.*)\sABCXYZ
and then use:
repeatedRandomStr\sABCXYZ\s(.*)\srepeatedRandomStr
to match what I want in $1. But this requires two separate regex queries. I want to know if this can be done in a single execution.
In Go, where RE2 library is used, there is no way other than yours: keep extracting the value before the ABCXYZ and then use the regex to match a string between two strings, as RE2 does not and won't support backreferences.
In case the regex flavor can be switched to PCRE or compatible, you can use
^(.*?)\s+ABCXYZ\s(.*)\1
^(.*?)\s+ABCXYZ\s(.*?)\1
See the regex demo.
Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than line break chars as few as possible
\s+ - one or more whitespaces
ABCXYZ - some constant string
\s - a whitespace
(.*) - Group 2: zero or more chars other than line break chars as many as possible
\1 - the same value as in Group 1.

How to exclude a specific string with REGEX? (Perl)

For example, I have these strings
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that have TEA or WINE1C in them.
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated as I'm also kinda new to this.
If you indeed have mutliple strings as you claim, there's no need to jam all that in one regex pattern.
/^APPLE/ && !/TEA|WINE1C/
If you have a single string, the best approach is probably to splice it into lines (split /\n/), but you could also use a single regex match too
/^APPLE(?!.*TEA|WINE1C).*/mg
You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
See the regex demo.
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
If you don't want to match a string that has both or them (which is not in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
Regex demo

Use regex to search for a substring in source code files

I've got to rename our application and would like to search all strings in the source code for the use of it. Naturally the app name can appear anywhere within the strings and the strings can span multiple lines which complicates things.
I was using (["'])APP_NAME to find instances at the start of strings but now I need a more complete solution.
Essentially what I'd like to say is "find instances of APP_NAME enclosed by quotes" in regex speak.
I'm searching in Xcode in case anyone has any Xcode-specific alternatives...
You may use
"[^"]*APP_NAME[^"]*"|'[^']*APP_NAME[^']*'
See the regex demo.
Note that this regex is based on alternation (| means OR) and negated character classes ([^"]* matches any 0+ chars other than ").
Or, alternatively:
(["'])(?:(?!\1).)*APP_NAME.*?\1
See this regex demo. The pattern is a bit trickier:
(["']) - captures " or ' into Group 1
(?:(?!\1).)* - any 0+ occurrences of a char that is not equal to the one captured into Group 1
APP_NAME - literal char sequence
.*? - any 0+ chars other than line break chars but as few as possible`up to the first occurrence of...
\1 - the value captured into Group 1.

How to only match a single instance of a character?

Not quite sure how to go about this, but basically what I want to do is match a character, say a for example. In this case all of the following would not contain matches (i.e. I don't want to match them):
aa
aaa
fooaaxyz
Whereas the following would:
a (obviously)
fooaxyz (this would only match the letter a part)
My knowledge of RegEx is not great, so I am not even sure if this is possible. Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
^[^\sa]*\Ka(?=[^\sa]*$)
DEMO
\K discards the previously matched characters and lookahead assertes whether a match is possibel or not. So the above matches only the letter a which satifies the conditions.
OR
a{2,}(*SKIP)(*F)|a
DEMO
You may use a combination of a lookbehind and a lookahead:
(?<!a)a(?!a)
See the regex demo and the regex graph:
Details
(?<!a) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a a char
a - an a char
(?!a) - a negative lookahead that fails the match if, immediately to the right of the current location, there is a a char.
You need two things:
a negated character class: [^a] (all except "a")
anchors (^ and $) to ensure that the limits of the string are reached (in other words, that the pattern matches the whole string and not only a substring):
Result:
^[^a]*a[^a]*$
Once you know there is only one "a", you can use the way you want to extract/replace/remove it depending of the language you use.

Regular expression to match non-integer values in a string

I want to match the following rules:
One dash is allowed at the start of a number.
Only values between 0 and 9 should be allowed.
I currently have the following regex pattern, I'm matching the inverse so that I can thrown an exception upon finding a match that doesn't follow the rules:
[^-0-9]
The downside to this pattern is that it works for all cases except a hyphen in the middle of the String will still pass. For example:
"-2304923" is allowed correctly but "9234-342" is also allowed and shouldn't be.
Please let me know what I can do to specify the first character as [^-0-9] and the rest as [^0-9]. Thanks!
This regex will work for you:
^-?\d+$
Explanation: start the string ^, then - but optional (?), the digit \d repeated few times (+), and string must finish here $.
You can do this:
(?:^|\s)(-?\d+)(?:["'\s]|$)
^^^^^ non capturing group for start of line or space
^^^^^ capture number
^^^^^^^^^ non capturing group for end of line, space or quote
See it work
This will capture all strings of numbers in a line with an optional hyphen in front.
-2304923" "9234-342" 1234 -1234
++++++++ captured
^^^^^^^^ NOT captured
++++ captured
+++++ captured
I don't understand how your pattern - [^-0-9] is matching those strings you are talking about. That pattern is just the opposite of what you want. You have simply negated the character class by using caret(^) at the beginning. So, this pattern would match anything except the hyphen and the digits.
Anyways, for your requirement, first you need to match one hyphen at the beginning. So, just keep it outside the character class. And then to match any number of digits later on, you can use [0-9]+ or \d+.
So, your pattern to match the required format should be:
-[0-9]+ // or -\d+
The above regex is used to find the pattern in some large string. If you want the entire string to match this pattern, then you can add anchors at the ends of the regex: -
^-[0-9]+$
For a regular expression like this, it's sometimes helpful to think of it in terms of two cases.
Is the first character messed up somehow?
If not, are any of the other characters messed up somehow?
Combine these with |
(^[^-0-9]|^.+?[^0-9])