Regex extracting string after last hyphen and spaces - regex

Which regex should I use to extract 'Manchester City' from the string below?
String is:
Aston Villa - Manchester City
I tried -(.*)\w|-(.), but it grabs - .

Note that -(.*)\w|-(.) matches - since both the alternatives here start with matching a hyphen. You can usually check if something is present or not with a lookaround.
However, in this case, I'd suggest
-\s*\K[^-]+$
Since you only need to match the substring after the last - with the spaces trimmed off, you need something like a positive infinite-width lookbehind, (?<=-\s*). However, PCRE does not support infinite-width lookbehind. Instead, it offers the \K operator, which makes the engine drop everything matched so far by the current pattern from the returned match.
See a regex demo
Breakdown:
- - a literal hyphen
\s* - zero or more whitespace characters
\K - operator that resets the match buffer, discarding everything matched so far
[^-]+ - one or more characters other than - up to ...
$ - the end of the string.
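Not every engine has \K. Python's built-in re module, for instance, does not support it, so a minimal sketch there would swap \K for a capturing group. The function name after_last_hyphen is made up for illustration, and the capturing group is a substitute technique, not the \K pattern itself:

```python
import re

def after_last_hyphen(text):
    # Python's re has no \K, so the tail is captured in group 1 instead.
    # [^-]+$ cannot cross a hyphen, so only the last hyphen can match.
    match = re.search(r"-\s*([^-]+)$", text)
    return match.group(1) if match else None

print(after_last_hyphen("Aston Villa - Manchester City"))  # Manchester City
```

With a capturing group the hyphen and spaces are still part of the overall match, so the caller has to read group 1 rather than the whole match.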

The simplest is .*- (.*) and your data is in $1 or \1, or something else depending on your tool. That assumes the data is in the format xxxxx-xxxxxx.

Another simple option is - (.*) see: https://regex101.com/r/fY3oE7/1. Use the first capturing group in your language to get the part after the dash.
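The two simple patterns behave differently once the input contains more than one hyphen. The sketch below uses Python and a made-up three-part input to show the difference: a leading greedy .* pushes the match to the last hyphen, while the bare pattern starts at the first one.

```python
import re

# Made-up input with two hyphens, not taken from the question.
text = "Real Madrid - Aston Villa - Manchester City"

# Leading greedy .* consumes as much as possible, so "- " lands on the last hyphen.
greedy = re.search(r".*- (.*)", text).group(1)

# Without the leading .*, the search stops at the first "- " it finds.
first = re.search(r"- (.*)", text).group(1)

print(greedy)  # Manchester City
print(first)   # Aston Villa - Manchester City
```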

Related

How to match characters between two occurrences of the same but random string

Base string looks like:
repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc
The things I know about this base string are:
ABCXYZ is constant and always present.
repeatedRandomStr is random, but its first occurrence is always at the beginning and before ABCXYZ
So far I looked at regex context matching, recursion and subroutines but couldn't come up with a solution myself.
My currently working solution is to first determine what repeatedRandomStr is with:
^(.*)\sABCXYZ
and then use:
repeatedRandomStr\sABCXYZ\s(.*)\srepeatedRandomStr
to match what I want in $1. But this requires two separate regex queries. I want to know if this can be done in a single execution.
In Go, where the RE2 library is used, there is no way other than yours: first extract the value before ABCXYZ, then use a second regex to match the string between the two occurrences, as RE2 does not and will not support backreferences.
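The two-step approach that an engine without backreferences forces can be sketched as follows. Python is used here only for illustration, between_repeats is a hypothetical helper, and the prefix is escaped on the assumption that the random string may contain regex metacharacters:

```python
import re

def between_repeats(text):
    # Step 1: learn the random prefix that precedes the constant ABCXYZ.
    head = re.match(r"(.*?)\s+ABCXYZ\s", text)
    if head is None:
        return None
    # Escape the prefix so it is matched literally in the second pattern.
    token = re.escape(head.group(1))
    # Step 2: build a second pattern around the escaped prefix.
    body = re.search(token + r"\s+ABCXYZ\s(.*)" + token, text)
    return body.group(1) if body else None

sample = ("repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/"
          "of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc")
print(between_repeats(sample))
```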
In case the regex flavor can be switched to PCRE or compatible, you can use
^(.*?)\s+ABCXYZ\s(.*)\1
or, with a lazy second group,
^(.*?)\s+ABCXYZ\s(.*?)\1
See the regex demo.
Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than line break chars as few as possible
\s+ - one or more whitespaces
ABCXYZ - some constant string
\s - a whitespace
(.*) - Group 2: zero or more chars other than line break chars as many as possible
\1 - the same value as in Group 1.
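As a quick check of the single-pattern version, here is a sketch in Python, whose re module is not PCRE but does support backreferences. It runs the greedy variant against the base string from the question:

```python
import re

text = ("repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/"
        "of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc")

# Group 1 captures the random prefix; \1 requires the same text again later.
match = re.search(r"^(.*?)\s+ABCXYZ\s(.*)\1", text)

print(match.group(1))  # repeatedRandomStr
print(match.group(2))  # /an/arbitrary/##-~/sequence/of_characters=I+WANT+TO+MATCH/
```

With the greedy (.*), Group 2 runs up to the last repetition of the prefix; a lazy (.*?) would stop at the first one. The two give the same result here because the prefix repeats only once.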

Regex Expressions in Dart/Flutter

I am developing an app with markdown capabilities, so I am building a lexer to handle this. I am fairly new to Flutter and have little experience with Regex in general.
Essentially there is a difference between *text*, **text**, and ***text***.
My expressions right now are:
r"\B\*[A-Za-z0-9 ]+\*\B"
r"\B\*{2}[A-Za-z0-9 ]+\*{2}\B"
r"\B\*{3}[A-Za-z0-9 ]+\*{3}\B"
The issue is that the first expression is matching the other two. **text*** will get matched also with the second expression. Does anyone know how to solve this?
It looks like you could use:
(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)
See an online demo
(?<!\S) - Assert position is not preceded by anything that is not a whitespace char;
(\*{1,3}) - Match 1-3 asterisk characters;
[A-Za-z0-9 ]+ - Match 1+ characters from given character class;
\1 - Backreference what is matched in 1st group;
(?!\S) - Assert position is not followed by anything other than whitespace char.
Note that if you'd remove the final negative lookahead you could also match **text** in **text*** if that is what you were after. Or even remove the leading negative lookbehind to match **text** in ****text** test.
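To see how one pattern can replace the three separate ones, here is a small sketch that tries the same expression in Python rather than Dart. The lex function and the token names in KINDS are hypothetical; the length of Group 1 is what tells the three cases apart.

```python
import re

EMPHASIS = re.compile(r"(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)")

# Hypothetical token names keyed by the number of asterisks in Group 1.
KINDS = {1: "italic", 2: "bold", 3: "bold italic"}

def lex(text):
    # Return (kind, matched text) pairs for every balanced emphasis run.
    return [(KINDS[len(m.group(1))], m.group(0)) for m in EMPHASIS.finditer(text)]

print(lex("*one* **two** ***three***"))
print(lex("**text***"))  # [] because the delimiters are unbalanced
```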

How to exclude a specific string with REGEX? (Perl)

For example, I have these strings
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that have TEA or WINE1C in them.
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated as I'm also kinda new to this.
If you indeed have multiple strings as you claim, there's no need to jam all of that into one regex pattern.
/^APPLE/ && !/TEA|WINE1C/
If you have a single string, the best approach is probably to split it into lines (split /\n/), but you could also use a single regex match:
/^APPLE(?!.*(?:TEA|WINE1C)).*/mg
You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
See the regex demo.
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
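The question is about Perl, but the pattern can be checked against the sample data in any engine with lookaheads. A minimal Python sketch, which also shows the equivalent "two simple checks" filter for comparison:

```python
import re

names = ["APPLEJUCE1A", "APPLETREE2B", "APPLECAKE3C",
         "APPLETEA1B", "APPLEWINE3B", "APPLEWINE1C"]

# One pattern: two negative lookaheads right after the fixed prefix.
pattern = re.compile(r"^APPLE(?!.*TEA)(?!.*WINE1C).*")
kept = [n for n in names if pattern.match(n)]

# Same filter written as two separate checks, no lookaheads needed.
kept_two_checks = [n for n in names
                   if n.startswith("APPLE") and not re.search(r"TEA|WINE1C", n)]

print(kept)  # ['APPLEJUCE1A', 'APPLETREE2B', 'APPLECAKE3C', 'APPLEWINE3B']
```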
If you don't want to match a string that has both of them (which is not in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
Regex demo
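Since the example data has no string containing both values, the behaviour of this pattern is easier to see with a couple of made-up inputs. A sketch in Python, where the last two sample strings are invented for the test:

```python
import re

both = re.compile(r"^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*")

# The last two strings are made up; they contain TEA and WINE1C together.
samples = ["APPLETEA1B", "APPLEWINE1C", "APPLETEAWINE1C", "APPLEWINE1CTEA"]

# Map each sample to whether the pattern accepts it.
results = {s: bool(both.match(s)) for s in samples}
for name, accepted in results.items():
    print(name, accepted)
```

Only the strings containing both values are rejected; a string with just one of them still matches.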

Regex Extraction - Match before a space, or NOT before a space

Here are my potential inputs:
brian#muck.co, brian#gmail.com
brian#gmail.com, brian#muck.co
What I want to do is extract the #muck.co email address.
What I have tried is:
\s.*#muck.co
The problem is that this only grabs an email address if it is preceded by a space (so it would only match the second example input above). . . How would I write a Regex expression to match either inputs?
\s matches a whitespace character, so you probably wanted something like [^\s]*#muck.co, which means any number of non-space characters. [] denotes a set of symbols, and ^ negates it.
That does not work for me, because \s in my regex flavour does not seem to include the regular space, but this works: [^[:space:]]\+#muck\.co. Note \+ instead of * to require one or more non-space characters rather than any number, and the escaped dot \., since an unescaped dot stands for any single character.
You can use a negated character class to not cross the #, and a word boundary at the end to prevent a partial word match:
[^\s#]+#muck\.co\b
Regex demo
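A short Python sketch that runs this last pattern against both inputs from the question, plus one made-up near miss to show what the word boundary is for:

```python
import re

MUCK = re.compile(r"[^\s#]+#muck\.co\b")

inputs = ["brian#muck.co, brian#gmail.com",
          "brian#gmail.com, brian#muck.co"]

for line in inputs:
    print(MUCK.findall(line))  # ['brian#muck.co'] for both inputs

# Made-up near miss: \b rejects it because "co" runs straight into "m".
print(MUCK.findall("brian#muck.com"))  # []
```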

Regex: delete everything between String and replace with other

So I've been scratching my head over this one, I have over a thousand files that have different values between the strings
<lodDistances content="float_array">
15.000000
25.000000
70.000000
140.000000
500.000000
500.000000
</lodDistances>
I need to replace those values with these
<lodDistances content="float_array">
120.000000
200.000000
300.000000
400.000000
500.000000
550.000000
</lodDistances>
I tried the following without any success
\ (?<=\<lodDistances content\=\"float_array\"\>)(.*)(?=\<\/lodDistances\>)
It seems to find it in regexr but not in a sublime text when I try to find it in files, I constantly get 0 results. Any idea why this is happening?
There are a couple of things that are wrong in your pattern:
\< matches a leading word boundary position (like \b(?=\w)) and \> matches a trailing word boundary position (like \b(?<=\w)). You wanted to match literal < and > chars, so you must NOT escape them.
There is no need to match a space before the first <.
Since your text is multiline, use either the (?s) inline modifier or a (?s:...) modifier group to make . match across line breaks, or use a [\s\S] / [\w\W] / [\d\D] workaround.
Use a lazy dot pattern to stop matching at the first occurrence of the trailing delimiter.
You may use
(?s)(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)
And replace with ${1}<new values>. The curly braces are necessary as the new values are most likely numbers and without the braces, $1n (n stands for a digit here) will be parsed incorrectly (see this YT video for a demo of what it is fraught with).
Regex details:
(?s) - now, . matches line break chars, too
(<lodDistances content="float_array">\s*) - Group 1 capturing <lodDistances content="float_array"> text and then zero or more whitespaces
.*? - any zero or more chars, but as few as possible
(?=\s*</lodDistances>) - a positive lookahead that matches the location that is immediately followed with zero or more whitespaces and </lodDistances> text.
Note that / is not a special regex metacharacter, and since regex delimiter notation is not supported in Sublime Text, you do not have to ever escape it here.
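Outside Sublime Text, the same find-and-replace can be tried in a script. The sketch below uses Python, where \g<1> plays the role of ${1} and avoids the same ambiguity with a following digit. The replace_lod helper is hypothetical, and it assumes the new values are plain numbers with no backslashes:

```python
import re

PATTERN = re.compile(
    r'(?s)(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)'
)

def replace_lod(xml, new_values):
    # \g<1> re-inserts the opening tag; plain \1 followed by a digit would be misread.
    return PATTERN.sub(r"\g<1>" + new_values, xml)

old = ('<lodDistances content="float_array">\n'
       '15.000000\n25.000000\n70.000000\n140.000000\n500.000000\n500.000000\n'
       '</lodDistances>')
new_values = "\n".join(["120.000000", "200.000000", "300.000000",
                        "400.000000", "500.000000", "550.000000"])

print(replace_lod(old, new_values))
```

The closing tag and the newline before it are left untouched because they sit inside the lookahead, which is not part of the match.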