Regex invalid on match - regex

I need a regex to return invalid on a match. Specifically, the match is a string that starts with an A or an M and is followed by four digits, e.g. A1223. The four digits could be any sequence.
I'm sure lookarounds are the way to handle this but I haven't grasped regex as a concept just yet. Thus far I've discovered how to capture the matched strings separate from other strings with the following.
([\s\S]*?)(A[\d][\d][\d][\d]|M[\d][\d][\d][\d])
Appreciate the help.

Regex doesn't really have match negation, but you can (ab)use a negative lookahead assertion to do inverted matching:
^((?!\s[AM]\d{4}).){6}

to match all strings not starting with A or M followed by 4 digits:
with negative lookahead:
^(?![AM]\d{4}).*
with consuming pattern using () capture groups:
[AM]\d{4}.*|(.+)
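A minimal sketch of the negative-lookahead variant, assuming Python's re module (the question does not name a language). The helper name is_valid is hypothetical, chosen only for illustration:

```python
import re

# Succeeds only when the string does NOT start with A or M plus four digits.
INVALID_PREFIX = re.compile(r"^(?![AM]\d{4}).*")

def is_valid(text: str) -> bool:
    """Hypothetical helper: False when text starts with A/M followed by four digits."""
    return INVALID_PREFIX.match(text) is not None

for sample in ["A1223", "M0001", "B1223", "A12"]:
    print(sample, is_valid(sample))
```

Because the lookahead consumes nothing, the trailing .* is only there to give the match something to return; the accept/reject decision is made entirely by the assertion.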

Related

Regex expression for [number2] in [number],[number2][word]

I'm trying to find a regular expression to find [number2] in [number],[number2][word].
So far I've tried with [,](\d*), but it also gets me the comma.
Demo: https://regexr.com/59eqa
You may use:
(?<=,)(\d*)
Detail:
(?<=,): a positive lookbehind that does not consume any characters but asserts that the number must be preceded by a comma
The previous answers do not handle the case where only the second of the two numbers should be matched.
If the second number must be captured, this can be done with
\b\d+,(\d+)[A-Za-z]
where the "number2" is contained in captured group 1.
If you want to get the match only, you could use 2 lookarounds, asserting a comma to the left and a char a-zA-Z to the right.
Use \d+ to match 1 or more digits.
(?<=,)\d+(?=[a-zA-Z])
If there should be a digit before the comma as well:
(?<=\d,)\d+(?=[a-zA-Z])
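To compare the capture-group form with the lookaround form, here is a small sketch assuming Python's re module and a made-up input string "12,345abc":

```python
import re

text = "12,345abc"  # hypothetical input: [number],[number2][word]

# Capture-group form: the whole match includes the comma, group 1 is number2.
captured = re.search(r"\b\d+,(\d+)[A-Za-z]", text)

# Lookaround form: the match itself is only the digits of number2.
bare = re.search(r"(?<=\d,)\d+(?=[a-zA-Z])", text)

print(captured.group(0))  # whole match, comma and first letter included
print(captured.group(1))  # number2 only
print(bare.group(0))      # number2 only
```

Which form is preferable depends on whether the calling code can read a capture group or only sees the overall match.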

Regex Negative Lookbehind Matches Lookbehind text .NET

Say I have the following strings:
PB-GD2185-11652-MTCH
GD2185-11652-MTCH
KD-GD2185-11652-MTCH
KD-GD2185-11652
I want REGEX.IsMatch to return true if the string has MTCH in it and does not start with PB.
I expected the regex to be the following:
^(?<!PB)\S+(?=MTCH)
but that gives me the following matches:
PB-GD2185-11652-
GD2185-11652-
KD-GD2185-11652-
I do not understand why the negative lookbehind not only doesn't exclude the match but includes the PB characters in the match. The positive lookahead works as expected.
EDIT 1
Let me start with a simpler example. The following regex matches all of the strings as I would expect it to:
\S+
The following regex still matches all of the strings even though I would expect it not to:
\S+(?!MTCH)
The following regex matches all but the final H character on the first three strings:
\S+(?<!MTCH)
From the documentation at regex101, a lookahead looks for text to the right of the pattern and a lookbehind looks for text to the left of the pattern, so having a lookahead at the beginning of a string does not jibe with the documentation.
Edit 2
take another example with the following three strings:
grey
greyhound
hound
the regex:
^(?<!grey)hound
only matches the final hound. whereas the regex:
^(?<!grey)\S+
matches all three.
You need a lookahead: ^(?!PB)\S+(?=MTCH). Using the look-behind means the PB has to come before the first character.
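The difference can be seen in a short sketch. It assumes Python's re module rather than .NET; for these two patterns the lookaround semantics should be the same, but that is an assumption worth checking in your own environment:

```python
import re

s = "PB-GD2185-11652-MTCH"

# At position 0 there is nothing to the left, so (?<!PB) always succeeds.
lookbehind = re.search(r"^(?<!PB)\S+(?=MTCH)", s)

# (?!PB) inspects the text to the right of position 0, i.e. the start of the string.
lookahead = re.search(r"^(?!PB)\S+(?=MTCH)", s)

print(lookbehind.group(0) if lookbehind else None)
print(lookahead.group(0) if lookahead else None)
```

A lookaround is relative to the current position in the string, not to the neighbouring part of the pattern, which is why a lookahead is the right tool immediately after ^.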
The problem was because of the greediness of \S+. When dealing with lookarounds and greedy quantifiers you can easily match more characters than you expect. One way to deal with this is to insert a negative lookaround in a group with the greedy quantifier to exclude it as a match as stated in this question:
How to non-greedy multiple lookbehind matches
and on this helpful website about greediness in regular expressions:
http://www.rexegg.com/regex-quantifiers.html
Note that this second link has a few other ways to deal with the greediness in various situations.
A good regular expression for this situation is as follows:
^(?<!PB)((?!PB)\S+)(MTCH)
In situations like this it is going to be much clearer to do it logically within the code. So first check if the string matches MTCH and then that it doesn't match ^PB
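A sketch of that two-step check, written in Python for illustration (the question targets .NET, where the same logic would use two Regex.IsMatch calls). The helper name is_wanted is hypothetical:

```python
import re

def is_wanted(s: str) -> bool:
    """Hypothetical helper: True if s contains MTCH and does not start with PB."""
    contains_mtch = re.search(r"MTCH", s) is not None
    starts_with_pb = re.match(r"PB", s) is not None  # re.match anchors at the start
    return contains_mtch and not starts_with_pb

samples = [
    "PB-GD2185-11652-MTCH",
    "GD2185-11652-MTCH",
    "KD-GD2185-11652-MTCH",
    "KD-GD2185-11652",
]
for s in samples:
    print(s, is_wanted(s))
```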

check if there is a word repeated at least 2 or more times. (Regular Expression)

Using a regular expression, I want to match any line of input that has at least one word repeated two or more times.
Here is how far i got.
/(\b\w+\b).*\1
but it is wrong because it only checks for single char, not one word.
input: i might be ill
output: < i might be i>ll
<> marks the matched part.
so, i try to do (\b\w+\b)(\b\w+\b)*\1
but it is not working totally.
Can someone give help?
Thanks.
this should work
(\b\w+\b).*\b\1\b
The greedy .* will ensure the longest match. If you want the second instance to be a separate word, you have to add the boundaries there as well. So it's the same as
\b(\w+)\b.*\b\1\b
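The effect of the extra boundaries can be checked with a short sketch, assuming Python's re module and the sample line from the question:

```python
import re

# Backreference without boundaries: \1 may match inside a longer word.
unbounded = re.compile(r"(\b\w+\b).*\1")
# Backreference wrapped in \b: \1 must be a whole word.
bounded = re.compile(r"\b(\w+)\b.*\b\1\b")

line = "i might be ill"
print(unbounded.search(line).group(0))  # matches up to the "i" inside "ill"
print(bounded.search(line))             # no whole word is repeated

print(bounded.search("the cat saw the dog").group(1))
```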
Positive lookahead is not a must here:
/\b([A-Za-z]+)\b[\s\S]*\b\1\b/g
EXPLANATION
\b([A-Za-z]+)\b # match any word
[\s\S]* # match any character (newline included) zero or more times
\b\1\b # word repeated
To check for repeated words you can use positive lookahead like this.
Regex: (\b[A-Za-z]+\b)(?=.*\b\1\b)
Explanation:
(\b[A-Za-z]+\b) will capture any word.
(?=.*\b\1\b) will lookahead if the word captured by group is present or not. If yes then a match is found.
Note:- This will produce repeated results because the word which is matched once will again be matched when regex pointer captures it as a word.
You will have to use programming to strip off the repeated results.
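One way to strip the duplicates, sketched in Python under the assumption that the order of first appearance should be kept (the sample sentence is made up for illustration):

```python
import re

REPEATED = re.compile(r"(\b[A-Za-z]+\b)(?=.*\b\1\b)")

line = "the cat and the dog and the bird"  # hypothetical input

# findall returns group 1 for every position where the lookahead succeeds,
# so a word that occurs three times is reported twice.
hits = REPEATED.findall(line)

# dict.fromkeys keeps the first occurrence of each word and preserves order.
unique_hits = list(dict.fromkeys(hits))

print(hits)
print(unique_hits)
```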

Regex to match certain word but not a particular combination

I have 15 titles as follows:
fruits-and-flowers-themeA
fruits-and-flowers-themeB
fruits-and-flowers-just-test-themeA
themeAfruitsandflowers
nice-fruits-and-flowers-themeA
botanical-names-themeA
I want a regex to help me get only those titles with "themeA" in them, but it should not include "nice" and not include "just-test" or "just-tests".
I tried
^(?!.*just-test|*just-tests|nice).*?(?:themeA).*,
but I still get fruits-and-flowers-just-test-themeA in the output.
How to fix this?
Thanks
You can use this regex with negative lookahead:
^(?!.*?(?:just-tests?|nice)).*?themeA.*$
Option 1
You can use a single regex with lookaheads (see online demo):
^(?!.*nice?)(?!.*just-tests?).*themeA.*
The ^ asserts that the match starts at the beginning of the string (so we don't match a subset of the string)
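The (?!.*nice?) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by nic and an optional e (the trailing ? makes the e optional; write nice without it to require the whole word)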
The (?!.*just-tests?) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by just-test and an optional s
As a further tweak, you can compress the lookaheads into one using an | alternation as in anubhava's answer.
Option 2 without lookaheads (Perl, PHP/PCRE)
^(?:.*(?:nice|just-tests?).*)(*SKIP)(?!)|.*themeA.*
This one doesn't use lookaheads but just skips the unwanted titles. See demo.
Use two different regular expressions for clarity and simplicity.
Match your string against one regex that matches themeA:
/themeA/
and then check that the string does NOT match the one you don't want:
/nice|just-tests?/
Doing it in two different regexes makes it far easier to understand and maintain.
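As a sketch, the two-regex approach and the single-lookahead regex can be compared on the titles from the question. This assumes Python's re module; the variable names are illustrative only:

```python
import re

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

WANTED = re.compile(r"themeA")
UNWANTED = re.compile(r"nice|just-tests?")
SINGLE = re.compile(r"^(?!.*?(?:just-tests?|nice)).*?themeA.*$")

# Two-step check: must contain themeA, must not contain an excluded word.
two_step = [t for t in titles if WANTED.search(t) and not UNWANTED.search(t)]
# One-step check with the negative lookahead.
one_step = [t for t in titles if SINGLE.search(t)]

print(two_step)
print(one_step == two_step)
```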

How to negate the whole regex?

I have a regex, for example (ma|(t){1}). It matches ma and t and doesn't match bla.
I want to negate the regex, thus it must match bla and not ma and t, by adding something to this regex. I know I can write bla, the actual regex is however more complex.
Use negative lookaround: (?!pattern)
Positive lookarounds can be used to assert that a pattern matches. Negative lookarounds are the opposite: they are used to assert that a pattern DOES NOT match. Some flavors support these assertions; some put limitations on lookbehind, etc.
Links to regular-expressions.info
Lookahead and Lookbehind Zero-Width Assertions
Flavor comparison
See also
How do I convert CamelCase into human-readable names in Java?
Regex for all strings not containing a string?
A regex to match a substring that isn’t followed by a certain other substring.
More examples
These are attempts to come up with regex solutions to toy problems as exercises; they should be educational if you're trying to learn the various ways you can use lookarounds (nesting them, using them to capture, etc):
codingBat plusOut using regex
codingBat repeatEnd using regex
codingbat wordEnds using regex
Assuming you only want to disallow strings that match the regex completely (i.e., mmbla is okay, but mm isn't), this is what you want:
^(?!(?:m{2}|t)$).*$
(?!(?:m{2}|t)$) is a negative lookahead; it says "starting from the current position, the next few characters are not mm or t, followed by the end of the string." The start anchor (^) at the beginning ensures that the lookahead is applied at the beginning of the string. If that succeeds, the .* goes ahead and consumes the string.
FYI, if you're using Java's matches() method, you don't really need the ^ and the final $, but they don't do any harm. The $ inside the lookahead is required, though.
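A small sketch of this whole-string negation, assuming Python's re module instead of Java. The function negated_in_code is a hypothetical alternative that negates the result in code rather than in the pattern:

```python
import re

original = re.compile(r"m{2}|t")
negated = re.compile(r"^(?!(?:m{2}|t)$).*$")

def negated_in_code(s: str) -> bool:
    """Hypothetical helper: True when the original regex does not match all of s."""
    return original.fullmatch(s) is None

for s in ["mm", "t", "mmbla", "bla"]:
    print(s, bool(negated.match(s)), negated_in_code(s))
```

For these inputs the two approaches agree; the in-code negation has the advantage of working with any original pattern without rewriting it.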
\b(?=\w)(?!(ma|(t){1}))\b(\w*)
this is for the given regex.
the \b is to find word boundary.
the positive look ahead (?=\w) is here to avoid spaces.
the negative look ahead over the original regex is to prevent matches of it.
and finally the (\w*) is to catch all the words that are left.
the group that will hold the words is group 3.
the simple (?!pattern) will not work as any sub-string will match
the simple ^(?!(?:m{2}|t)$).*$ will not work as its granularity is full lines
This regexp matches your condition:
^.*(?<!ma|t)$
Look at how it works:
https://regex101.com/r/Ryg2FX/1
Apply this if you use laravel.
Laravel has a not_regex rule: the field under validation must not match the given regular expression. It uses the PHP preg_match function internally.
'email' => 'not_regex:/^.+$/i'