Regex Email validation with some special cases

I am trying to make a regex match which is discarding the lookahead completely.
\w+([-+.]\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*
This is the match and this is my regex101 test.
But when an email starts with -, _ or ., it should not be matched at all, rather than matched without its initial symbols. Any ideas are welcome; I've been searching for the past half hour but can't figure out how to drop the entire email when it starts with those symbols.

You can use the word boundary before # together with a positive lookbehind that checks whether we are at the beginning of the string or right after whitespace, and then check that the first symbol is not one of the unwanted ones, using the negated class [^\s\-_.]:
(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
See demo
List of matches:
support#github.com
s.miller#mit.edu
j.hopking#york.ac.uk
steve.parker#soft.de
info#company-hotels.org
kiki#hotmail.co.uk
no-reply#github.com
s.peterson#mail.uu.net
info-bg#software-software.software.academy
Additional notes on usage and alternative notation
Note that it is best practice to escape as few characters as possible in a regex, so [^\s\-_.] can be written as [^\s_.-]: a hyphen at the end of a character class still denotes a literal hyphen, not a range. Also, if you plan to use the pattern in other regex engines, you may run into trouble with the alternation in the lookbehind (engines such as Python's re require fixed-width lookbehinds, and the ^ and \s branches differ in width). In that case, replace (?<=\s|^) with the equivalent (?<!\S). See this regex:
(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
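To illustrate the fixed-width issue, here is a small sketch in Python (the language is an assumption; the answer does not name one). Python's re module refuses to compile the alternation lookbehind, while the (?<!\S) form compiles and skips every address that starts with -, . or _. The test string is hypothetical, built from addresses in the list above plus a few rejected ones.

```python
import re

# Python's re requires fixed-width lookbehinds. The branches of (?<=^|\s)
# are 0 and 1 characters wide, so compiling it raises re.error.
try:
    re.compile(r"(?<=^|\s)x")
    alternation_lookbehind_supported = True
except re.error:
    alternation_lookbehind_supported = False

# (?<!\S) = "not preceded by a non-whitespace char": start of string or after whitespace.
EMAIL = re.compile(
    r"(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*"
)

# Hypothetical sample: the three addresses starting with -, . or _ must be skipped entirely.
text = ("support#github.com -bad#x.com .dot#x.com _u#x.com "
        "s.miller#mit.edu info-bg#software-software.software.academy")

matches = EMAIL.findall(text)
print(alternation_lookbehind_supported)  # False
print(matches)
```

Because the lookbehind is applied at every candidate start position, a rejected address cannot be matched from its second character either: that character is preceded by a non-whitespace symbol.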
And last but not least, if you need to use it in JavaScript engines that predate lookbehind support (ES2018) or in other languages without lookbehinds, replace (?<!\S)/(?<=\s|^) with a non-capturing group (?:\s|^), wrap the whole email part of the pattern in a capturing group, and use the language's means to grab the Group 1 contents (with a capturing (\s|^) instead, the email would end up in Group 2):
(?:\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)
See the regex demo.
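A sketch of the lookbehind-free variant, again in Python purely for illustration: the leading (?:\s|^) is non-capturing, so the email itself is Group 1, and the whole match may include one leading space. The sample string is hypothetical.

```python
import re

# No lookbehind: (?:\s|^) consumes the preceding whitespace (or matches at the
# start of the string), and the email itself is captured in Group 1.
EMAIL = re.compile(
    r"(?:\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)"
)

text = "no-reply#github.com _x#y.com kiki#hotmail.co.uk -z#y.com"

# Read Group 1 of every match instead of the whole match.
emails = [m.group(1) for m in EMAIL.finditer(text)]
print(emails)  # ['no-reply#github.com', 'kiki#hotmail.co.uk']
```

In JavaScript the same idea applies: loop over the matches (e.g. with exec or matchAll) and read index 1 of each result.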

I use this for multiple email addresses separated with ';' (every address, including the last one, has to end with ;):
([A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*
For a single mail:
[A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}
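A quick Python check of both patterns, copied verbatim from the answer; the helper names are hypothetical. fullmatch is used so the whole string has to fit the pattern. It also shows two side effects of the pattern design: the trailing ; is mandatory, and {2,4} rejects longer top-level domains such as .academy.

```python
import re

# Patterns copied verbatim from the answer.
MULTI = re.compile(r"([A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*")
SINGLE = re.compile(r"[A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")

def is_address_list(s: str) -> bool:
    # The whole string must consist of "address;" repetitions.
    return MULTI.fullmatch(s) is not None

def is_single_address(s: str) -> bool:
    return SINGLE.fullmatch(s) is not None

print(is_address_list("support#github.com;kiki#hotmail.co.uk;"))        # True
print(is_address_list("support#github.com;kiki#hotmail.co.uk"))         # False: last ; missing
print(is_single_address("kiki#hotmail.co.uk"))                          # True
print(is_single_address("info-bg#software-software.software.academy"))  # False: TLD longer than 4
```

Because the group is repeated with *, the list pattern also accepts an empty string; use + instead if at least one address is required.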

Related

Regular expression - skip characters in jMeter testing

I have the regular expression below, which retrieves all the characters starting with state%3 up to the first #:
(state%3)((?:(?!#).)*)
I want to ignore the state%3 part. I have tried all kinds of lookbehinds, but nothing seems to work.
Here is the full text that I need to match against:
"state%3DnGl%252BlPm8CkHfYd2PpBq7W0H2z6xgUeICgB7KFmGmGG8cTSQTf%252B9cYCfFSsT5YSPTITdbaLAlJoQ22%252FCXRAu3ROqTQYzpPfGYxKmRZ7iIqwx3g0GLpVkaXq5FL3Js5FcTGpncQx7TA9w1A6HsSyxxcktfwX8QSzhqJQj5lntOolrPoIqpa4l2C%252BbhCWuAOY18BwVynMv8%252BuSl#login/"
A couple of things I have already tried
^.{5}\Kstate
But it does not seem to work. Any ideas? I need this to retrieve the value for jMeter testing.
No need of lookbehind, nor any lookarounds at all. Use a single capturing group and a negated character class:
state%3([^#]+)
AND set the template value to $1$.
See the regex demo. Details:
state%3 - matches a literal text
([^#]+) - Capturing group #1 (that is why template should be $1$): one or more chars other than #.
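The same extraction can be tried outside jMeter, for example in Python (an assumption for illustration only): the text matched by state%3 is consumed but not captured, so Group 1, the value the $1$ template refers to, starts right after it.

```python
import re

# The value from the question, split into adjacent literals for readability.
text = (
    "state%3DnGl%252BlPm8CkHfYd2PpBq7W0H2z6xgUeICgB7KFmGmGG8cTSQTf%252B9cYCfFSsT5YSPTITdbaLAlJoQ22"
    "%252FCXRAu3ROqTQYzpPfGYxKmRZ7iIqwx3g0GLpVkaXq5FL3Js5FcTGpncQx7TA9w1A6HsSyxxcktfwX8QSzhqJQj5l"
    "ntOolrPoIqpa4l2C%252BbhCWuAOY18BwVynMv8%252BuSl#login/"
)

# "state%3" is matched literally; ([^#]+) captures everything up to the first '#'.
match = re.search(r"state%3([^#]+)", text)
value = match.group(1) if match else None
print(value)
```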

Regex validate only the result of a captured group

I have this regex to detect an email address:
(?=.*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)
The requirement: The part before # needs to contain at least one letter and be at least 8 characters long.
I'm using a positive lookahead to check that it contains a letter, but the lookahead actually applies to the entire line (the part after # will usually contain letters), so this will pass:
123456789#gmail.com
So question is, how can I validate only the result of the first capturing group (in this case 123456789) to see if it has a letter or not?
The consuming part of the pattern before #, [a-zA-Z0-9_.+-]{8,}, cannot match #, so the lookahead should only check for a letter after 0 or more characters other than #.
Using
(?=[^#]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)
will fix the issue. See the regex demo.
You may further optimize the lookahead pattern by making [^#] more precise. E.g., since you only allow 0-9_.+- apart from letters, you may write the regex as follows (only the [0-9_.+-]* inside the lookahead changes):
(?=[0-9_.+-]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)
See this regex demo.
Or, you may follow the principle of contrast (suggested in comments), and use [^#a-zA-Z]* instead of [^#]*.
Depending on where you are using the regex, you might want to wrap it with ^ and $ anchors to ensure a full string match.
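A minimal Python comparison of the original lookahead and the three fixed variants discussed above (Python and the sample addresses are assumptions for illustration). fullmatch stands in for the ^ and $ anchors.

```python
import re

ORIGINAL = re.compile(r"(?=.*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)")
VARIANTS = {
    "fixed": re.compile(r"(?=[^#]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)"),
    "precise": re.compile(r"(?=[0-9_.+-]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)"),
    "contrast": re.compile(r"(?=[^#a-zA-Z]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)"),
}

def accepts(pattern: re.Pattern, address: str) -> bool:
    # fullmatch plays the role of wrapping the regex in ^...$.
    return pattern.fullmatch(address) is not None

# The original lookahead "sees" the letters in gmail.com, so the all-digit local part passes.
print(accepts(ORIGINAL, "123456789#gmail.com"))  # True

for name, pattern in VARIANTS.items():
    # Each variant only looks for a letter before the '#'.
    print(name, accepts(pattern, "123456789#gmail.com"), accepts(pattern, "john.doe1#gmail.com"))
```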

Name validation - Adding a check to this regex to stop entering just identical characters

I'm trying to add another feature to a regex which is trying to validate names (first or last).
At the moment it looks like this:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$)([a-z][a-z'-]{1,})$/i
https://regex101.com/r/pQ1tP2/1
The idea is to do the following
Don't allow just adding a title like Mr, Mrs etc
Ensure the first character is a letter
Ensure subsequent characters are either letters, hyphens or apostrophes
Minimum of two characters
I have managed to get this far (shockingly I find regex so confusing lol).
It matches things like O'Brian or Anne-Marie etc and is doing a pretty good job.
My next addition I've struggled with, though! I'm trying to add a feature to the regex so that it does not match the following:
Just entering the same character repeatedly, e.g. aaa, bbbbb, etc.
Thanks :)
I'd add another alternative to the negative lookahead, matching ^(.)\1*$, that is, any character repeated until the end of the string.
Included as is in your regex, it would look like this:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$|^(.)\1*$)([a-z][a-z'-]{1,})$/i
However, I would probably simplify your negative lookahead as follows:
/^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$/i
The modifications are as follows:
We're evaluating the lookahead at the start of the string, as indicated by the ^ preceding it, so there is no need to repeat the start-of-string anchor in its alternatives
Each alternative must match up to the end of the string, so we can put the alternatives in a group followed by a single end-of-string anchor
We have created a new group, which we have to take into account in the back-reference: to refer to the repeated character, it must now use \2 rather than \1. An alternative would have been to use a non-capturing group (?:...), which most regex flavours support
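A small Python check of the simplified regex (Python is an assumption; re.IGNORECASE replaces the /i flag). Note that this sketch lists mrs among the titles, as the first version of the regex does; the candidate names are hypothetical.

```python
import re

# Simplified name regex; group 1 holds the alternatives, group 2 the repeated char.
NAME = re.compile(
    r"^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$",
    re.IGNORECASE,
)

def is_valid_name(name: str) -> bool:
    return NAME.match(name) is not None

for candidate in ["O'Brian", "Anne-Marie", "Mrs", "Dr", "Mr-Mrs", "aaa", "bbbbb", "J"]:
    print(candidate, is_valid_name(candidate))
```

Titles and single-character repeats are rejected by the negative lookahead, while J fails both the lookahead and the two-character minimum.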

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match the first occurrence of window.location.replace("http://stackoverflow.com") in some HTML string.
Specifically, I want to capture the URL of the first window.location.replace entry in the whole HTML string.
So, to capture the URL, I formulated these 2 rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve it I think I need to use lookbehind (for 1st rule) and lookahead (for 2nd rule).
I ended up with this regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure that it is legal to mix both rules like I did.
Can you please help me with translating my rules to Regex? Other ways of doing this (without lookahead(behind)) also appreciated.
The pattern you wrote is really not the one you need, as it matches something very different from what you expect: the text window.location.redirect("=") in a string like window.location.redirect("=") something. And it will only work in PCRE/Python if you remove the ? before \" (lookbehinds must be fixed-width in PCRE); .NET regex accepts it with the ?.
If it is JS, you cannot use a lookbehind unless the engine supports ES2018 or later; older engines do not support them at all.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
See the regex demo
Without the /g modifier, only the first occurrence is matched. Access the value you need in Group 1.
The ([^"]*) captures 0+ characters other than a double quote (the URLs you need should not contain one). If your URLs can contain a ", use the second approach, as (.*?) matches any 0+ characters other than a newline, up to the first ").
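Here is the capturing-group approach sketched in Python (an assumption for illustration): re.search returns only the first match, which corresponds to a JavaScript regex without /g. The HTML snippet and the second URL are hypothetical.

```python
import re

# Hypothetical HTML with two redirects; only the first URL should be captured.
html = """<script>
  window.location.redirect("http://stackoverflow.com");
  window.location.redirect("http://example.com/second");
</script>"""

NEG_CLASS = re.compile(r'window\.location\.redirect\("([^"]*)"\)')
LAZY_DOT = re.compile(r'window\.location\.redirect\("(.*?)"\)')

# search() stops at the first match; the URL is in Group 1.
first_url = NEG_CLASS.search(html).group(1)
first_url_lazy = LAZY_DOT.search(html).group(1)
print(first_url)       # http://stackoverflow.com
print(first_url_lazy)  # http://stackoverflow.com
```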

Need help with Regular Expression to Match Blood Group

I'm trying to come up with a regex that helps me validate a Blood Group field - which should accept only A[+-], B[+-], AB[+-] and O[+-].
Here's the regex I came up with (and tested using Regex Tester):
[A|B|AB|O][\+|\-]
Now this pattern successfully matches A,B,O[+-] but fails against AB[+-].
Can anyone please suggest a regex that'll serve my purpose?
Thanks,
m^e
Try:
(A|B|AB|O)[+-]
Using square brackets defines a character class, which matches only a single character. The parentheses create a grouping, which lets the alternation do what you want. You also don't need to escape + and - in the character class, as they lose their special meaning inside it (as long as the - is placed first or last, so it is not read as a range).
As you mentioned in the comments, if the whole string you match against should be exactly one of these values, you might want to do this:
^(A|B|AB|O)[+-]$
Without the start of string and end of string anchors, things like "helloAB+asdads" would match.
The brackets [] denote a character class, meaning "any of the characters herein". You want the parentheses () for grouping:
(A|B|AB|O)(\+|-)
When you are building an alternation (e.g. (A|B|AB|O)), you should be careful with the ordering of the alternatives. Many regex engines stop at the first alternative that matches (rather than the longest). If it weren't for the [-+] forcing a backtrack, (A|B|AB|O)[-+] would not work for "AB+". It is probably better to write (AB|A|B|O)[-+] (but you should check the docs for your regex engine).
Also, if you do not intend to capture the antigen for later use, you should use the non-capturing grouping parentheses: (?:AB|A|B|O)[-+].
Furthermore, if you want to ensure that the only thing in the string is a blood type, you need anchors to prevent it from matching only part of the string: ^(?:AB|A|B|O)[-+]$. A quick note on anchors: depending on your regex engine, ^ may match the beginning of a line rather than the beginning of the string if you pass it a multiline-match option. Similarly, $ may match the end of a line rather than the end of the string. For this reason, there are three other anchors in common (but not 100%) usage: \A, \Z, and \z. If your regex engine supports them, \A always matches the start of the string, \Z matches the end of the string or a newline just before the end of the string, and \z matches only the end of the string (Python is an exception: its \Z matches only the very end of the string, like \z elsewhere).
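The points about alternation order and anchors can be checked quickly in Python (an assumption; other engines may differ, as noted above). The sample strings are hypothetical apart from the ones quoted in this thread.

```python
import re

# With nothing after the group, the first alternative that succeeds wins: "A", not "AB".
first_alt = re.match(r"(A|B|AB|O)", "AB+").group(1)
# The following [-+] fails after "A", forcing a backtrack into the AB alternative.
with_sign = re.match(r"(A|B|AB|O)[-+]", "AB+").group(0)

BLOOD_RE = r"^(?:AB|A|B|O)[-+]$"
partial = re.search(r"(?:AB|A|B|O)[-+]", "helloAB+asdads") is not None
anchored = re.search(BLOOD_RE, "helloAB+asdads") is not None

# ^ and $ match at line boundaries only with re.MULTILINE.
no_multiline = re.search(BLOOD_RE, "x\nA+\ny") is not None
with_multiline = re.search(BLOOD_RE, "x\nA+\ny", re.MULTILINE) is not None

# In Python, $ also matches just before a trailing newline; \Z does not.
dollar_newline = re.search(BLOOD_RE, "AB+\n") is not None
z_newline = re.search(r"\A(?:AB|A|B|O)[-+]\Z", "AB+\n") is not None

print(first_alt, with_sign)              # A AB+
print(partial, anchored)                 # True False
print(no_multiline, with_multiline)      # False True
print(dollar_newline, z_newline)         # True False
```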
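For case-insensitive matching within an HTML pattern attribute, you may try this (the hyphen is escaped because browsers that compile pattern with the v flag reject an unescaped - at the end of a character class):
([AaBbOo]|[Aa][Bb])[+\-]
<input type="text" maxlength="3" pattern="([AaBbOo]|[Aa][Bb])[+\-]" required />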
^(A|B|AB|O)[+-]?$
This will produce the correct output, but note that the ? makes the sign optional, so a bare A, B, AB or O will also match.