Custom email validation regex pattern not working properly - regex

So I've got /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/ pattern I want to use for email validation on client-side, which doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
Local part of address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z] and can be mixed with comma or plus sign or underscore (or all at once) and then # sign, then domain part, but no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
In test string form it doesn't validate a#b.com and does validate baz_bar.test+private#e-mail-testing-service..com, which is wrong - it should be vice versa - validate a#b.com and not validate baz_bar.test+private#e-mail-testing-service..com
What specific error do I have in there, and where?
I can't locate it, sorry.

You need to change your regex
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign and removed the ? from the character class after the # sign. The added ? tells the regex that the whole negated character class before the # is not mandatory.
See it working here: https://regex101.com/r/iX5zB5/2
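A quick way to check the change locally is sketched below in Python. The two patterns are copied from the "From:" and "To:" lines above; the names BEFORE, AFTER, GOOD and BAD are only illustrative, and Python's re module is an assumption here (the question is about a client-side regex), so treat this as a sanity check rather than the final validator.

```python
import re

# Patterns copied from the "From:" and "To:" lines above.
BEFORE = re.compile(
    r".+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}"
)
AFTER = re.compile(
    r".+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}"
)

GOOD = "a#b.com"
BAD = "baz_bar.test+private#e-mail-testing-service..com"


def matches(pattern, text):
    """Return True when the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for name, pattern in (("before", BEFORE), ("after", AFTER)):
    print(name, matches(pattern, GOOD), matches(pattern, BAD))
```

With the changed pattern the short address is accepted and the double-dot address is rejected, which is the behaviour asked for in the question.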

You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to make sure the entire string matches.
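The three hints above can be checked one at a time. The following is a minimal sketch in Python (an assumed language, chosen only for illustration); the variable names are hypothetical and the small \w+\.\w{2,} pattern used for the anchor demonstration is a simplified stand-in, not the full email pattern.

```python
import re

# Inside a class, '+-?' is a range from '+' (0x2B) to '?' (0x3F),
# so '.' (0x2E) and several other characters are accepted.
in_range = [ch for ch in ".-/:;<=>?" if re.fullmatch(r"[+-?]", ch)]

# With '-' placed first it is a literal hyphen and '.' is no longer accepted.
dot_in_fixed_class = re.fullmatch(r"[-\w+?]", ".") is not None

# '.+' consumes at least one character and the negated class needs another,
# so a one-character local part cannot satisfy the prefix.
prefix = re.compile(r".+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#")
one_char = prefix.match("a#") is not None
two_chars = prefix.match("ab#") is not None

# Without anchors a pattern may match a fragment of a longer string.
loose = re.search(r"\w+\.\w{2,}", "!!b.com!!") is not None
strict = re.search(r"^\w+\.\w{2,}$", "!!b.com!!") is not None

print(in_range, dot_in_fixed_class, one_char, two_chars, loose, strict)
```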

Related

Regex number OR a symbol

I'm trying to create a regex that will meet the following requirements for a password.
Must have at least 1 uppercase
Must have at least 1 lowercase
Must contain a number OR a symbol - FAILS
Must be between 8 to 16 characters long
^(?=.*\d|[!##\$%\^&])(?=.*[a-z])(?=.*[A-Z]).{8,16}$
I've got it working, well almost, except the OR part.
It verifies, for instance, Tester01 and Tester0%, but it won't verify Tester%$ or anything with two symbols, in case the user doesn't put in a number. I've also tried putting brackets around the \d, thinking I had to separate the digits from the symbols, but that didn't work.
Your alternation condition isn't correct. Instead you can just slide the \d within the special characters bracket and change your regex to this,
^(?=.*[\d!##\$%\^&])(?=.*[a-z])(?=.*[A-Z]).{8,16}$
Now the lookahead (?=.*[\d!##\$%\^&]) behaves exactly as you wanted. It ensures that at least one character is either a digit or one of the special characters listed in your character class.
Demo
The reason your lookahead (?=.*\d|[!##\$%\^&]) fails is that its first alternative is .*\d while the second is merely [!##\$%\^&], which is only tested against the first character of the string. Written correctly, it should have been either this,
(?=.*\d|.*[!##\$%\^&])
OR
(?=.*(\d|[!##\$%\^&]))
And you really don't need alternation at all if you write it like the way I have written above, where you can just put \d within the character set itself, like this,
(?=.*([\d!##\$%\^&]))
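To see the difference between the two lookaheads, here is a small sketch in Python (an assumption; the question does not name a language). BROKEN and FIXED are the two full patterns from this thread, and the sample passwords beyond those in the question are made up for illustration.

```python
import re

# Pattern from the question: the second alternative only looks at position 0.
BROKEN = re.compile(r"^(?=.*\d|[!##\$%\^&])(?=.*[a-z])(?=.*[A-Z]).{8,16}$")
# Pattern from the answer: digit and symbols share one character class.
FIXED = re.compile(r"^(?=.*[\d!##\$%\^&])(?=.*[a-z])(?=.*[A-Z]).{8,16}$")

samples = ["Tester01", "Tester0%", "Tester%$", "%Testerab", "Testerab"]

results = {
    pw: (BROKEN.search(pw) is not None, FIXED.search(pw) is not None)
    for pw in samples
}

for pw, (broken_ok, fixed_ok) in results.items():
    print(f"{pw:10} broken={broken_ok} fixed={fixed_ok}")
```

Note that %Testerab passes even the broken pattern, because its symbol happens to be the first character; Tester%$ only passes the fixed one.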
Use the principle of contrast with multiple lookaheads.
^
(?=[^A-Z]*[A-Z])
(?=[^a-z]*[a-z])
(?=[^\d!##\$%\^&]*[\d!##\$%\^&])
.{8,16}
$
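A sketch of the multi-line layout above in Python, using re.VERBOSE so the pattern can keep one lookahead per line. It assumes the last lookahead ends with the positive class [\d!##\$%\^&] (skip the non-matching characters, then require one matching character), which is what the principle of contrast calls for; the name CONTRAST and the sample passwords are illustrative.

```python
import re

CONTRAST = re.compile(
    r"""
    ^
    (?=[^A-Z]*[A-Z])                    # skip non-uppercase, then one uppercase
    (?=[^a-z]*[a-z])                    # skip non-lowercase, then one lowercase
    (?=[^\d!##\$%\^&]*[\d!##\$%\^&])    # skip others, then one digit or symbol
    .{8,16}
    $
    """,
    re.VERBOSE,
)

for pw in ["Tester%$", "Tester01", "Testerab", "tester01", "Test1"]:
    print(pw, CONTRAST.search(pw) is not None)
```

Each lookahead scans forward only over characters that cannot satisfy it, so it stops at the first character that can, without the backtracking that .* would need.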
But please read this post as well (why password validations are bad?) and see a demo on regex101.com.

Regex taking too many characters

I need some help with building up my regex.
What I am trying to do is match a specific part of text with unpredictable parts in between the fixed words. An example is the sentence one gets when replying to an email:
On date at time person name has written:
The italic parts are variable: they might contain spaces, or a new line might start at that point.
To get this, I built up my regex as such: On[\s\S]+?at[\s\S]+?person[\s\S]+?has written:
Basically, the [\s\S]+? is supposed to fill in any letter, number, space or line break, as I am unable to predict what could be between the fixed words that I am sure will always be there.
Now comes the hard part: when I add the word "On" somewhere in the text above the sentence that I want to match, the regex matches a much bigger text than I want. This is due to the use of [\s\S]+.
How can I make my regex match as few characters as possible? Using "?" after the "+" to make it lazy does not help.
Example is here with words "From - This - Point - Everything:". Cases are ignored.
Correct: https://regexr.com/3jdek.
Wrong because of added "From": https://regexr.com/3jdfc
The regex is to be used in VB.NET
A more real-life example, with HTML tags, can be found here. Here, I avoided using [\s\S]+? or (.+)?(\r)?(\n)?(.+?)
Correct: https://regexr.com/3jdd1
Wrong: https://regexr.com/3jdfu after adding certain parts of the regex to the text above. Although this can barely occur in HTML, as the user would never write the matching tag himself, I do want to make sure my regex is correct, just in case.
These things are certain: I know what the part of text starts with, no matter where it is in the entire text; I know what the part of text ends with; and there are specific fixed words that might make the regex more reliable, but they can be omitted. Any text below the searched part may also be matched, but no text above may be matched at all.
Another example where it goes wrong: https://regexr.com/3jdli. Basically, I have less to go with in this text, so the regex has less tokens to work with. Adding just the first < already makes the regex take too much.
From my own experience, most problems are avoided when making sure I do not use any [\s\S]+? before I did a (\r)?(\n)? first
[\s\S] matches any character, because it is the union of two complementary sets; it behaves like . with the /s option (dot matches newlines). Quantifiers are greedy by default, so the longest possible match is returned.
Following the "correct" link, the token right after the shortest match must be geschreven. Another way to write this without lazy quantifiers, and a more flexible one, is to put a negative lookahead in front of the repeated character inside the loop,
so
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft(.+?(?=geschreven))geschreven:
becomes
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft((?:(?!geschreven).)+)geschreven:
(?: ) is for non capturing the group which just encapsulates the negative lookahead and the . (which can be replaced by [\s\S])
(?! ) inside is the negative lookahead which ensures current position before next character is not the beginning of end token.
Following comments it can be explicitly mentioned what should not appear in repeating sequence :
From(?:(?!this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!this|point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
To understand what the (?:(?!tokens)[\s\S])+ technique does, compare the three variants:
in the first, this can't appear between From and this;
in the second, neither From nor this can appear between From and this;
in the third, additionally, neither this nor point can appear between this and point;
etc.
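The effect is easy to reproduce outside VB.NET. Below is a minimal sketch in Python (chosen only for illustration; the construct uses nothing beyond lookahead, which .NET regexes also have). The sample text is made up: it has a stray "From" above the sentence that should be matched.

```python
import re

text = "From earlier text.\nFrom this to that point, everything:"

# Lazy version: the match still starts at the first "From" in the text.
lazy = re.compile(r"From[\s\S]+?this[\s\S]+?point[\s\S]+?everything:")

# Tempered version: the gap after "From" may not run into another "From".
tempered = re.compile(
    r"From(?:(?!From|this)[\s\S])+this"
    r"(?:(?!point)[\s\S])+point"
    r"(?:(?!everything)[\s\S])+everything:"
)

lazy_match = lazy.search(text)
tempered_match = tempered.search(text)

print(lazy_match.start(), repr(lazy_match.group()))
print(tempered_match.start(), repr(tempered_match.group()))
```

Laziness only shortens the right end of a match. The engine still tries the leftmost starting position first, which is why the lazy pattern swallows the text above, while the lookahead makes that first attempt fail so the search moves on to the second "From".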

Regular Expression for Password strength with one special characters except Underscore

I have the following regular expression:
^.*(?=^.{8,}$)(?=.*\d)(?=.*[!##$%^&*-])(?=.*[A-Z])(?=.*[a-z]).*$
I am using it to validate for
At least one lowercase letter
At least one capital letter
At least one number
At least one special character
At least 8 characters
But along with this I need to restrict the underscore (_).
If I enter password Pa$sw0rd, this is validating correctly, which is true.
If I enter Pa$_sw0rd this is also validating correctly, which is wrong.
The thing is the regex is passing when all the rules are satisfied. I want a rule to restrict underscore along with above.
Any help would be much appreciated.
I think you can use a negated character class [^_] to add this restriction. Also, remove the initial .* since it is redundant, and drop the (?=^.{8,}$) lookahead: the pattern already starts at the beginning of the string, and the minimum length can be checked by the final quantifier:
^(?=.*\d)(?=.*[!##$%^&*-])(?=.*[A-Z])(?=.*[a-z])[^_]{8,}$
See demo
^(?=.*?\d)(?=.*?[!##$%^&*-])(?=.*?[A-Z])(?=.*?[a-z])(?!.*_).{8,}$
You can try this. The .* at the start is of no use. See demo:
https://regex101.com/r/pG1kU1/34
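Both suggestions can be compared side by side. A minimal sketch in Python (an assumed language; the question does not name one), with the two patterns copied from the answers above and illustrative names:

```python
import re

# Negated character class: every consumed character must be a non-underscore.
NEGATED_CLASS = re.compile(
    r"^(?=.*\d)(?=.*[!##$%^&*-])(?=.*[A-Z])(?=.*[a-z])[^_]{8,}$"
)
# Negative lookahead: fail up front if an underscore occurs anywhere.
NEGATIVE_LOOKAHEAD = re.compile(
    r"^(?=.*?\d)(?=.*?[!##$%^&*-])(?=.*?[A-Z])(?=.*?[a-z])(?!.*_).{8,}$"
)

for pw in ["Pa$sw0rd", "Pa$_sw0rd", "Pa$w0rd"]:
    print(
        pw,
        NEGATED_CLASS.search(pw) is not None,
        NEGATIVE_LOOKAHEAD.search(pw) is not None,
    )
```

Both reject Pa$_sw0rd; which one to use is mostly a matter of taste.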

regex to match first instance of a word but only when preceded by match from another pattern

I've found some info on finding the first instance of a word in a string, but I'm trying to find the first instance of a word (two, actually, but in separate calls) only when it is preceded by some very specific text (an IP address delimited by underscores) that varies slightly. Also, these words are separated by underscores, so for some reason \b isn't working for me.
Here are some example strings to test against, one line at a time. Only the bolded words should be matched (the first card in each of the first two strings).
192_168_10_2_card02_port01_other_text_with_card_or_port
10_22_1_200_card4_port5_another_string_with_port_or_card
something_else_with_card_or_port_in_it
And in a second call, I'd like to match a different word in these strings (the first port in each of the first two).
192_168_10_2_card02_port01_other_text_with_card_or_port
10_22_1_200_card4_port5_another_string_with_port_or_card
something_else_with_card_or_port_in_it
My regex flavor is POSIX regex (for PostgreSQL 9.4). I've been able to run with anything that works in here http://regexpal.com/ so far.
Even if it can't solve for all 3 examples at once, if it could just solve for the first two, that would be very helpful.
Edit: To be absolutely clear, my intent is to replace the first string 'card' with the character 'c' and then to replace the first string 'port' with the letter 'p' without affecting any instance of 'card' or 'port' that are not immediately followed by numbers. This is why my match needs to include just those first words without their corresponding numbers.
If you can use negative lookahead, you can use card((?!port).)*port to match card, then any number of characters that do not start the word port, then port.
EDIT:
If the input is always in the same format, you can be more specific with card[0-9]{1,2}_port. This keeps it from matching any other, extraneous instances of card and port.
EDIT2:
To match only the word in the first case you can use a positive lookahead: card(?=[0-9]{1,2}_port). I'm not sure whether your flavor allows positive lookbehind (the tester doesn't, but that is in JS), but give (?<=card[0-9]{1,2}_)port a shot. If positive lookbehind doesn't work, you may need to look into alternatives.
The \b assertion is not working in this case because _ is considered a word character.
Demo
You can use a look behind:
(?<=_)(card).*?(?<=_)(port)
Demo
To be even more specific, use the IP address pattern:
(^(?:\d+_){4})(card\d+)_(port\d+)
Demo
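Two of the ideas above, the lookahead and the IP-prefixed capture groups, can be tried out as follows. This is a sketch in Python rather than PostgreSQL, so flavor differences apply: Python's re module only accepts fixed-width lookbehind, which is why the (?<=card[0-9]{1,2}_)port form is not used here. The function name shorten and the slightly regrouped capture pattern are my own, for illustration.

```python
import re

lines = [
    "192_168_10_2_card02_port01_other_text_with_card_or_port",
    "10_22_1_200_card4_port5_another_string_with_port_or_card",
    "something_else_with_card_or_port_in_it",
]

# Lookahead: match only the word "card" when digits and "_port" follow it.
card_only = re.compile(r"card(?=[0-9]{1,2}_port)")

# Capture groups: require the four leading number fields, keep the digits,
# and rebuild the prefix with the shortened words.
prefixed = re.compile(r"^((?:\d+_){4})card(\d+)_port(\d+)")


def shorten(line):
    """Replace the leading cardNN_portNN with cNN_pNN; leave other lines alone."""
    return prefixed.sub(r"\g<1>c\g<2>_p\g<3>", line)


for line in lines:
    found = card_only.search(line)
    print(found.span() if found else None, shorten(line))
```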
I had to solve this in two steps. In the first, I matched only lines with the IP string in the beginning (this excludes lines like my 3rd example). In the second step, I used regexp_replace to replace the first match of each word.
Unfortunately, I had completely missed the fact that regexp_replace only replaces the first match unless told otherwise with the 'g' flag:
WHEN (SELECT regexp_matches(mystring, '^1(?:[0-9]{1,3}_){4}card[0-9]{1,2}_port[0-9]{1,2}')) IS NOT NULL
THEN regexp_replace(regexp_replace(mystring, 'card', 'c'), 'port', 'p')
Though I still wish I could figure out how to match one of those words in a single expression, and I would accept any answer that could achieve that.
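For comparison, the same two-step idea can be sketched in Python. Note one difference in defaults: PostgreSQL's regexp_replace stops after the first match unless 'g' is given, whereas Python's re.sub replaces every match unless count is set. The gate pattern is copied from the SQL above (including its leading 1); the function name shorten_two_step is hypothetical.

```python
import re

# Step 1 gate: only lines that start with the IP-style prefix qualify.
gate = re.compile(r"^1(?:[0-9]{1,3}_){4}card[0-9]{1,2}_port[0-9]{1,2}")


def shorten_two_step(s):
    """Mimic the CASE branch: replace only the first 'card' and first 'port'."""
    if gate.search(s) is None:
        return s
    s = re.sub("card", "c", s, count=1)  # count=1 mirrors the missing 'g' flag
    return re.sub("port", "p", s, count=1)


for s in [
    "192_168_10_2_card02_port01_other_text_with_card_or_port",
    "10_22_1_200_card4_port5_another_string_with_port_or_card",
    "something_else_with_card_or_port_in_it",
]:
    print(shorten_two_step(s))
```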

Regular expression for alphanumeric and underscore c#

I am working on an ASP.NET MVC 5 application in which I want to add DataAnnotation validation for the Name field.
It should accept any combination of digits, letters, and underscores only.
I tried this, but it is not working:
[RegularExpression("([a-zA-Z0-9_ .&'-]+)", ErrorMessage = "Invalid.")]
Try this regex, which I tested on the regexr.com site.
Criteria - alphanumeric,underscore and space.
http://regexr.com/3agii
([a-zA-Z0-9_\s]+)
You are using a character class, that is, the thing between the square brackets: [a-zA-Z0-9_ .&'-]. Within the square brackets you define all the characters that the class should match. So the problem is easy to see: your class also allows characters you don't want to match (space, ., &, ' and -).
Based on your "try" you could change this to
[a-zA-Z0-9_]
that seems to be the set of characters you want to match. But is it really what you need? Are those really the only characters possible for that field?
If yes then you are done.
If no, you probably want to add all characters of all languages. Luckily there is a Unicode property for that:
\p{L} All letter characters
There is another predefined group that could be useful for you:
\w matches any word character (The definition can also be found in the first link, includes the Unicode categories Ll,Lu,Lt,Lo,Lm,Nd,Pc, that is basically [a-zA-Z0-9_] but Unicode style with all letters and more connecting characters)
But still, if you want to match real names this will not cover all possible names. I have another answer on this topic here
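The difference between the classes discussed above can be illustrated with a short sketch. It is written in Python rather than C# (an assumption made only to keep the example self-contained), and it uses fullmatch so that the whole value has to consist of allowed characters. Python's re has no \p{L}, so only the ASCII class and \w are compared; the sample names are made up.

```python
import re

ORIGINAL = re.compile(r"[a-zA-Z0-9_ .&'-]+")  # class from the question
ASCII_NAME = re.compile(r"[a-zA-Z0-9_]+")      # letters, digits, underscore
WORD_NAME = re.compile(r"\w+")                 # Unicode-aware word characters

samples = ["user_01", "Tom & Co.", "Jos\u00e9"]

checks = {
    value: (
        ORIGINAL.fullmatch(value) is not None,
        ASCII_NAME.fullmatch(value) is not None,
        WORD_NAME.fullmatch(value) is not None,
    )
    for value in samples
}

for value, flags in checks.items():
    print(value, flags)
```

The original class lets "Tom & Co." through, the ASCII class rejects it, and only \w accepts a name with an accented letter.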