Why /^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i does not work as expected - regex

I have this regex for email validation (assume only x#y.com, abc#defghi.org, something#anotherhting.edu are valid)
/^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i
But #abc.edu and abc#xyz.eduorg are both accepted by the regex above. Can anyone explain why that is?
My approach:
there should be at least one letter or digit before #
then comes the #
there should be at least one letter or digit after # and before .
the string should end with either edu, com, or org.

Try this
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
and it should become clear - you need to group those alternatives, otherwise you can match any string that has 'edu' in it, or any string that ends with org. To put it another way, your version matches any of these patterns
^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)
(edu)
(org)$
It's worth pointing out that the original poster is using this as a regex learning exercise. This would be a terrible regex for actual production use! It's a thorny problem - see Using a regular expression to validate an email address for a lot more depth.
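As a quick check of the explanation above, here is a minimal Python sketch comparing the original pattern with the grouped one. Python's re module is only standing in for whatever engine the question used, and the variable names are made up for illustration; the sample strings come from the question.

```python
import re

# Original pattern: the top-level | splits it into three separate alternatives.
broken = re.compile(r'^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$', re.IGNORECASE)
# Grouped pattern: the alternation only applies to the ending.
fixed = re.compile(r'^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$', re.IGNORECASE)

samples = ['x#y.com', 'abc#defghi.org', '#abc.edu', 'abc#xyz.eduorg']

# search() mimics a typical "does the regex match anywhere in the string" test.
broken_results = {s: bool(broken.search(s)) for s in samples}
fixed_results = {s: bool(fixed.search(s)) for s in samples}

for s in samples:
    print(f'{s!r}: broken={broken_results[s]} fixed={fixed_results[s]}')
```

The original pattern accepts all four strings, while the grouped one rejects #abc.edu and abc#xyz.eduorg.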

Your grouping parentheses are incorrect:
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
Can also just use one case as you're using the i modifier:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
N.B. you were also missing a + from the second set, I assume this was just a typo...

What you have written is the equivalent of matching something that:
Begins with [a-zA-Z0-9]+#[a-zA-Z0-9]\.com, or
contains edu, or
ends with org
What you were looking for was:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i

Your regex looks ok.
I guess you are using a find function instead of a match function.
Without knowing what language you use it is a bit difficult to say, but in Python you would write
import re
pattern = re.compile(r'^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$')
pattern.match('#abc.edu')   # returns None: match only tries at the start of the string; use this to validate an input
pattern.search('#abc.edu')  # returns a match: search scans the whole string and finds the edu

Try this:
[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)+$
You forgot the + quantifier, which you need if you want to match any combination of (com|edu|org).
Update: I see you also missed the + after the second [a-zA-Z0-9].

Related

Regex - Match URLs except an isolated case

I have a regex pattern which was made to match attempts of URLs advertisement.
[a-zA-Z0-9\-\.]+\s?
(\.|\(\.\)|dot|\(dot\)|-|;|:|,)\s
(com|org|net|cz|co|uk|sk|biz|mobi|xxx|eu|me)\b
I also made this formula detect attempts of outsmarting the protection such as:
www-google-com or google-com (using '-' instead of '.')
The Problem
It was reported to me that, in the Portuguese language, words like
"ganhou-me" or "fugiu-me"
are valid and are still getting caught by the protection. The hyphen is used together with the "me" domain, which causes the confusion.
I'm trying to find a way to exclude that particular case from the expression but:
Still be able to detect attempts like: google.me or google;me
But ignore attempts like: google-me or ganhou-me
I thought about removing the "me" from the main expression and add a disjunction that included that particular case, but that sounds like the hardest solution?
If you want all -me addresses to not be matched and your language supports negative Look-behind you could use [a-zA-Z0-9\-\.]+\s?(\.|\(\.\)|dot|\(dot\)|-|;|:|,)\s?(com|org|net|cz|co|uk|sk|biz|mobi|xxx|eu|(?<!-)me)\b or here is a Look-ahead version [a-zA-Z0-9\-\.]+\s?(\.|\(\.\)|dot|\(dot\)|-(?!me)|;|:|,)\s?(com|org|net|cz|co|uk|sk|biz|mobi|xxx|eu|me)\b.
The first version uses (?<!-) to check that there is no - immediately before 'me'; the second uses -(?!me) to check that the - is not followed by 'me'.
Here it is working in a JavaScript example. Note: I used the second version because JavaScript did not support look-behind at the time (it was added in ES2018).
var value = "www.google.com www.google;me www.google-me";
var matches = value.match(
new RegExp("[a-zA-Z0-9\\-\\.]+\\s?(\\.|\\(\\.\\)|dot|\\(dot\\)|-(?!me)|;|:|,)\\s?(com|org|net|cz|co|uk|sk|biz|mobi|xxx|eu|me)\\b", "g")
);
document.writeln(matches);
Of course it might be better to use a white list (suggested in comments above) as this is very general.
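For engines that support both kinds of look-around, the two variants can be compared side by side. The following is a rough Python sketch, not the asker's actual setup: the names HOST, lookahead and lookbehind and the sample text (which adds ganhou-me from the question) are illustrative assumptions.

```python
import re

# Shared prefix of both patterns: the "host" part plus optional whitespace.
HOST = r'[a-zA-Z0-9\-\.]+\s?'

# Look-ahead version: a hyphen separator is rejected when "me" follows it.
lookahead = re.compile(
    HOST + r'(\.|\(\.\)|dot|\(dot\)|-(?!me)|;|:|,)\s?'
    r'(com|org|net|cz|co|uk|sk|biz|mobi|xxx|eu|me)\b'
)

# Look-behind version: "me" is rejected when a hyphen comes right before it.
lookbehind = re.compile(
    HOST + r'(\.|\(\.\)|dot|\(dot\)|-|;|:|,)\s?'
    r'(com|org|net|cz|co|uk|sk|biz|mobi|xxx|eu|(?<!-)me)\b'
)

text = 'www.google.com www.google;me www.google-me ganhou-me'

# finditer + group(0) returns the full matched text for every hit.
found_ahead = [m.group(0) for m in lookahead.finditer(text)]
found_behind = [m.group(0) for m in lookbehind.finditer(text)]

print(found_ahead)
print(found_behind)
```

Both variants report www.google.com and www.google;me and skip the two hyphenated -me strings.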

Regex: Non fixed-width look around assertions?

My colleague asked me to provide him with a regex that only matches if the test string ends with
.rar or .part1.rar or .part01.rar or .part001.rar (and so on).
Should match:
foo.part1.rar
xyz.part01.rar
archive.rar
part3_is_the_best.rar
Should not match:
foo.r61
bar.part03.rar
test.sfv
I immediately came up with the regex \.(part0*1\.)?rar$. But this also matches bar.part03.rar.
Next I tried to add a negative look-behind assertion: .*(?<!part\d*)\.(part\0*1\.)?rar$ That didn't work either, because look-behind assertions need to be fixed-width.
Then I tried using a regex-conditional. But that didn't work either.
So my question: Can this even be solved by using pure regex?
An answer should either contain a link to regex101.com providing a working solution, or explain why it can't work by using pure regex.
You could use a lookahead to rule out the one case that your original regex gets wrong (a .rar preceded by a .part section that isn't 0*1):
^(?!.*\.part0*[^1]\.rar$).*\.(part0*1\.)?rar$
See it in action
This is an old question, but here's another approach:
(?:\.part0*1\.rar|^(?<!\.)\w+\.rar)$
The idea is to match either:
A string that ends with .part0*1.rar (ie foo.part01.rar, foo.part1.rar, bar.part001.rar), OR
A string that ends with .rar and doesn't contain any other dots (.) before that.
Works on all your test cases, plus your extra foo.part19.rar.
https://regex101.com/r/EyHhmo/2
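To see how the two suggested patterns behave on the question's test cases, here is a small Python sketch. Python's re module is an assumption (the question does not name an engine), and the variable names are invented for the example. The extra foo.part19.rar case is included because the two patterns disagree on it.

```python
import re

# First answer: negative look-ahead placed in front of the original pattern.
lookahead_re = re.compile(r'^(?!.*\.part0*[^1]\.rar$).*\.(part0*1\.)?rar$')
# Second answer: two explicit alternatives, no variable-width look-behind.
alternation_re = re.compile(r'(?:\.part0*1\.rar|^(?<!\.)\w+\.rar)$')

should_match = ['foo.part1.rar', 'xyz.part01.rar', 'archive.rar', 'part3_is_the_best.rar']
should_not_match = ['foo.r61', 'bar.part03.rar', 'test.sfv']
extra = 'foo.part19.rar'

for name in should_match + should_not_match + [extra]:
    print(name,
          bool(lookahead_re.search(name)),
          bool(alternation_re.search(name)))

# Results for the extra case, kept in variables so they are easy to inspect.
extra_lookahead = bool(lookahead_re.search(extra))
extra_alternation = bool(alternation_re.search(extra))
```

Both patterns handle the seven original cases as requested; on foo.part19.rar the look-ahead pattern still matches, while the alternation pattern does not.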

Validating URL using regex

I am trying to validate a URL with just a scheme and domain name (something like http://www.domainname.com). I am using this regex:
/^(http|https):\/\/[\w.\-]+\.[A-Za-z]{2,6}/
When I type http://www.ab it returns true for up to 6 characters; beyond that length it returns false. How can I tackle this situation?
You can use a regex like this: https?:\/\/www\..*?\.(com|uk|in) (you have to list every ending you want to match).
demo here
Try this one:
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
Test it here: https://regex101.com/r/xR0oV9/1
Let me correct your pattern a bit, just for information.
Instead of (http|https), (https?) is much shorter: the http part appears in both cases and the s is optional.
Instead of [A-Za-z] you can use just lower-case letters, [a-z], and add the i modifier to the end of your pattern (after the last slash /), which makes the match case-insensitive.
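The two simplifications can be checked with a short Python sketch. This is only an illustration: re.IGNORECASE plays the role of the i modifier, and the sample URLs other than http://www.domainname.com are made up.

```python
import re

# Pattern from the question (delimiters and \/ escapes are not needed in Python).
original = re.compile(r'^(http|https)://[\w.\-]+\.[A-Za-z]{2,6}')
# Simplified pattern: https? for the scheme, [a-z] plus a case-insensitive flag.
simplified = re.compile(r'^https?://[\w.\-]+\.[a-z]{2,6}', re.IGNORECASE)

urls = [
    'http://www.domainname.com',
    'https://example.org',
    'ftp://example.org',
    'HTTP://EXAMPLE.COM',
]

original_results = [bool(original.match(u)) for u in urls]
simplified_results = [bool(simplified.match(u)) for u in urls]

for u, o, s in zip(urls, original_results, simplified_results):
    print(f'{u}: original={o} simplified={s}')
```

One difference worth knowing: the case-insensitive flag applies to the whole pattern, so the simplified version also accepts an upper-case scheme such as HTTP://, which the original rejects.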
This one from diegoperini is a little longer, but it's nearly perfect (at least in my eyes).
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
If you want to use it in C# you have to slightly change it. I've done this already some time ago.
^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$

RegEx: Match Mr. Ms. etc in a "Title" Database field

I need to build a RegEx expression which gets its text strings from the Title field of my Database. I.e. the complete strings being searched are: Mr. or Ms. or Dr. or Sr. etc.
Unfortunately this field was a free field and anything could be written into it. e.g.: M. ; A ; CFO etc.
The expression needs to match on everything except: Mr. ; Ms. ; Dr. ; Sr. (NOTE: The list is a bit longer but for simplicity I keep it short.)
WHAT I HAVE TRIED SO FAR:
This is what I am using successfully on another field:
^(?!(VIP)$).* (This will match every string except "VIP")
I rewrote that expression to look like this:
^(?!(Mr.|Ms.|Dr.|Sr.)$).*
Unfortunately this did not work. I assume this is because the "." (dot) is a reserved symbol in RegEx and needs special handling.
I also tried:
^(?!(Mr\.|Ms\.|Dr\.|Sr\.)$).*
But no luck either.
I looked around in the forum and tested some other solutions but could not find any that works for me.
I would like to know how I can build my expression so that it tests the complete (short) string and matches everything except "Mr." etc. Any help is appreciated!
Note: My Question might seem unusual and seems to have many open ends and possible errors. However the rest of my application is handling those open ends. Please trust me with this.
If you want your string simply to not start with one of those prefixes, then do this:
^(?!([MDS]r|Ms)\.).*$
The above simply ensures that the beginning of the string (^) is not followed by one of your listed prefixes. (You shouldn't even need the .*$ but this is in case you're using some engine that requires a complete match.)
If you want your string to not have those prefixes anywhere, then do:
^(.(?!([MDS]r|Ms)\.))*$
The above ensures that every character (.) is not followed by one of your listed prefixes, to the end (so the $ is necessary in this one).
I just read that your list of prefixes may be longer, so let me expand for you to add:
^(.(?!(Mr|Ms|Dr|Sr)\.))*$
You mean the string consists entirely of one of the prefixes? Then just do this:
^(?!(Mr|Ms|Dr|Sr)\.$).*$
And if you want to make the dot optional:
^(?!(Mr|Ms|Dr|Sr)\.?$).*$
Using | we can define any number of prefix patterns to match against the string.
var pattern = /^(Mrs\.|Mr\.|Ms\.|Dr\.|Er\.).?[A-Za-z]+$/;
var str = "Mrs.Panchal";
console.log(str.match(pattern));
this may do it
/(?!.*?(?:^|\W)(?:(?:Dr|Mr|Mrs|Ms|Sr|Jr)\.?|Miss|Phd|\+|&)(?:\W|$))^.*$/i
from that page I mentioned
Rather than trying to construct a regex that matches anything except Mr., Ms., etc., it would be easier (if your application allows it) to write a regex that matches only those strings:
/^(Mr|Ms|Dr|Sr)\.$/
and just swap the logic for handling matching vs non-matching strings.
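A minimal Python sketch of that "match the allowed titles, then invert" idea follows. The function name and sample values are illustrative (the non-standard values M., A and CFO come from the question), and the negative look-ahead form is included only to show that the two approaches agree on these inputs.

```python
import re

# Matches only the well-formed titles; fullmatch() requires the whole string to fit.
STANDARD_TITLE = re.compile(r'(Mr|Ms|Dr|Sr)\.')
# Equivalent "everything except" pattern using a negative look-ahead.
NOT_STANDARD = re.compile(r'^(?!(Mr|Ms|Dr|Sr)\.$).*$')


def is_nonstandard(title: str) -> bool:
    """Return True when the title is anything other than Mr., Ms., Dr. or Sr."""
    return STANDARD_TITLE.fullmatch(title) is None


titles = ['Mr.', 'M.', 'A', 'CFO', 'Dr.', 'Mrs.', 'Mr']

nonstandard = [t for t in titles if is_nonstandard(t)]
nonstandard_lookahead = [t for t in titles if NOT_STANDARD.match(t)]

print(nonstandard)
print(nonstandard_lookahead)
```

Inverting the result in application code keeps the pattern short and makes the list of allowed titles easy to extend.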
re.sub(r'^([MmDdSs][RSrs]{1,2}|[Mm]iss)\.{0,1} ','',name)

Help to compose regular expression

I have the following string: user1 fam <user#example.com>, user2 fam <user2#example.com>, ...
How can I get the mail addresses from this string with a regular expression? I need a list of mail addresses as output:
user#example.com
user2#example.com
I tried:
<.*>
But its output includes the < >:
<user#example.com>
<user2#example.com>
Thank you.
P.S. Thank you @xanatos for the comment, I use Erlang.
As the others have said, but to make it faster:
<([^>]*)>
In this way the regex won't have to backtrack (with the other regexes suggested, the regex will match the whole string and then begin to roll back to find a >).
I'll add that, for historical reasons, there are small differences between . and, for example, [\s\S]. Both match any character, except that . does not match \n. So by using [^>] you also match \n, but this shouldn't be a problem for what you are doing. http://www.regular-expressions.info/dot.html
Just to be complete, because it's a problem that often happens, there is another variant:
<((?:(?!>).)*)>
(You can substitute [\s\S] for the . if you want, or use the SingleLine option if your language supports it, to change how the . behaves.) The point here is that the "stop" expression can be longer than one character. Instead of (?!>) you could have used (?!%%) and it would have stopped at %%. BUT I'm not sure this variant works with Erlang (I hadn't noticed the new tag... It wasn't there when I originally read the question and I'm not an Erlang programmer... And it seems at least two Erlang programmers have different opinions on the matter :-) )
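To illustrate the multi-character stop idea, here is a hedged Python sketch (Python rather than Erlang, since the behaviour under Erlang is left open above). The %%-terminated sample string and the variable names are invented for the example.

```python
import re

# "Tempered dot": consume one character at a time, but only while the stop
# sequence does not begin at the current position.
until_gt = re.compile(r'<((?:(?!>).)*)>')
# Same construction with a two-character stop sequence, %%.
until_pct = re.compile(r'<((?:(?!%%).)*)%%')

emails = until_gt.findall(
    'user1 fam <user#example.com>, user2 fam <user2#example.com>, ...'
)
# A single % is consumed normally; only the %% pair ends the capture.
chunk = until_pct.findall('<abc%def%%rest')

print(emails)
print(chunk)
```

With a single-character stop like > this is just a slower way of writing [^>]*, so the construction only pays off when the terminator is longer than one character.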
You need to use the ungreedy option so that it only matches the individual bracket pairs,
global so that you get all the matches,
and {capture, all_but_first, list} so that you get the actual values (list can also be binary if you prefer binary results). all_but_first tells re not to return the whole match (which would include the <>), just the group.
Result:
1> S.
"user1 fam <user#example.com>, user2 fam <user2#example.com>, "
2> re:run(S, "<(.+)>", [ungreedy, global, {capture, all_but_first, list}]).
{match,[["user#example.com"],["user2#example.com"]]}
Use groups. See your regex engine's documentation for more details.
>>> re.findall('<(.*?)>', 'user1 fam <user#example.com>, user2 fam <user2#example.com>, ...')
['user#example.com', 'user2#example.com']
Keep it simple and use <([^>]*)> which is about as fast as it can get and works for most versions of regular expressions. This is faster as it never has to backtrack while using <(.*?)> will cause backtracking.
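The difference between the greedy, lazy and negated-class forms discussed in these answers can be seen in a short Python sketch. The variable names are illustrative; the input string is the one from the question.

```python
import re

s = 'user1 fam <user#example.com>, user2 fam <user2#example.com>, ...'

# Greedy: .* runs to the end of the string and backs off to the LAST >.
greedy = re.findall(r'<(.*)>', s)
# Lazy: .*? expands one character at a time and stops at the first >.
lazy = re.findall(r'<(.*?)>', s)
# Negated class: [^>]* can never run past a >, so no backing off is needed.
negated = re.findall(r'<([^>]*)>', s)

print(greedy)
print(lazy)
print(negated)
```

The greedy form returns one long capture spanning both addresses, while the lazy and negated-class forms each return the two addresses separately.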