I am trying to extract email addresses from a text list that has no separator that could be used to detect the beginning and end of each address. I have a string like this:
email1#hotmail.comwelcome#mydomain.atinfo#another-domain.detesting#domain.or.atmy.name_test#domainname.de
This is the state of my regex so far (not working):
[a-zA-Z0-9.-]+#[a-zA-Z0-9-.]+.(com|at|de|or.at)
It would be very interesting if someone has a solution for this. Maybe also a better way to determine the domain ending than a hardcoded list of all possibilities?
You're going to need that list of hardcoded TLDs, otherwise there's no way of determining where one address ends and where the next one begins.
Your regex is not bad, but you need to escape the . (otherwise it will match any character if not enclosed in a character class) and to allow underscores within your character classes:
[a-zA-Z0-9._-]+#[a-zA-Z0-9_.-]+\.(com|at|de|or\.at)
works for your examples.
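A minimal Python sketch of the corrected pattern, run against the sample string from the question (the # separator is kept as it appears there). One Python-specific detail: re.findall returns only the captured group when a pattern has one, so the ending alternation is written as a non-capturing group (?:...) here; with the original (com|at|de|or\.at) you would get back just the endings.

```python
import re

# Corrected pattern from the answer; (?:...) keeps re.findall returning whole matches.
PATTERN = re.compile(r"[a-zA-Z0-9._-]+#[a-zA-Z0-9_.-]+\.(?:com|at|de|or\.at)")

text = ("email1#hotmail.comwelcome#mydomain.atinfo#another-domain.de"
        "testing#domain.or.atmy.name_test#domainname.de")

# The greedy domain class first runs into the next address, then backtracks to
# the longest prefix that is still followed by one of the listed endings.
addresses = PATTERN.findall(text)
for address in addresses:
    print(address)
```

This only works because the list of endings is fixed: the regex has no other way to tell where "domain.or.at" stops and "my.name_test" begins.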
I'm really bad at regex and still learning. I'm trying to set up my regex to match the first URI below.
/test/guid/5824812d100afbc60ef09411
/test/guid/5824812d100afbc60ef09411/action/create
/test/guid/5824812d100afbc60ef09411/action/version/delete
I have my regex working for both /action/create and /action/version/delete.
I need the first one to be its own individual URI. The guid after /guid changes, but nothing ever follows it.
These are working:
\/test\/guid\/\d.*\/action\/create
\/test\/guid\/\d.*\/action\/version\/delete
However, if I use the same convention to find the first URI, it matches all of them. I need all 3 separately.
Help?
Anchors are your friend here. ^ matches the beginning of a line (or beginning of the full string, depending on your modifiers) and $ matches the end.
So all you need is something like this:
\/test\/guid\/[a-z0-9]+$
That should be good enough: after the guid's string of alphanumeric characters you expect the string either to end or to continue with a forward slash, and the $ rules out the second case. If your guid has a known fixed length, though, it might be better to state it explicitly:
\/test\/guid\/[a-z0-9]{24}$
Here is the regex101 demo.
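A small Python sketch of the anchoring idea against the three URIs from the question. Python does not need the forward slashes escaped (the \/ form is for /.../-delimited regex literals such as JavaScript's), and the 24-character length is an assumption taken from the sample guid.

```python
import re

uris = [
    "/test/guid/5824812d100afbc60ef09411",
    "/test/guid/5824812d100afbc60ef09411/action/create",
    "/test/guid/5824812d100afbc60ef09411/action/version/delete",
]

# Without an end anchor, the pattern matches the common prefix of every URI.
unanchored = re.compile(r"/test/guid/[a-z0-9]+")

# ^ and $ force the pattern to cover the whole string, so each URI gets its own regex.
guid_only = re.compile(r"^/test/guid/[a-z0-9]{24}$")
create = re.compile(r"^/test/guid/[a-z0-9]{24}/action/create$")
delete = re.compile(r"^/test/guid/[a-z0-9]{24}/action/version/delete$")

unanchored_hits = [u for u in uris if unanchored.search(u)]
guid_hits = [u for u in uris if guid_only.search(u)]
create_hits = [u for u in uris if create.search(u)]
delete_hits = [u for u in uris if delete.search(u)]

print(len(unanchored_hits), guid_hits, create_hits, delete_hits)
```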
I want to parse a list of email addresses separated by a variety of delimiters. The regex I am using is:
/(\S+?#\S+?\.\S+?)[,|;|\|\s|\n|\r|\t|\0|\b|$]/gmi
The problem is, in the example demo above, it doesn't pick up the last item in the list. How do I pick up the last email address in the list?
You can't use $ as a line/string terminator inside a character class; there it is understood as a literal dollar character. So while /(\S+?#\S+?\.\S+?)[,|;|\|\s|\n|\r|\t|\0|\b|$]/gmi doesn't work, /(\S+?#\S+?\.\S+?)([,|;|\|\s|\n|\r|\t|\0|\b|]|$)/gmi does.
Additionally, I would suggest a number of improvements to your regex:
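To see the difference in isolation, here is a Python sketch of both versions. The character class is trimmed to [,;\s] (the pipes and \0 dropped, as the suggestions that follow recommend), so these are simplified variants of the regexes in the answer, not exact copies.

```python
import re

text = "a#b.com,c#d.org; e#f.net"

# A delimiter is required after every address; the last one has none, so it is missed.
broken = re.compile(r"(\S+?#\S+?\.\S+?)[,;\s]")

# "Delimiter or end of input", written as an alternation outside the class.
fixed = re.compile(r"(\S+?#\S+?\.\S+?)(?:[,;\s]|$)")

print(broken.findall(text))
print(fixed.findall(text))
```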
remove the pipes from the character class, unless you want to match a literal pipe
remove the NUL (\0) character from the character class. Not only should it never appear in your string, but even if it did, it would be matched by $
remove the linefeeds from your character class and/or stop using the m flag, unless a single address can be split across multiple lines
stop using the i flag, which won't affect the character classes you're using
I also doubt you want to match centralreservation#ramaya;nahotel.com as a valid address.
In conclusion, I suggest you use [^\s;,#]+#[^\s;,#]+\.[^\s;,#]+ instead, or, better, stop trying to validate email addresses with regex and use a specialized library instead. To understand why, check the regex this Perl module uses to validate emails. And it doesn't even fully implement the RFC...
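A quick Python check of the suggested pattern. It needs no m or i flag and no trailing delimiter, so the last address in the list is found as well; the sample list is made up for illustration.

```python
import re

# Suggested pattern: each part is a run of anything except whitespace, ';', ',' or '#'.
ADDRESS = re.compile(r"[^\s;,#]+#[^\s;,#]+\.[^\s;,#]+")

addresses = "alice#example.com, bob#example.org;carol#example.net\ndave#example.io"
print(ADDRESS.findall(addresses))

# A delimiter inside an address now breaks it up instead of being absorbed into it.
print(ADDRESS.findall("centralreservation#ramaya;nahotel.com"))
```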
A big thanks to Sebastian Proske for his assistance.
So I've got the pattern /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/, which I want to use for client-side email validation, but it doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
The local part of the address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z], optionally mixed with commas, plus signs or underscores (or all of them at once). Then comes the # sign, then the domain part: no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
On the test strings, it doesn't validate a#b.com but does validate baz_bar.test+private#e-mail-testing-service..com. That is the wrong way round: it should validate a#b.com and reject baz_bar.test+private#e-mail-testing-service..com.
What specific error do I have there, and where?
I can't locate it, sorry.
You need to change your regex:
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign (right after the character class) and removed the ? from the first "group" after the # sign. The added ? tells your regex that the whole preceding "group" is optional, so a one-character local part such as a is accepted.
See it working here: https://regex101.com/r/iX5zB5/2
You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to make sure the entire string matches.
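A Python sketch that puts the two answers side by side: the original pattern (the "From:" version) shows both reported symptoms, and the "To:" version wrapped in ^...$ behaves as the question expects on the two test strings. It also confirms the point about the +-? range. This only checks those two inputs; it is not a test of the full rules described in the question.

```python
import re

# The "From:" pattern: needs two characters before '#', and [\w+-?] also accepts '.'.
ORIGINAL = re.compile(
    r".+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}"
)

# The "To:" pattern, anchored with ^ and $ so the whole string has to match.
FIXED = re.compile(
    r"^.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}$"
)

short = "a#b.com"
bad = "baz_bar.test+private#e-mail-testing-service..com"

print(bool(ORIGINAL.search(short)), bool(ORIGINAL.search(bad)))
print(bool(FIXED.match(short)), bool(FIXED.match(bad)))

# The range '+-?' runs from U+002B to U+003F, which includes '.'.
print(bool(re.fullmatch(r"[\w+-?]", ".")))
```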
I'm trying to create a regex that does not allow disposable email addresses but allows everything else. So far, here is what I have:
^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(((?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9]))(?=.*(?!.*mailinator.com))(?=.*(?!.*trbvm.com))(?=.*(?!.*guerrillamail.com))(?=.*(?!.*guerrillamailblock.com))(?=.*(?!.*sharklasers.com))(?=.*(?!.*guerrillamail.net))(?=.*(?!.*guerrillamail.org))(?=.*(?!.*guerrillamail.biz))(?=.*(?!.*spam4.me|grr.la))(?=.*(?!.*guerrillamail.de))(?=.*(?!.*grandmasmail.com))(?=.*(?!.*zetmail.com))(?=.*(?!.*vomoto.com))(?=.*(?!.*abyssmail.com))(?=.*(?!.*anappthat.com))(?=.*(?!.*eelmail.com))(?=.*(?!.*yopmail.com))(?=.*(?!.*fakeinbox.com)))$
Right now, it accepts all email addresses.
Try this slightly modified regex using lookbehind:
^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(((?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9]))(?<!mailinator\.com)(?<!trbvm\.com)(?<!guerrillamail\.com)(?<!guerrillamailblock\.com)(?<!sharklasers\.com)(?<!guerrillamail\.net)(?<!guerrillamail\.org)(?<!guerrillamail\.biz)(?<!spam4\.me)(?<!grr\.la)(?<!guerrillamail\.de)(?<!grandmasmail\.com)(?<!zetmail\.com)(?<!vomoto\.com)(?<!abyssmail\.com)(?<!anappthat\.com)(?<!eelmail\.com)(?<!yopmail\.com)(?<!fakeinbox\.com))$
It matches bob#gmail.com but does not match bob#mailinator.com.
Fundamentally, you had a regex to match any email address, followed by positive and negative lookaheads like (?=.*(?!.*mailinator.com)). By the time those lookaheads are executed, you're already at the end of the string (further enforced by the $).
Looking ahead from the end of the string there is… nothing. Any lookahead (positive or negative) into nothingness will either always pass, or always fail, regardless of the input string. E.g. a lookahead of (?=.*) at the end of a string will always pass (.* matches the empty string), whereas one of (?=.) will always fail (. does not match the empty string).
In your case, lookaheads like (?=.*(?!.*mailinator.com)) are satisfied by the nothingness beyond the end of the input string, so they always pass. It's as if you didn't have them in the regex at all.
The simple fix, without overhauling the regex entirely, is to look behind with the (?<!) construct, instead of ahead. You're at the end of the string, and want to ensure it didn't end with one of the disposable email domains you have listed. To do that for one domain, it would be (?<!mailinator\.com).
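A Python sketch comparing the two approaches, using the local-part and domain patterns from the answer's regex but only a handful of the listed domains. The lookbehind chain is built from a list with re.escape; that construction is an illustration, not part of the original answer. Python accepts these lookbehinds because each one has a fixed width.

```python
import re

# A few of the domains from the question; a real blocklist would be much longer.
DISPOSABLE = ["mailinator.com", "guerrillamail.com", "grr.la", "yopmail.com"]

LOCAL = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
DOMAIN = r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])"

# Question's approach: a lookahead at the end of the string looks into nothing.
lookahead = re.compile(rf"^{LOCAL}#{DOMAIN}(?=.*(?!.*mailinator\.com))$")

# Answer's approach: one negative lookbehind per domain, checked at the end of the string.
blocklist = "".join(f"(?<!{re.escape(d)})" for d in DISPOSABLE)
lookbehind = re.compile(rf"^{LOCAL}#{DOMAIN}{blocklist}$")

for address in ["bob#gmail.com", "bob#mailinator.com", "eve#grr.la"]:
    print(address, bool(lookahead.match(address)), bool(lookbehind.match(address)))
```

Note that a lookbehind only checks how the string ends, so it would also reject an unrelated domain such as notmailinator.com.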
There are many disposable email domains and they are constantly changing. Writing a regex for them is only going to capture a small number and will require constant maintenance and updating.
You may want to look at using an open source list, e.g. https://github.com/disposable/disposable, and then build a way to keep it updated.
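A minimal sketch of the list-based approach in Python, assuming a plain-text file with one domain per line; check the actual file format in that repository before relying on it. Both function names are hypothetical. Checking the domain and each of its parents catches subdomains, and unlike a regex suffix check it does not reject lookalike domains such as notmailinator.com.

```python
from pathlib import Path


def load_domains(path: Path) -> set[str]:
    # Assumed format: one domain per line; blank lines are ignored.
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def is_disposable(address: str, blocklist: set[str]) -> bool:
    # Split on the last separator ('#' as written throughout this thread).
    _, sep, domain = address.rpartition("#")
    if not sep:
        return False
    labels = domain.lower().split(".")
    # Check the full domain and each parent (but not the bare TLD),
    # so sub.mailinator.com is caught by an entry for mailinator.com.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels) - 1))


blocklist = {"mailinator.com", "yopmail.com"}
for address in ["bob#mailinator.com", "bob#sub.mailinator.com", "bob#gmail.com"]:
    print(address, is_disposable(address, blocklist))
```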
Alternatively you can use something like Upollo's free tier which does this for you.
I want to match an IP against my IP list, which is stored in an ArrayList, but the entries are in this format:
65.232.211.[001-175]
For example, 68.232.211.133 must match,
but 68.232.211.199 must not.
I want a regular expression for this scenario, but I don't know how to write it.
I tried, but I'm not getting the correct answer.
Please help.
You could use something like so: 68\\.232\\.211\\.0*([1-9][0-9]?|1[0-6][0-9]|17[0-5]). The last part should match the numerical range you are after (courtesy of Regex_For_Range).
Since the period is a special character in regex (it matches any character), it needs to be escaped. This is done by preceding it with a backslash, like so: \.. Since you seem to be using C#, you need to escape the backslash as well, because it is also an escape character in C# string literals.
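The same pattern as a Python raw string, where single backslashes suffice. The sketch uses fullmatch to anchor it; without an anchor, search would also accept something like 68.232.211.1750 by matching only its first part. Note that 0* allows leading zeros, so entries written as 001 match too.

```python
import re

# The answer's range pattern; fullmatch requires the whole IP to match.
LAST_OCTET_RANGE = re.compile(r"68\.232\.211\.0*([1-9][0-9]?|1[0-6][0-9]|17[0-5])")

for ip in ["68.232.211.133", "68.232.211.199", "68.232.211.001",
           "68.232.211.176", "68.232.211.0", "68.232.211.1750"]:
    print(ip, bool(LAST_OCTET_RANGE.fullmatch(ip)))
```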
You could, alternatively (and even better than the above), use the following regex to split the IP in two and then do whatever validation you need: ^([\d.]+?)\.(\d+)$. This regex yields two capture groups, so for 68.232.211.133 it would give 68.232.211 and 133.
This lets you match the first part of the IP as a string, then take the last section of the IP, convert it to a number and perform the range check with ordinary comparison operators.
In my opinion, the second approach should be favoured, since it is easier to maintain.
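A Python sketch of the split-and-compare approach. SPLIT_IP is the answer's regex; the ENTRY pattern and the ip_in_entry function are hypothetical helpers that assume list entries are written exactly like 68.232.211.[001-175].

```python
import re

# The answer's regex: first three octets in group 1, last octet in group 2.
SPLIT_IP = re.compile(r"^([\d.]+?)\.(\d+)$")

# Hypothetical parser for list entries such as "68.232.211.[001-175]".
ENTRY = re.compile(r"^([\d.]+)\.\[(\d+)-(\d+)\]$")


def ip_in_entry(ip: str, entry: str) -> bool:
    ip_match = SPLIT_IP.match(ip)
    entry_match = ENTRY.match(entry)
    if not ip_match or not entry_match:
        return False
    prefix, last = ip_match.group(1), int(ip_match.group(2))
    entry_prefix = entry_match.group(1)
    low, high = int(entry_match.group(2)), int(entry_match.group(3))
    # Prefix compared as a string, last octet compared numerically.
    return prefix == entry_prefix and low <= last <= high


allowed = ["68.232.211.[001-175]"]
for ip in ["68.232.211.133", "68.232.211.199"]:
    print(ip, any(ip_in_entry(ip, entry) for entry in allowed))
```

Changing the allowed range then means editing the list entry, not rewriting a regex.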