REGEX rule to validate a domain field

One of the products I offer is only available to people with certain domain extensions.
On the order form there is a field for them to enter their domain, and the system I am using does allow me to validate that field before continuing the order process.
I can add a 'Validation REGEX' to be run on the value entered in the domain field.
The TLDs that are supported are: .com, .net, .org, .biz, .info, .name, .tv, .cc, .me, .pro, .mobi, .cm, .co, .com.co, .nom.co, .net.co, .ws
I am trying to find out what REGEX validation to use to determine if the domain entered in the field matches one of those TLDs.
I can't change any of the code for this task. I just have a field to enter the REGEX validation rule.
I appreciate any ideas or suggestions you may have.

If it's just a domain they enter, use
\.(com|net|org|biz|info|name|tv|cc|me|pro|mobi|cm|co|ws)$
This matches a domain ending in a dot followed by one of the TLDs you specified.
Since you're already allowing .co as a TLD, there's no need to check for com.co, nom.co, or net.co; they're valid since they end in .co.
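A quick way to sanity-check the pattern above before pasting it into the form is a short script. This is only a sketch in Python: the sample domains and the function name are made up, and the case-insensitive flag is an assumption, since the form's regex engine may treat case differently.

```python
import re

# Suffix pattern from the answer above. IGNORECASE is an assumption about
# how upper-case input should be treated; the order form's engine may differ.
TLD_PATTERN = re.compile(
    r"\.(com|net|org|biz|info|name|tv|cc|me|pro|mobi|cm|co|ws)$",
    re.IGNORECASE,
)

def has_supported_tld(domain: str) -> bool:
    """Return True if the domain ends in a dot plus one of the supported TLDs."""
    return TLD_PATTERN.search(domain) is not None

# Hypothetical sample inputs.
for sample in ["example.com", "example.com.co", "example.co.uk", "example.de"]:
    print(sample, has_supported_tld(sample))
```

Because the pattern is only anchored at the end, example.com.co passes through the plain co alternative, as the answer describes.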

([^\s]+(\.(?i)(com\.co|nom\.co|net\.co|com|net|org|biz|info|name|tv|cc|me|pro|mobi|cm|co|ws))$)

This should work for:
Extension only:
^\.(?i)(net\.co|nom\.co|com\.co|com|net|org|biz|info|name|tv|cc|me|pro|mobi|co|cm|ws)$
name.extension:
\.(?i)(net\.co|nom\.co|com\.co|com|net|org|biz|info|name|tv|cc|me|pro|mobi|co|cm|ws)$
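Whether the dots inside alternatives such as net.co are escaped makes a difference, because an unescaped dot matches any single character. The following Python sketch compares the two spellings. The variable names and the test strings are invented for illustration, and the inline (?i) is replaced by re.IGNORECASE because recent Python versions only accept global inline flags at the start of a pattern.

```python
import re

ALTERNATIVES = "com|net|org|biz|info|name|tv|cc|me|pro|mobi|co|cm|ws"

# Unescaped dots: "." matches any single character.
loose = re.compile(r"\.(net.co|nom.co|com.co|" + ALTERNATIVES + r")$", re.IGNORECASE)
# Escaped dots: "\." matches only a literal dot.
strict = re.compile(r"\.(net\.co|nom\.co|com\.co|" + ALTERNATIVES + r")$", re.IGNORECASE)

def accepted(pattern: re.Pattern, domain: str) -> bool:
    """Return True if the pattern finds a supported extension at the end."""
    return pattern.search(domain) is not None

# "example.netxco" is a made-up string that is not a supported domain.
print(accepted(loose, "example.netxco"))
print(accepted(strict, "example.netxco"))
print(accepted(strict, "Example.NET.CO"))
```

The loose pattern accepts example.netxco, while the strict one rejects it and still accepts Example.NET.CO.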

Related

Validate domain with regex but not subdomain

I couldn't find a regex anywhere that validates a domain without also accepting subdomains.
I found a lot of rules that validate domains, but unfortunately all of them also validate subdomains.
Does anyone have tips on this?
I have this regex that is almost what I need:
/(?!www\.)(?=^.{5,254}$)(^(?:(?!\d+\.)[a-z0-9\-]{1,63}\.){1,2}(?:[a-z]{2,})$)/
If I use a subdomain like test.domain.com.br, it behaves correctly (rejecting it), but it does not reject test.domain.com.

"I couldn't find a regex anywhere that validates a domain without also accepting subdomains."
Because no regex can do that for you (and anyone claiming the opposite just doesn't understand the DNS).
Which is exactly why you found:
"a lot of rules that validate domains, but unfortunately all of them also validate subdomains."
Because a "subdomain" is just a domain seen differently (or you can say that any domain is also a subdomain of another domain, except for root and TLD). This is all because the DNS is a tree.
You can use the definition given in https://www.rfc-editor.org/rfc/rfc8499:
Subdomain: "A domain is a subdomain of another domain if it is
contained within that domain. This relationship can be tested by
seeing if the subdomain's name ends with the containing domain's
name." (Quoted from [RFC1034], Section 3.1) For example, in the
host name "nnn.mmm.example.com", both "mmm.example.com" and
"nnn.mmm.example.com" are subdomains of "example.com". Note that
the comparisons here are done on whole labels; that is,
"ooo.example.com" is not a subdomain of "oo.example.com".
You cannot find administrative boundaries from a hostname just by looking at it. You need either to do live DNS queries to find the delegation points, or to use something like the Public Suffix List maintained by Mozilla. Both approaches have drawbacks that may or may not be a problem depending on your use case.
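What a plain string comparison can establish is containment in the sense of the RFC definition quoted above, compared on whole labels. A minimal Python sketch of that test follows. The function name is made up, and treating a name as not being a subdomain of itself is a choice made here, not something the quoted text spells out. It says nothing about where the administrative boundaries are.

```python
def is_subdomain(name: str, parent: str) -> bool:
    """Whole-label suffix test in the spirit of the quoted definition."""
    # Normalise: drop a trailing root dot and compare case-insensitively.
    name_labels = name.rstrip(".").lower().split(".")
    parent_labels = parent.rstrip(".").lower().split(".")
    # Choice made in this sketch: a name is not a subdomain of itself.
    if len(name_labels) <= len(parent_labels):
        return False
    # Compare complete labels, never raw string suffixes.
    return name_labels[-len(parent_labels):] == parent_labels

print(is_subdomain("nnn.mmm.example.com", "example.com"))
print(is_subdomain("ooo.example.com", "oo.example.com"))
```

Comparing label lists rather than using a raw string suffix check is what keeps ooo.example.com from being treated as a subdomain of oo.example.com.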
If you are not convinced, here is a list of valid hostnames (you can use them in a URL and it will work). Try to work out how a regex could have helped you by being right in all cases:
dk
www.sante.gouv.fr
www.com.com
www.nominet.co.uk
www.uk.com
www.walton.k12.fl.us
lagazettedesancetres.blogspot.fr
www.al.ma.leg.br
ab.m.wikibooks.nom.nu
1512f1.станок.спб.рус
You can obviously find shortcuts where a regex will still be wrong but good enough, if you restrict the cases you need to act on. Otherwise, if you need to stay generic and potentially work in any TLD, then, sorry, no regex will solve your problem.
Also, your regex is wrong in multiple other cases. For example, it won't handle IDN TLDs, which do exist: in ASCII form they look like xn--something, which [a-z]{2,} will not accept.
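The IDN point can be checked with a few lines of Python. This sketch uses Python's built-in "idna" codec to convert the Cyrillic TLD from the hostname list above to its ASCII form and then tests it against the TLD part of the regex. The variable names are made up for illustration.

```python
import re

# The TLD part of the regex from the question.
TLD_CLASS = re.compile(r"[a-z]{2,}")

# Convert the Cyrillic TLD to its ASCII form with the built-in "idna" codec.
ascii_tld = "рус".encode("idna").decode("ascii")

print(ascii_tld)
# The ASCII form starts with "xn--", and hyphens are not in [a-z],
# so the whole-label match fails.
print(TLD_CLASS.fullmatch(ascii_tld) is not None)
```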
BTW, here is some useful terminology that is often clearer than domain/subdomain, taken from https://url.spec.whatwg.org/#host-miscellaneous
"A host’s public suffix is the portion of a host which is included on the Public Suffix List."
"A host’s registrable domain is a domain formed by the most specific public suffix, along with the domain label immediately preceding it, if any."
I think what you are searching for is the "registrable domain" part of any given string (and as you can see from the algorithm given at the above URL, you can't do that without first finding the public suffix, which you can't do without using an external resource: the information is NOT self-contained in the string).
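To make the dependency on an external resource concrete, here is a rough Python sketch of the "public suffix plus one label" idea. Everything named in it is hypothetical: SAMPLE_PUBLIC_SUFFIXES is a tiny hand-picked set for illustration rather than the real Public Suffix List, and the lookup ignores the list's wildcard and exception rules.

```python
from typing import Optional

# Hypothetical, hand-picked sample of suffixes for illustration only.
# A real implementation would load the full Public Suffix List and also
# handle its wildcard and exception rules, which this sketch ignores.
SAMPLE_PUBLIC_SUFFIXES = {"com", "uk", "co.uk", "br", "com.br", "fr", "gouv.fr"}

def registrable_domain(host: str, suffixes=SAMPLE_PUBLIC_SUFFIXES) -> Optional[str]:
    """Return the longest matching suffix plus one label, or None."""
    labels = host.rstrip(".").lower().split(".")
    # Try the longest candidate first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no suffix from the sample set matches

for host in ["test.domain.com.br", "test.domain.com", "www.nominet.co.uk"]:
    print(host, "->", registrable_domain(host))
```

With this sample set, the first two hosts resolve to domain.com.br and domain.com, which is the distinction the regex in the question cannot make on its own. The answer depends entirely on the contents of the suffix set.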

How to write a conditional in regex

I have the following line of regex (JavaScript):
/^[a-z0-9_.\-]+@(yahoo|gmail|excite)\.com$/
However, I am unsure of how to make this include subdomains (if one is present).
So this expression should match uk.yahoo.com as well as yahoo.com email addresses. How can this be done?

Well, if you want just the subdomain uk.yahoo.com:
/^[a-z0-9_.\-]+@((?:uk\.)?yahoo|gmail|excite)\.com$/
The addition of (?:uk\.)? specifies an optional noncapturing group that matches either 0 or 1 occurrence of the pattern "uk.".
However, using regexes to validate email addresses is an awful idea. RFC2822 is a very complex standard. It's much better to blindly send an email to whatever minimally-validated address the user enters, fail early, and give the user a chance to correct the mistake.
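For anyone who still wants to see how the optional group behaves, here is a small Python sketch of the same pattern, using @ as the separator between the local part and the host. The sample addresses are invented.

```python
import re

# (?:uk\.)? makes the "uk." prefix optional, and only for the yahoo alternative.
EMAIL_HOST = re.compile(r"^[a-z0-9_.\-]+@((?:uk\.)?yahoo|gmail|excite)\.com$")

def host_allowed(address: str) -> bool:
    """Return True if the address matches the pattern above."""
    return EMAIL_HOST.match(address) is not None

# Invented sample addresses.
for address in ["bob@yahoo.com", "bob@uk.yahoo.com", "bob@fr.yahoo.com", "bob@uk.gmail.com"]:
    print(address, host_allowed(address))
```

Since the optional group sits inside the first alternative, uk.gmail.com is rejected. Allowing the prefix for every provider would require moving the group in front of the alternation.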

Creating filters for Google Analytics to remove spam

I have successfully managed to filter out hits from certain spammy sites from Google Analytics. It's an ongoing battle, as new sites are popping up all the time and polluting my acquisition/referral results.
At present, the following match is used by the GA filter to stop all the sites below showing up in the data:
.*(best\-seo\-solution|semalt|buttons\-for\-website|social\-buttons|best\-seo\-offer|Get\-Free\-Traffic\-Now|buttons\-for\-your\-website|free\-share\-buttons)\.com.*
I've added most of these myself and it works. However, I now need to create a pattern that lets me input URLs that don't follow the standard something.com pattern, e.g.:
site4.free-share-buttons.com
site5.free-share-buttons.com
So in these cases the end is always the same but the start can be variable.
buy-cheap-online.info
In this case it ends with .info
www.event-tracking.com
This one uses www. whereas others do not
http://webmaster-traffic.com
This one has the http:// as well.
On top of all that, a filter pattern can be at most 255 characters long (but I can have more than one filter pattern), so I need to split it up.
How can I create a regex filter pattern that would target all above URLs?

Google Analytics lets you write a regex without escaping every special character when the expression is simple. So you can write the expression without the backslashes \ and the .* at each end. You can even remove the .com and the parentheses, since these names are already very specific:
best-seo-solution|semalt|buttons-for-website|social-buttons|best-seo-offer|Get-Free-Traffic-Now|buttons-for-your-website|free-share-buttons|event-tracking|buy-cheap-online.info
If a spam referrer has a common name, just add the full name, e.g. |commonname.net, for that specific case.
You can keep going until you reach 255 characters; after that, just add a second filter. This will work, but it has 3 downsides:
first, there are 1 or 2 new spammers every week
second, by the time you add one you already have some hits
third, and this is a new behavior, some spam is now hitting with direct visits along with the referral, and this filter won't stop those.
To prevent this, I recommend using a valid hostname filter instead. This filter only allows hits with one of your hostnames, so all ghost spam is excluded, since it uses either a fake hostname or none at all.
Here you can find more information about referrer spam and the valid hostname filter:
https://stackoverflow.com/a/28354319/3197362
http://www.ohow.co/things-you-must-know-about-spam-in-google-analytics/
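The 255-character limit mentioned above can be handled mechanically. The Python sketch below greedily packs names into as many "a|b|c" patterns as needed. The function name, the constant and the name list are illustrative assumptions, and the check at the end only mimics a "contains" match with Python's own regex engine, not Google Analytics itself.

```python
import re

MAX_PATTERN_LENGTH = 255  # limit stated in the question

def split_into_filters(names, limit=MAX_PATTERN_LENGTH):
    """Greedily pack names into "a|b|c" patterns no longer than `limit`."""
    patterns, current = [], ""
    for name in names:
        if len(name) > limit:
            raise ValueError(f"{name!r} alone exceeds the {limit}-character limit")
        candidate = name if not current else current + "|" + name
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current pattern is full: store it and start a new one.
            patterns.append(current)
            current = name
    if current:
        patterns.append(current)
    return patterns

# Names taken from the question; extend the list as new spammers appear.
SPAM_NAMES = [
    "best-seo-solution", "semalt", "buttons-for-website", "social-buttons",
    "best-seo-offer", "Get-Free-Traffic-Now", "buttons-for-your-website",
    "free-share-buttons", "event-tracking", "buy-cheap-online.info",
    "webmaster-traffic",
]

filters = split_into_filters(SPAM_NAMES)
referrer = "site4.free-share-buttons.com"
print(len(filters), any(re.search(f, referrer) for f in filters))
```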

Regex match website that is NOT an email

I'm trying to extract websites without matching email addresses.
In other words if my contact section has
email: a@gmail.com ---- website: www.company.com
I want the www.company.com without matching gmail.com.
So far I have tried everything that I can think of; the best I have so far is
\b(?:.(?<!@))+\.\S+\b
but that will still match gmail.com in a@gmail.com.
I'll admit that my Regex skills are not the strongest, I've done my research regarding negative lookaheads/behinds etc but I still don't know how to do this.

This is an expression made by JGSoft for domain names:
\b(?<!@)((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b
It is internationalized and strict.
I added (?<!@) to stop it from matching domain names that follow the @ of an email address.
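As a rough check, the expression can be run against the sample line from the question. This Python sketch writes the email separator as @. The function name is made up, and the case-insensitive flag is an assumption, since the pattern itself only lists lower-case letters.

```python
import re

# Domain pattern from the answer, with a negative lookbehind that refuses
# to start a match directly after the "@" of an email address.
DOMAIN = re.compile(
    r"\b(?<!@)((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)

def find_websites(text: str) -> list:
    """Return every full match of the domain pattern in the text."""
    return [m.group(0) for m in DOMAIN.finditer(text)]

print(find_websites("email: a@gmail.com ---- website: www.company.com"))
```

finditer with group(0) is used instead of findall because the pattern contains capturing groups, and findall would return tuples of those groups rather than the whole match.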

Regex password validation needs just one more adjustment

I have an expression that is close to what I need; it's just missing my "no adjacent numbers" rule:
^.*(.).*\1.*$
abcdef1 is allowed
abcdef1g2 is allowed
abcdef12 is NOT allowed (but my current expression allows this)
The password rules are:
Cannot have adjacent numbers
The same number cannot be repeated anywhere in the password
No repeating characters anywhere in the password
[edit] I am not sure what language it is using. I can tell you I am testing it with what looks like JavaScript (http://gskinner.com/RegExr/). I am using it in a Windows application (Tools4Ever - E-SSOM) for single sign-on.

A password passes if the following pattern does not match anywhere in it:
\d\d|(.).*(\1)
It may be better/easier to not use regex to do this validation though, as checking a unique character list is pretty easy to do. I'm also of the philosophy that you shouldn't put restrictions on what users want for their passwords.
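Both approaches can be sketched in a few lines of Python: the regex used as a "must not match" test, and the plain check on a unique character list mentioned above. The function names are invented for illustration.

```python
import re

# A password is rejected if this pattern is found anywhere in it:
# two adjacent digits, or any character that appears again later.
FORBIDDEN = re.compile(r"\d\d|(.).*\1")

def is_valid_regex(password: str) -> bool:
    """Valid when the forbidden pattern does not occur."""
    return FORBIDDEN.search(password) is None

def is_valid_plain(password: str) -> bool:
    """The same rules checked without a regex."""
    no_repeats = len(set(password)) == len(password)
    no_adjacent_digits = not any(
        a.isdigit() and b.isdigit() for a, b in zip(password, password[1:])
    )
    return no_repeats and no_adjacent_digits

# Sample passwords from the question.
for pw in ["abcdef1", "abcdef1g2", "abcdef12"]:
    print(pw, is_valid_regex(pw), is_valid_plain(pw))
```

The "same number cannot be repeated" rule needs no separate handling, since it is already covered by the "no repeating characters" rule.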
