Right now I have:
(?!.*([._-])\1)(?=.*#)[\w.#-]+
which finds test#foo
I want to make it so that test cannot start or end with a special character.
For instance, I want it to find:
tes-t#foo
test#-foo
but not:
-test#foo
test-#foo
-test-#foo
You can define a character class that doesn't include specific characters using ^, e.g. [^a] will match anything apart from an a.
I would split the regex that matches the pre-# word into three sections; one to match the leading character, one to match the middle, and one to match the last character. You'll also need to handle the special case of the pre-# word only having a single character.
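A minimal sketch of that three-part split, written in Python as an assumption since the question does not name a language. The names `PATTERN` and `matches` are made up for illustration, and "special character" is taken to mean `.`, `_` or `-`, as in the original lookahead. Whole candidate strings are tested so that `-test#foo` cannot partially match as `test#foo`.

```python
import re

# Hypothetical pattern: the pre-# word is split into leading character,
# optional middle run, and last character.
PATTERN = re.compile(
    r"(?!.*([._-])\1)"          # kept from the question: no doubled special character
    r"[A-Za-z0-9]"              # leading character: never a special character
    r"(?:[\w.-]*[A-Za-z0-9])?"  # optional middle plus last character (covers one-character words)
    r"#[\w.#-]+"                # the '#' and everything after it, as in the question
)

def matches(candidate):
    """Return True when the whole candidate string satisfies the pattern."""
    return PATTERN.fullmatch(candidate) is not None

for s in ["tes-t#foo", "test#-foo", "-test#foo", "test-#foo", "-test-#foo"]:
    print(s, matches(s))
```

The optional group is what handles the single-character case: `t#foo` matches through the leading character alone.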
This is not an area where you should be reinventing the wheel: there’s too much to get wrong.
I’m not sure what you really want to do. Addresses like president#whitehouse.gov and a plain old postmaster are probably both deliverable but highly unlikely to do what you want.
The only reasonable way to validate a mail address is to send mail to that address and get back a non-automatable reply showing that it is the right human at the other end. But this cannot be done in real time. Which means the best you can do is make them type it twice to try to weed out typos. That’s not much help, really. That’s why signing up for something over the web always involves a negotiated handshake.
However, if all you need is to validate an RFC 5322–compliant address, you may use this pattern.
Just understand that testing an address for compliance with the RFC should never be confused with validating that mail address — which is something else altogether.
I want to create a filter in my email server that matches any message that contains any URL (using either http or https protocols) from a certain domain (let's say domain.org). I want it to match things like:
https://site1.domain.org
https://anothersite.domain.org
http://yetanotherone.domain.org
The problem here is that these strings can be wrapped in the message body at any random position of the string. And even worse, when the string is wrapped an equal sign is added before the end of the line, so I would need it to be able to match strings like these:
ht=
tps://thisisanexample.domain.org
https://thisisane=
xample.domain.org
https://thisisanexample.do=
main.org
I came up with a simple (but huge) solution, but I think there must be a much more elegant one than mine:
/h[=[:cntrl:]]*t[=[:cntrl:]]*t[=[:cntrl:]]*p[=[:cntrl:]]*s?[=[:cntrl:]]*:[=[:cntrl:]]*\/[=[:cntrl:]]*\/[=[:cntrl:]]*[-+_#&%$#|()=?¿:;,.,çÇ^[:cntrl:][:alnum:]\[\]\{\}\*\\]*[=[:cntrl:]]*.[=[:cntrl:]]*d[=[:cntrl:]]*o[=[:cntrl:]]*m[=[:cntrl:]]*a[=[:cntrl:]]*[=[:cntrl:]]*i[=[:cntrl:]]*n[=[:cntrl:]]*.[=[:cntrl:]]*o[=[:cntrl:]]*r[=[:cntrl:]]*g/
I have been looking around, but I cannot find anything I understand well enough to improve my solution, given that my knowledge of regex does not go beyond simple queries.
Thank you very much in advance.
Regards.
2018/04/11 EDIT: Thank you to everyone who tried but the solutions proposed do not meet the requirements of elegance and readability I was expecting. I was looking for something like capturing everything but the equal-return string and performing the web address string search on the captured result of the first search. Is this a doable idea?
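The two-step idea is workable wherever the filter can run code rather than a single pattern. A rough sketch in Python follows. This is an assumption, since the mail server's filter language is not stated, and `SOFT_BREAK`, `URL` and `find_domain_urls` are illustrative names. The trailing "=" before a line break is a quoted-printable soft line break, so the first step removes those and the second step runs a short, readable URL search on the result.

```python
import re

# Step 1: an "=" immediately followed by a line break is a soft line break.
SOFT_BREAK = re.compile(r"=\r?\n")

# Step 2: a plain URL search for any host under domain.org.
URL = re.compile(r"https?://[\w.-]+\.domain\.org\b", re.IGNORECASE)

def find_domain_urls(body):
    """Unwrap soft line breaks, then return every matching URL in the body."""
    unwrapped = SOFT_BREAK.sub("", body)
    return URL.findall(unwrapped)

body = (
    "ht=\ntps://thisisanexample.domain.org\n"
    "https://thisisane=\nxample.domain.org\n"
    "https://thisisanexample.do=\nmain.org\n"
)
print(find_domain_urls(body))
```

This only undoes soft line breaks. Other quoted-printable escapes such as `=3D` would need a full decoder first.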
I'm using this regex to catch all incoming e-mails except those from specific people.
/^(.(?!(zulgrib#exemple.com|zulgrib#example.org)).)*$/i
This regex correctly lets these cases through:
Zulgrib at example.com <Zulgrib#example.com>
<Zulgrib#example.com>
<Zulgrib#example.com> In behalf of Robot
The regex correctly catches these kinds of headers:
Associate#example.org
Your Associate Associate#example.com
If an excluded e-mail address appears on its own, the regex catches it, and I would like to prevent that. Example:
zulgrib#exemple.org
What should be modified to make this work, and why is my current method incorrect?
If I understand the documentation, . matches any character, and an empty string is not a character, but using * is not working.
First, some issues in your current regex:
exemple has a different spelling than example
Literal dots need to be escaped, so write \.com instead of .com.
There are two dots (.) in the outermost group, which means you only capture text with an even number of characters, and don't exclude the case where the email addresses start at the beginning of the string. The first dot should not be there.
To make an exception for when the email address is the only thing in the input, I fear you'll have to specify that as a separate alternative in which (unfortunately) you'll have to repeat those email addresses:
^(?:zulgrib#example\.com|zulgrib#example\.org)$|^(?!(?:.*(?:zulgrib#example\.com|zulgrib#example\.org))).*$
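To see what that combined pattern does, here is a small sketch in Python. This is an assumption, as the question does not say which engine the mail filter uses, and `ADDRESSES`, `PATTERN` and `is_caught` are illustrative names. Building the pattern from one variable avoids typing the address list twice by hand.

```python
import re

# The excluded addresses, written once and reused in both alternatives.
ADDRESSES = r"(?:zulgrib#example\.com|zulgrib#example\.org)"

PATTERN = re.compile(
    rf"^{ADDRESSES}$"           # alternative 1: the address is the whole input
    rf"|^(?!.*{ADDRESSES}).*$", # alternative 2: the address appears nowhere in the input
    re.IGNORECASE,
)

def is_caught(header):
    """Return True when the header matches the combined pattern."""
    return PATTERN.search(header) is not None

for header in [
    "Zulgrib at example.com <Zulgrib#example.com>",
    "<Zulgrib#example.com> In behalf of Robot",
    "Associate#example.org",
    "zulgrib#example.org",
]:
    print(repr(header), is_caught(header))
```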
Let's take a URL like
www.url.com/some_thing/random_numbers_letters_everything_possible/set_of_random_characters_everything_possible.randomextension
If I want to capture "set_of_random_characters_everything_possible.randomextension", will [^/\n]+$ work? (solution taken from Trying to get the last part of a URL with Regex)
My question is: what does the "\n" part mean (it works even without it)? And is it safe if the URL contains an arbitrary combination of characters apart from "/"?
First, please note that www.url.com/some_thing/random_numbers_letters_everything_possible/set_of_random_characters_everything_possible.randomextension is not a URL without a scheme like http:// in front of it.
Second, don't parse URLs yourself. What language are you using? You probably don't want to use a regex, but rather an existing module that has already been written, tested, and debugged.
If you're using PHP, you want the parse_url function.
If you're using Perl, you want the URI module.
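The same approach in Python, as a sketch only, since the question does not say which language is in use: the standard library's `urllib.parse` plays the role of `parse_url` or the URI module. The helper name `last_path_segment` and the sample URL are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit

def last_path_segment(url):
    """Return the final segment of the URL's path, ignoring query and fragment."""
    path = urlsplit(url).path
    return posixpath.basename(path)

print(last_path_segment("http://www.url.com/some_thing/abc123/file_name.ext?x=1#top"))
```

Unlike the bare regex, this keeps the query string and fragment out of the result.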
Have a look at this explanation: http://regex101.com/r/jG2jN7
Basically, what is going on here is "match any character besides a slash or a newline, one or more times". People insert \r\n into negated character classes because in some programs a negated character class will match anything besides what has been listed in it. So [^/] would in that case match newlines.
For example, if there was a line break in your text, you would not get the data after the linebreak.
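This, however, does not matter in your case: a URL contains no line breaks, so there is nothing for the \n to exclude. Note that the s-flag (PCRE_DOTALL) only changes what . matches; it has no effect on negated character classes.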
TL;DR: You can leave it or remove it; it won't matter.
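A quick way to see the difference the \n makes is to feed both variants a string that does contain a line break. The sketch below uses Python's `re` module, which is an assumption (other engines may differ in details), with a made-up input string.

```python
import re

# A made-up input with a line break after the last slash.
text = "www.url.com/some_thing/last\npart.ext"

# Without \n in the class, the match may run across the line break.
across_lines = re.search(r"[^/]+$", text).group()

# With \n excluded, the match is confined to the final line.
last_line_only = re.search(r"[^/\n]+$", text).group()

print(repr(across_lines))    # 'last\npart.ext'
print(repr(last_line_only))  # 'part.ext'
```

On a single-line URL both patterns return the same thing.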
Ask away if anything is unclear or if I've explained it a little sloppily.
Say you have an IP address: 74.125.45.100, so it's A.B.C.D
Is there a way to use RegEx to get A,B,C separately?
If it is just to extract the numbers from the IP and not to validate the IP address then you could just do:
[0-9]+
However, I think a simple String.Split(".") would be an easier option.
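For comparison, here is the split approach sketched in Python. The language is an assumption, since `String.Split` suggests .NET, and the helper name `octets` is made up.

```python
def octets(ip):
    """Split a dotted IPv4 string into its numeric parts (no validation)."""
    return [int(part) for part in ip.split(".")]

a, b, c, d = octets("74.125.45.100")
print(a, b, c)  # 74 125 45
```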
Something very simple, yet ugly, would work, giving you four groups, one for each octet:
(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})
([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)
...should do it. It's not a validating regex, though: it allows numbers beyond 255 for each part.
Here's a crazy validating one:
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
Credit for the last regex goes to the RegexBuddy makers.
/(\d+)\.(\d+)\.(\d+)\.(\d+)/
First port of call for regex... RegEx Library
While others have pointed out various good regexps, may I ask why you absolutely must use regular expressions for that? It will be slow and error-prone. Most platforms have integrated IP address functionality, or provide a way to call inet_aton.
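As one example of such built-in functionality, Python's standard `ipaddress` module parses and validates in one step. Python is an assumption here, and the wrapper name `parse_ipv4` is made up.

```python
import ipaddress

def parse_ipv4(text):
    """Return the four octets as a tuple, or None if text is not a valid IPv4 address."""
    try:
        address = ipaddress.IPv4Address(text)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return None
    return tuple(address.packed)  # packed is the 4-byte big-endian form

print(parse_ipv4("74.125.45.100"))  # (74, 125, 45, 100)
print(parse_ipv4("999.1.2.3"))      # None
```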
In case someone needs a validating RegEx for (all possible) IPv4 addresses:
([^\d.]|^)([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])[.]([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])[.]([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])[.]([01]{0,1}\d{1,2}|2[0-4]\d|25[0-5])([^\d]|$)
The IP is contained in the 2nd through 5th groups; the 1st and last groups are not used. They are necessary because otherwise an invalid IP like:
999.1.2.3
would be caught as "99.1.2.3". I am not sure whether you want to allow an IP ending with a dot, e.g.
1.2.3.4.
If not, change the last part to ([^\d.]|$). I do not allow any dots in front of it though.
I still think this RegEx is a messy monster :) and a better solution would be to validate by hand using a function.
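A hand-written validator along those lines might look like the sketch below. It is written in Python as an assumption, and the function name `is_valid_ipv4` and the exact rules (four parts, one to three ASCII digits each, value at most 255) are one reasonable reading rather than a definitive specification.

```python
def is_valid_ipv4(text):
    """Check a dotted-quad string part by part instead of with one large regex."""
    parts = text.split(".")
    if len(parts) != 4:
        return False  # also rejects a trailing dot such as "1.2.3.4."
    for part in parts:
        # Each part must be one to three ASCII digits.
        if not (part.isascii() and part.isdigit()) or len(part) > 3:
            return False
        if int(part) > 255:
            return False
    return True

print(is_valid_ipv4("74.125.45.100"))  # True
print(is_valid_ipv4("999.1.2.3"))      # False
```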
I am working with legacy systems at the moment, and a lot of work involves breaking up delimited strings and testing against certain rules.
With this string, how could I return "Active" in a backreference, with the search stopping when it hits the first caret (^)?
Active^20080505^900^LT^100
Can it be done with an inclusion in the regex of this "(.+)" ? The reason I ask is that the actual regex "(.+)" is defined in a database as cutting up these messages and their associated rules can be set from a front-end system. The content could be anything ('Active' in this case), that's why ".+" has been used in this case.
Rule: The caret sign cannot feature between the brackets, as that would result in it being stored in the database field too, and it is defined elsewhere in another system field.
If you have a better suggestion than "(.+)", I will be happy to hear it.
Thanks in advance.
(.+?)\^
Should grab up to the first ^
If you have to include (.+) w/o modifications you could use this:
(.+?)\^(.+)
The first backreference will still be the correct one and you can ignore the second.
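Both patterns can be checked against the sample message. The sketch below uses Python's `re` module, which is an assumption because the legacy system's regex engine is not named.

```python
import re

message = "Active^20080505^900^LT^100"

# Lazy group followed by an escaped caret: stops at the first '^'.
first = re.match(r"(.+?)\^", message).group(1)

# Same, with the unmodified (.+) appended as a second group to ignore.
both = re.match(r"(.+?)\^(.+)", message).groups()

print(first)  # Active
print(both)   # ('Active', '20080505^900^LT^100')
```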
A regex is really overkill here.
Just take the first n characters of the string where n is the position of the first caret.
Pseudo code:
InputString.Left(InputString.IndexOf("^"))
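A runnable version of that pseudo code, sketched in Python as an assumption, with `before_first_caret` as a made-up helper name. `str.partition` is used instead of an index lookup so that a string without any caret is returned whole rather than raising an error.

```python
def before_first_caret(text):
    """Return everything before the first '^', or the whole string if there is none."""
    head, _separator, _tail = text.partition("^")
    return head

print(before_first_caret("Active^20080505^900^LT^100"))  # Active
```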
^([^\^]+)
That should work if your RE library doesn't support non-greediness.