How to use W3C Email Regex in Elixir - regex

I'm trying to use the following W3C Email Regex in Elixir (source: RegexPal):
/^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/
In a function that looks like this:
def get_type(value) do
  cond do
    String.match?(value, ~r/^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/) ->
      :email
    String.match?(value, ~r/^\+[1-9][0-9]\d{1,14}$/) ->
      :phone_number
  end
end
But I'm getting a compilation error:
unexpected token: "`" (column 56, code point U+0060)
What am I doing wrong here?

This does not really answer your question about the regex itself, which was answered perfectly by @julp.
I am here to tell you: do not use regex to validate emails. The regular expression you’ve mentioned is nowhere near correct.
Here is an example of a perfectly valid email that won’t pass your regex: "John Smith"@example.com.
More.
So my suggestion would be: just check whether the value contains an @ or not.
def get_type(value) do
  if String.contains?(value, "@"),
    do: :email,
    else: :phone_number
end
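The claim that a quoted local part is rejected can be checked directly. The following is a minimal Python sketch, used only for illustration since the question itself is about Elixir. It assumes the separator in the pattern is @ and substitutes a straight apostrophe for the typographic one in the question; the names W3C_EMAIL and looks_like_email are hypothetical.

```python
import re

# Pattern from the question. Assumptions: "@" is the separator and the
# apostrophe in the character class is a plain ASCII one.
W3C_EMAIL = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
)


def looks_like_email(value):
    """Return True when the whole value matches the W3C-style pattern."""
    return W3C_EMAIL.match(value) is not None


print(looks_like_email("john.smith@example.com"))
print(looks_like_email('"John Smith"@example.com'))
```

The second call is rejected because neither the double quote nor the space is in the character class, even though a quoted local part is allowed by the email RFCs.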

Since you used / to delimit the content of your sigil, you need to escape the / character in your regexp. Another solution is to choose a delimiter that does not appear in the regexp (such as ' or <) so that no escaping is needed.
~R/^[a-zA-Z0-9.!#$%&’*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/
# or
~R<^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$>

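One very practical way of avoiding this problem is to write the regex as a string and call Regex.compile! on it. Inside a double-quoted string the backslash has to be doubled, otherwise \. collapses to a plain . that matches any character.
Regex.compile!("^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$")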

Related

REGEX: How Do I Write a Negative REGEX for a HTML Pattern value?

I currently have a contact form where I reject values in text fields that have email addresses and URLs. I use these REGEX expressions in my Rails controller. I want to replicate this and use it in my HTML pattern field instead.
regex_email = /^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$/
regex_url = /^(http|ftp|https)|[.][a-zA-Z][a-zA-Z]/
regex_message = /seo|captcha|sales|Б|Г|Д|ё|Ж|П|Ф|И|й|Л|Ц|Ш|Щ|Э|Ю|Я/i
regex_all = /#{regex_email}|#{regex_url}|#{regex_message}/i
I have created locale hashes for regex_email, regex_message, and regex_url. I want to negate those values, then combine them with AND operators of some kind and use something like my example below as my pattern.
"data-pattern" => "NOT#{regex_email}AND NOT#{regex_url}AND NOT#{regex_message}"
Either this or something like this where the negation of my hashes is added to the hash itself.
"data-pattern" => "#{regex_email}AND#{regex_url}AND#{regex_message}"
I've searched a good number of links about AND and NOT but the discussions are not really clear.
I'm asking this question because I have not been successful in coding regex expressions that only include letters, numbers, period, or hyphen. All the solutions I found still allowed any single character when I included the . in my regex. I even tried escaping it but it still matches.
Assuming that your expressions regex_email, regex_url and regex_message work reliably, and that you're looking for a logical negation of your regex_all pattern on the regex level (that is how I understand your question), you can use a slightly modified negative lookahead logic, like so:
/^(?!.*#{regex_email})(?!.*#{regex_url})(?!.*#{regex_message})/
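To illustrate how chained negative lookaheads act as "NOT a AND NOT b", here is a minimal Python sketch. The sub-patterns URL and SPAM are simplified, hypothetical stand-ins for the question's regex_url and regex_message, and is_allowed is a made-up helper name. Unlike Ruby, where interpolating a Regexp wraps it in its own group, Python needs an explicit (?:...) around each alternative so that the | inside it stays contained.

```python
import re

# Hypothetical, simplified stand-ins for the question's patterns.
URL = r"(?:http|ftp|https)|[.][a-zA-Z][a-zA-Z]"
SPAM = r"seo|captcha|sales"

# Each (?!.*(?:X)) asserts that X occurs nowhere in the string.
# Chaining the lookaheads gives: NOT url AND NOT spam.
ALLOWED = re.compile(
    rf"^(?!.*(?:{URL}))(?!.*(?:{SPAM})).*$",
    re.IGNORECASE | re.DOTALL,
)


def is_allowed(text):
    """Return True when none of the forbidden patterns occur in text."""
    return ALLOWED.match(text) is not None


for sample in ["Call me tomorrow", "see https://example.com", "Great SEO deals"]:
    print(sample, "->", is_allowed(sample))
```

Only the first sample is accepted; the other two are rejected by the URL and SPAM lookaheads respectively.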

How to extract FirstName and LastName from html tags with regex?

I have response body which contains
"<h3 class="panel-title">Welcome
First Last </h3>"
I want to fetch 'First Last' as a output
The regular expression I have tried are
"Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))"
"Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)"
But not able to get the result. If I remove the newline and take it as
"<h3 class="panel-title">Welcome First Last </h3>" it is detecting in online regex maker.
I suspect your problem is the line break between "Welcome" and the user name. The "single-line mode" flag (?s) lets . match line breaks as well. Try these:
(?s)Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))
(?s)Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)
(This works in JMeter and any other Java- or PHP-based regex engine, but not in JavaScript. In the comments on the question you say you're using both JavaScript and JMeter: if this is a JMeter question, it will help; if it is a JavaScript question, try one of the other answers.)
Well, usually I don't recommend regex for this kind of work; DOM parsing is the better tool.
But you can use the following regex to extract the text:
/(?:<h3.*?>)([^<]+)(?:<\/h3>)/i
See demo at https://regex101.com/r/wA2sZ9/1
This will extract First and Last names including extra spacing. I'm sure you can easily deal with spaces.
In the JMeter Regular Expression Extractor you can use:
<h3 class="panel-title">Welcome(.*?)</h3>
Then take the value using $1$.
In the data you have shown, Welcome is followed by a line break. If that is actually part of the response, you have to use \n:
<h3 class="panel-title">Welcome\n(.*?)</h3>
Otherwise the one above is enough.
First verify this in JMeter using the regular expression tester on the response body.
Welcome([\s\S]+?)<
Try this, it will definitely work.
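As a rough check of the pattern above, here is a minimal Python sketch. The sample body mirrors the question, with the exact whitespace after the line break being an assumption; extract_name is a hypothetical helper, not part of JMeter.

```python
import re

# Response body from the question, with a line break after "Welcome".
# The amount of leading whitespace on the second line is assumed.
BODY = '<h3 class="panel-title">Welcome\n        First Last </h3>'


def extract_name(body):
    """Return the words between "Welcome" and the next "<" as a tuple, or None."""
    # [\s\S]+? lazily matches any characters, line breaks included,
    # and stops at the first "<" that follows.
    match = re.search(r"Welcome([\s\S]+?)<", body)
    if match is None:
        return None
    return tuple(match.group(1).split())


print(extract_name(BODY))
```

Splitting the captured group on whitespace discards both the line break and the trailing space before the closing tag.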
Regular expressions are greedy by default, try this
Welcome\s*([A-Za-z]+)\s*([A-Za-z]+)
Groups 1 and 2 contain your data
Check it here

Ruby Puppet Regex string matching

I'm somewhat new to ruby and have done a ton of google searching but just can't seem to figure out how to match this particular pattern. I have used rubular.com and can't seem to find a simple way to match. Here is what I'm trying to do:
I have several types of hosts, they take this form:
Sample hostgroups
host-brd0000.localdomain
host-cat0000.localdomain
host-dog0000.localdomain
host-bug0000.localdomain
Next I have a case statement, I want to keep out the bugs (who doesn't right?). I want to do something like this to match the series of characters. However, it starts matching at host-b, host-c, host-d, and matches only a single character as if I did a [brdcatdog].
case $hostgroups { # variable takes the host string up to where the numbers begin
  # animals to keep
  /host-[["brd"],["cat"],["dog"]]/: {
    file {"/usr/bin/petstore-friends.sh":
      owner => petstore,
      group => petstore,
      mode => 755,
      source => "puppet:///modules/petstore-friends.sh.$hostgroups",
    }
  }
I could do something like [bcd][rao][dtg], but it's not very clean looking and will match nonsense like "bad", "cot", "dat" and "crt", which I don't want.
Is there a slick way to use \A and [] that I'm missing?
Thanks for your help.
-wootini
How about using negative lookahead?
host-(?!bug).*
Here is the RUBULAR permalink matching everything except those pesky bugs!
Is this what you're looking for?
host-(brd|cat|dog)
(Following gtgaxiola's example, here's the Rubular permalink)
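The difference between the two answers and the character-class attempt can be seen side by side. This is a minimal Python sketch rather than Puppet code; the host list comes from the question, and the names KEEP, NOT_BUG and LOOSE are made up for the example.

```python
import re

HOSTS = [
    "host-brd0000.localdomain",
    "host-cat0000.localdomain",
    "host-dog0000.localdomain",
    "host-bug0000.localdomain",
]

KEEP = re.compile(r"host-(brd|cat|dog)")     # alternation: whole three-letter words
NOT_BUG = re.compile(r"host-(?!bug).*")      # negative lookahead: anything but "bug"
LOOSE = re.compile(r"host-[bcd][rao][dtg]")  # character classes: one character each

kept = [h for h in HOSTS if KEEP.match(h)]
not_bug = [h for h in HOSTS if NOT_BUG.match(h)]

print(kept)
print(not_bug)
# The character-class version also accepts made-up names such as "bad".
print(bool(LOOSE.match("host-bad0000.localdomain")))
print(bool(KEEP.match("host-bad0000.localdomain")))
```

Square brackets always describe a single character, which is why the pattern in the question behaved like [brdcatdog]; grouping with parentheses and | is what matches whole words.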

Regex empty string or email

I found a lot of Regex email validation in SO but I did not find any that will accept an empty string. Is this possible through Regex only? Accepting either empty string or email only? I want to have this on Regex only.
This regex pattern will match an empty string:
^$
And this will match (crudely) an email or an empty string:
(^$|^.*@.*\..*$)
Matching an empty string or an email:
(^$|^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.(?:[a-zA-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$)
Matching an empty string or an email, but also any amount of whitespace:
(^\s*$|^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.(?:[a-zA-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$)
see more about the email matching regex itself:
http://www.regular-expressions.info/email.html
The answers above work ($ for empty), but I just tried this and it also works to just leave empty like so:
/\A(INTENSE_EMAIL_REGEX|)\z/i
Same thing in reverse order
/\A(|INTENSE_EMAIL_REGEX)\z/i
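The empty-alternative trick can be sketched in Python as follows. EMAIL is a deliberately crude, hypothetical stand-in for INTENSE_EMAIL_REGEX, and Python's re module spells the end-of-string anchor \Z where Ruby uses \z.

```python
import re

# Hypothetical stand-in for INTENSE_EMAIL_REGEX; deliberately crude.
EMAIL = r"[^@\s]+@[^@\s]+\.[^@\s]+"

# The empty alternative after "|" lets the group match nothing at all,
# so the whole pattern accepts either an email or the empty string.
EMPTY_OR_EMAIL = re.compile(rf"\A(?:{EMAIL}|)\Z", re.IGNORECASE)


def is_blank_or_email(value):
    """Return True for the empty string or something shaped like an email."""
    return EMPTY_OR_EMAIL.match(value) is not None


for sample in ["", "user@example.com", "not an email", " "]:
    print(repr(sample), "->", is_blank_or_email(sample))
```

A single space is rejected here; accepting whitespace-only input as well would need \s* in place of the empty alternative.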
This will solve it; it accepts either an empty string or exactly one email address:
"^$|^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$"
I prefer /^\s+$|^$/gi to match empty strings and strings that contain only whitespace.
console.log(" ".match(/^\s+$|^$/gi));
console.log("".match(/^\s+$|^$/gi));
If you need to cover any length of empty spaces then you may want to use following regex:
"^\s*$"
If you are using it in a Rails ActiveRecord validation, you can set
allow_blank: true
as in:
validates :email, allow_blank: true, format: { with: EMAIL_REGEX }
Don't match an email with a regex. It's extremely ugly and long and complicated and your regex parser probably can't handle it anyway. Try to find a library routine for matching them. If you only want to solve the practical problem of matching an email address (that is, if you want wrong code that happens to (usually) work), use the regular-expressions.info link someone else submitted.
As for the empty string, ^$ is mentioned by multiple people and will work fine.

Regular Expression for some email rules

I was using a regular expression for email formats which I thought was ok but the customer is complaining that the expression is too strict. So they have come back with the following requirement:
The email must contain an "@" symbol and end with either .xx or .xxx (e.g. .nl or .com). They are happy with this to pass validation. I have started the expression to see if the string contains an "@" symbol as below
^(?=.*[@])
this seems to work but how do I add the last requirement (must end with .xx or .xxx)?
A regex simply enforcing your two requirements is:
^.+@.+\.[a-zA-Z]{2,3}$
However, there are email validation libraries for most languages that will generally work better than a regex.
I always use this for emails
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
Try http://www.ultrapico.com/Expresso.htm as well!
It is not possible to validate every email address with a regex, but for your requirements this simple regex works. It is neither complete nor does it check for errors in any way, but it meets the specs exactly:
[^@]+@.+\.\w{2,3}$
Explanation:
[^@]+: Match one or more characters that are not @
@: Match the @
.+: Match one or more of any character
\.: Match a .
\w{2,3}: Match 2 or 3 word characters (letters, digits, or underscore)
$: End of string
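The breakdown above can be tried out with a minimal Python sketch. It assumes @ as the separator; SIMPLE_EMAIL and passes_customer_rule are hypothetical names chosen for the example.

```python
import re

# Two requirements only: contains "@" and ends with a dot
# followed by two or three word characters.
SIMPLE_EMAIL = re.compile(r"[^@]+@.+\.\w{2,3}$")


def passes_customer_rule(address):
    """Return True when address satisfies the two stated requirements."""
    # re.match anchors at the start of the string; the "$" in the
    # pattern anchors the end.
    return SIMPLE_EMAIL.match(address) is not None


for sample in ["info@example.nl", "info@example.com", "info@example", "info@example.c"]:
    print(sample, "->", passes_customer_rule(sample))
```

Because \w also covers digits and the underscore, an ending such as .12 passes too; [a-zA-Z]{2,3} would restrict the ending to letters.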
Try this:
([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\be(\w*)s\b
A good tool for testing regular expressions:
http://gskinner.com/RegExr/
You could use
[@].+\.[a-z0-9]{2,3}$
This should work:
^[^@\r\n\s]+[^.@]@[^.@][^@\r\n\s]+\.(\w){2,}$
I tested it against these invalid emails:
@exampleexample@domaincom.com
example@domaincom
exampledomain.com
exampledomain@.com
exampledomain.@com
example.domain@.@com
e.x+a.1m.5e@em.a.i.l.c.o
some-user@internal-email.company.c
some-user@internal-ema@il.company.co
some-user@@internal-email.company.co
@test.com
test@asdaf
test@.com
test.@com.co
And these valid emails:
example@domain.com
e.x+a.1m.5e@em.a.i.l.c.om
some-user@internal-email.company.co
edit
This one appears to validate all of the addresses from that Wikipedia page, though it probably allows some invalid emails as well. The parentheses split it into everything before and after the @:
^([^\r\n]+)@([^\r\n]+\.?\w{2,})$
niceandsimple@example.com
very.common@example.com
a.little.lengthy.but.fine@dept.example.com
disposable.style.email.with+symbol@example.com
other.email-with-dash@example.com
user@[IPv6:2001:db8:1ff::a0b:dbd0]
"much.more unusual"@example.com
"very.unusual.@.unusual.com"@example.com
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
postbox@com
admin@mailserver1
!#$%&'*+-/=?^_`{}|~@example.org
"()<>[]:,;@\\\"!#$%&'*+-/=?^_`{}| ~.a"@example.org
" "@example.org
üñîçøðé@example.com
üñîçøðé@üñîçøðé.com