US Phone Number Verification - regex

I have a website form that requires a US phone number for follow-up purposes, and the number really is necessary in this case. I want to try to stop users from entering junk data such as 330-000-0000. I have seen some third-party services that validate phone numbers for you, but I don't know whether that is the best option for this situation. If you have ever used one of these services and can make a recommendation, that would also be greatly appreciated.
For now, I am considering checking the number against a set of rules to narrow down the junk phone numbers received:
not a 555 number
does not contain 7 identical digits
valid area code (this is readily available)
not a 123-1234 or 123-4567
I guess I could also count out 867-5309 (heh*)
Would this result in any situations that you can think of that would not allow a user to enter their phone number? Could you think of any other rules that a phone number should not contain? Any other thoughts?

It seems to me that you're putting more effort into this than it warrants. Consider:
If your purpose is to guard against mis-entered phone numbers, then you can probably catch well over 90% of them with just a very simple check.
If your purpose is to try to force users to provide a valid number whether they want to give that information out or not, then you've taken on a hopeless task - even if you were able to access 100% accurate, up-to-the-second telco databases to verify that the exact number entered is currently live, you still don't gain any assurance that the number they gave you is their own. Once again, a simple check will foil the majority of people entering bogus numbers, but those who are willing to try more than two or three times will find a way to defeat your attempts to gain their numbers.
Either way, a simple test is going to get you good results and going into more complex rule sets will take up increasingly more time while providing increasingly little benefit to you (while also potentially adding false positives, as already shown with the "seven of the same digit" and 867-5309 cases).
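As a minimal sketch of what such a "simple test" might look like, here is one possibility in Python. The function name and the exact rule (ten digits once punctuation is stripped, with neither the area code nor the exchange starting with 0 or 1) are assumptions chosen for illustration; the answer above does not prescribe a specific check.

```python
import re

def looks_like_us_phone(raw: str) -> bool:
    """Loose sanity check; it cannot prove the number is real or belongs to the user."""
    digits = re.sub(r"[^0-9]", "", raw)      # drop spaces, dashes, parentheses, etc.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # tolerate a leading country code
    if len(digits) != 10:
        return False
    # Assumed rule: area code and exchange do not start with 0 or 1
    return digits[0] in "23456789" and digits[3] in "23456789"
```

Under these assumptions, 330-000-0000 and 330-123-4567 are rejected because the exchange starts with 0 or 1, while 867-5309 with a plausible area code is accepted.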

You can do phone number validation internally in your app using regular expressions. Depending on your language you can call a function that will return true if a supplied phone number matches the expression.
In PHP:
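function phone_number_is_valid($phone) {
    // eregi() was removed in PHP 7 and its POSIX syntax never supported \d or (?:...),
    // so use preg_match(), which returns 1 when the pattern matches.
    return preg_match('/^(?:\([2-9]\d{2}\) ?|[2-9]\d{2}[- ]?)[2-9]\d{2}[- ]?\d{4}$/', $phone) === 1;
}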
You can look up different regular expressions online. I found the one above at http://regexlib.com/DisplayPatterns.aspx?categoryId=7&cattabindex=2
Edit: Some language specific sites for regular expressions:
PHP at php.net: http://php.net/regex
C# at MSDN
Java: http://java.sun.com/developer/technicalArticles/releases/1.4regex/

867-5309 is a valid phone number that is assigned to people in different area codes.

If you can verify the area code then unless you really, really need to know their phone number you're probably doing as much as is reasonable.

In Django there is a nice little contrib package called localflavor which has a lot of country-specific validation code, for example for postal codes or phone numbers. You can look in the source to see how Django handles these for the country you would like to use; for example: US Form validation. This can be a great resource for information about countries you know little of as well.

Your customers can still do what I do, which is give out the local moviefone number.
Also, 123-1234 or 123-4567 are only invalid numbers because the prefix begins with a 1, but 234-5678 or 234-1234 would actually be valid (though it looks fake).

Maybe take a look at the answers to this question.

If you're sticking with just US- and Canada-format numbers, I think the following regex might work:
[2-9][0-9][0-9]-[2-9][0-9][0-9]-[0-9][0-9][0-9][0-9] & ![2-9][0-9][0-9]-555-[0-9][0-9][0-9][0-9]
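The "& !" above is not regex syntax, so the two conditions have to be combined some other way. One option is a negative lookahead, sketched here in Python. The names NANP_PATTERN and matches_nanp are made up for this example, and the pattern only accepts the dashed NNN-NNN-NNNN layout.

```python
import re

# Area code and exchange start with 2-9; (?!555) rejects the 555 exchange.
NANP_PATTERN = re.compile(r"[2-9][0-9]{2}-(?!555)[2-9][0-9]{2}-[0-9]{4}")

def matches_nanp(number: str) -> bool:
    # fullmatch requires the whole string to fit the pattern
    return NANP_PATTERN.fullmatch(number) is not None
```

Input that omits the dashes would need to be normalised first, or the pattern loosened.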

You also need to take into account ten-digit dialing, which is used in some areas now: this is different from long-distance dialing (ie, 303-555-1234, as opposed to 1-303-555-1234). In some places, a valid phone number is ten digits long; in others, it is seven.

This is a quick function that I use (below). I do have access to a zipcode database that contains areacode and prefix data which is updated monthly. I have often thought about doing a data dip to confirm that the prefix exists for the area code.
// Requires: using System.Text.RegularExpressions;
public static bool isPhone(string phoneNum)
{
    Regex rxPhone1 = new Regex(@"^\d{10,}$");
    Regex rxPhone2 = new Regex(@"(\d)\1\1\1\1\1\1\1\1\1");
    if (phoneNum.Trim() == string.Empty)
        return false;
    if (phoneNum.Length != 10)
        return false;
    // Check that the phone number is all digits (with the length check above, exactly 10)
    if (!rxPhone1.IsMatch(phoneNum))
        return false;
    // Check for repeating characters (ex. 9999999999)
    if (rxPhone2.IsMatch(phoneNum))
        return false;
    // Make sure first digit is not 1 or zero
    if (phoneNum.Substring(0, 1) == "1" || phoneNum.Substring(0, 1) == "0")
        return false;
    return true;
}

I don't know if this is the right place, since it's a formatting function rather than a validation function, but I thought I'd share it with the community; maybe one day it will be helpful.
Private Sub OnNumberChanged()
Dim sep = "-"
Dim num As String = Number.ToCharArray.Where(Function(c) Char.IsDigit(c)) _
.ToArray
Dim ext As String = Nothing
If num.Length > 10 Then ext = num.Substring(10)
ext = If(IsNullOrEmpty(ext), "", " x" & ext)
_Number = Left(num, 3) & sep & Mid(num, 4, 3) & sep & Mid(num, 7, 4) & ext
End Sub
My validation function is like so:
Public Shared Function ValidatePhoneNumber(ByVal number As String)
Return number IsNot Nothing AndAlso number.ToCharArray. _
Where(Function(c) Char.IsNumber(c)).Count >= 10
End Function
I call this last function from the OnNumberChanging(number As String) method of the entity.

For US and International Phone validation I found this code the most suitable:
((\+[1-9]{1,4}[ \-]*)|(\([0-9]{2,3}\)[ \-]*)|([0-9]{2,4})[ \-]*)*?[0-9]{3,4}?[ \-]*[0-9]{3,4}?$
You can find an (albeit somewhat dated) discussion here.

Those parameters look pretty good to me, I might also avoid numbers starting with 911 just to be safe.

Related

Validate Street Address Format

I'm trying to validate the format of a street address in Google Forms using regex. I won't be able to confirm it's a real address, but I would like to at least validate that the string is:
[numbers(max 6 digits)] [word(minimum one to max 8 words with spaces in between and numbers and # allowed)], [words(minimum one to max four words, only letters)], [2 capital letters] [5 digit number]
I want the spaces and commas I left in between the brackets to be required, exactly where I put them in the above example. This would validate
123 test st, test city, TT 12345
That's obviously not a real address, but at least it requires the entry of the correct format. The data is coming from people answering a question on a form, so it will always be just an address, no names. Plus, the addresses are all in one area, South Florida, where pretty much all addresses will match this format. The problem I'm having is people not entering a city, or commas, so I want to give them an error if they don't. So far, I've found this
^([0-9a-zA-Z]+)(,\s*[0-9a-zA-Z]+)*$
But that doesn't allow for multiple words between the commas, or the capital letters and numbers for zip. Any help would save me a lot of headaches, and I would greatly appreciate it.
There really is a lot to consider when dealing with a street address--more than you can meaningfully deal with using a regular expression. Besides, if a human being is at a keyboard, there's always a high likelihood of typing mistakes, and there just isn't a regex that can account for all possible human errors.
Also, depending on what you intend to do with the address once you receive it, there's all sorts of helpful information you might need that you wouldn't get just from splitting the rough address components with a regex.
As a software developer at SmartyStreets (disclosure), I've learned that regular expressions really are the wrong tool for this job because addresses aren't as 'regular' (standardized) as you might think. There are more rigorous validation tools available, even plugins you can install on your web form to validate the address as it is typed, and which return a wealth of useful metadata and information.
Try Regex:
\d{1,6}\s(?:[A-Za-z0-9#]+\s){0,7}(?:[A-Za-z0-9#]+,)\s*(?:[A-Za-z]+\s){0,3}(?:[A-Za-z]+,)\s*[A-Z]{2}\s*\d{5}
See Demo
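To see how the pattern above behaves on the sample from the question, here is a small Python sketch. The names ADDRESS_PATTERN and address_format_ok are made up for this example. It uses fullmatch so the whole input must fit the pattern; whether a given form tool anchors the match in the same way is not verified here, so adding ^ and $ may be necessary there.

```python
import re

# Same pattern as above, split across two lines for readability
ADDRESS_PATTERN = re.compile(
    r"\d{1,6}\s(?:[A-Za-z0-9#]+\s){0,7}(?:[A-Za-z0-9#]+,)\s*"
    r"(?:[A-Za-z]+\s){0,3}(?:[A-Za-z]+,)\s*[A-Z]{2}\s*\d{5}"
)

def address_format_ok(text: str) -> bool:
    # fullmatch anchors the pattern at both ends of the input
    return ADDRESS_PATTERN.fullmatch(text) is not None

print(address_format_ok("123 test st, test city, TT 12345"))  # sample from the question
print(address_format_ok("123 test st test city TT 12345"))    # commas missing
print(address_format_ok("123 test st, TT 12345"))             # city missing
```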
Accepts Apt# also:
(^[0-9]{1,5}\s)([A-Za-z]{1,}(\#\s|\s\#|\s\#\s|\s)){1,5}([A-Za-z]{1,}\,|[0-9]{1,}\,)(\s[a-zA-Z]{1,}\,|[a-zA-Z]{1,}\,)(\s[a-zA-Z]{2}\s|[a-zA-Z]{2}\s)([0-9]{5})

google-speech-api and overriding phone number recognition

Does anyone know if there is a way to manipulate the recognition of phone numbers when using the Google Speech API? I am trying to implement a transcription scenario where a caller will say a string of letters and numbers, but the logic out of the box seems to be to try to fit any sequence of numbers to a phone number scheme, even if it means rendering letters into numbers they may sound vaguely similar to (or not). I have tried using speech contexts to manipulate the values within the "phone number" by typing out and giving the entire thing as it should be as a speech context ("eight seven seven two bee three seven", for example), but it refuses to override the digits being interpreted as a phone number. Has anyone encountered this issue or is aware of any way in which this could be worked around?
Thanks!
I'm not aware of an easy way to do this. For the Web Speech API for JavaScript, doing the following seems to yield fewer results that are forced into phone number format:
Set maxAlternatives = 2, e.g.,
var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.maxAlternatives = 2;
Then use the second result offered, e.g.,
const speechToText = event.results[0][1].transcript;
You can get pretty far by processing the result. A remaining challenge is that since the result often clumps digits together, you lose the distinction between a series of single digit numbers and one multi-digit number (e.g., '15' & '1', '5'). The utility of this approach depends on the specifics of the numbers your app is trying to capture.
In at least one case, setting the language to en-PH (English Philippines) seems to have fixed, or at least notably improved, this problem. Other English language options might work as well.
en-GB comes back as a UK formatted number where they put one digit first then the rest of the number.

SQL Server Regular Expression Workaround in T-SQL?

I have some SQLCLR code for working with Regular Expressions. But now that it is getting migrated into Azure, which does not allow SQLCLR, that's out. I need to find a way to do regex in pure T-SQL.
Master Data Services are not available because the dev edition of MSSQL we have is not R2.
All ideas appreciated, thanks.
Regular expression match samples that need handling
(culled from regexlib and other places over the past few years)
email address
^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$
dollars
^(\$)?(([1-9]\d{0,2}(\,\d{3})*)|([1-9]\d*)|(0))(\.\d{2})?$
uri
^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$
one numeric digit
^\d$
percentage
^-?[0-9]{0,2}(\.[0-9]{1,2})?$|^-?(100)(\.[0]{1,2})?$
height notation
^\d?\d'(\d|1[01])"$
numbers between 1 and 1000
^([1-9]|[1-9]\d|1000)$
credit card numbers
^((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[4,7]\d{13}$
list of years
^([1-9]{1}[0-9]{3}[,]?)*([1-9]{1}[0-9]{3})$
days of the week
^(Sun|Mon|(T(ues|hurs))|Fri)(day|\.)?$|Wed(\.|nesday)?$|Sat(\.|urday)?$|T((ue?)|(hu?r?))\.?$
time on 12 hour clock
(?<Time>^(?:0?[1-9]:[0-5]|1(?=[012])\d:[0-5])\d(?:[ap]m)?)
time on 24 hour clock
^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
usa phone numbers
^\(?[\d]{3}\)?[\s-]?[\d]{3}[\s-]?[\d]{4}$
Unfortunately, you will not be able to move your CLR function(s) to SQL Azure. You will need to either use the normal string functions (PATINDEX, CHARINDEX, LIKE, and so on) or perform these operations outside of the database.
EDIT Adding some information for the examples added to the question.
Email address
This one is always controversial because people disagree about which version of the RFC they want to support. The original didn't support apostrophes, for example (or at least people insist that it didn't - I haven't dug it up from the archives and read it myself, admittedly), and it has to be expanded quite often for new TLDs (once for 4-letter TLDs like .info, then again for 6-letter TLDs like .museum). I've often heard quite knowledgeable people state that perfect e-mail validation is impossible, and having previously worked for an e-mail service provider, I can tell you that it was a constantly moving target. But for the simplest approaches, see the question TSQL Email Validation (without regex).
One numeric digit
Probably the easiest one of the bunch:
WHERE @s LIKE '[0-9]';
Credit card numbers
Assuming you strip out dashes and spaces, which you should do in any case. Note that this isn't an actual check of the credit card number algorithm to ensure that the number itself is actually valid, just that it conforms to the general format (AmEx = 15 digits starting with a 3, the rest are 16 digits - Visa starts with a 4, MasterCard starts with a 5, Discover starts with 6 and I think there's one that starts with a 7 (though that may just be gift cards of some kind)):
WHERE @s + ' ' LIKE '[3-7]'+ REPLICATE('[0-9]', 14) + '[0-9 ]';
If you want to be a little more precise at the cost of being long-winded, you can say:
WHERE (LEN(@s) = 15 AND @s LIKE '3' + REPLICATE('[0-9]', 14))
OR (LEN(@s) = 16 AND @s LIKE '[4-7]' + REPLICATE('[0-9]', 15));
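The "credit card number algorithm" mentioned above is presumably the Luhn checksum, which the format checks here do not cover. As a sketch of what that extra step could look like if it is done outside the database, here is a Python version. The function name is made up for this example, and the input is assumed to have had dashes and spaces stripped already.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9   # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

Passing the checksum only shows the number is well formed; it says nothing about whether the account exists.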
USA phone numbers
Again, assuming you're going to strip out parentheses, dashes and spaces first. Pretty sure a US area code can't start with a 1; if there are other rules, I am not aware of them.
WHERE @s LIKE '[2-9]' + REPLICATE('[0-9]', 9);
-----
I'm not going to go further, because a lot of the other expressions you've defined can be extrapolated from the above. Hopefully this gives you a start. You should be able to Google for some of the others to see how other people have replicated the patterns with T-SQL. Some of them (like days of the week) can probably just be checked against a table - seems overkill to do invasive pattern matching for a set of 7 possible values. Similarly with a list of 1000 numbers or years, these are things that will be much easier (and probably more efficient) to check if the numeric value is in a table rather than convert it to a string and see if it matches some pattern.
I'll state again that a lot of this will be much better if you can cleanse and validate the data before it gets into the database in the first place. You should strive to do this wherever possible, because without CLR, you just can't do powerful RegEx inside SQL Server.
Ken Henderson wrote about ways to replicate RegEx without CLR, but they require sp_OA* procedures, which are even less likely to ever see the light of day in Azure than CLR. Most of the other articles you'll find online use an approach similar to Ken's or make complex use of built-in string functions.
Which portions of RegEx specifically are you trying to replicate? Can you show an example of the input/output of one of your functions? Perhaps it will be easy to convert to get similar results using the built-in string functions like PATINDEX.

How to validate with regex that a string is OK as long as it contains 10 digits?

I'm processing input from a Web form. Basically, all I care about is that the value provided includes 10 digits, no more, no less.
These would be valid inputs:
1234567890
123 456 789 0 Hello!
My number is: 123456-7890 thanks
These would be invalid inputs:
123456789033 (too long)
123 Hello! (too short)
My number is one five zero nine thanks (no digits)
I've tried many different things with Regextester but it never matches correctly. I'm using the 'preg' setting (which is what I figured my CMS Typo3 uses) and my closest attempt is:
([0-9][^0-9]*){10}
which is kinda lame but is the closest I got.
Cheers!
EDIT: I cannot use any programming language to implement this. Imagine that I have a admin console field in front of me, in which I must enter a regular expression that will be used to validate the value. That's all the latitude I have. Cheers.
I think you've got the right idea. Maybe you can simplify it as (\d\D*){10}
If the regex has to match the complete string, you would want \D*(\d\D*){10}
UPDATE: It looks like you need ^\D*(\d\D*){10}$ to make sure you match the complete string.
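As a quick check of the anchored pattern against the examples from the question, here is a small Python sketch. The names TEN_DIGITS and has_exactly_ten_digits are made up for this example, and Python's re module is assumed to be close enough to the 'preg' flavour for this particular pattern.

```python
import re

TEN_DIGITS = re.compile(r"^\D*(\d\D*){10}$")

def has_exactly_ten_digits(value: str) -> bool:
    # Each repetition consumes one digit plus any non-digits after it,
    # so the anchored pattern matches only when there are exactly ten digits.
    return TEN_DIGITS.match(value) is not None

for sample in [
    "1234567890",
    "123 456 789 0 Hello!",
    "My number is: 123456-7890 thanks",
    "123456789033",
    "123 Hello!",
    "My number is one five zero nine thanks",
]:
    print(repr(sample), has_exactly_ten_digits(sample))
```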
A regular expression is not always the best tool for this kind of job. In this case it's probably easier and simpler to write a function to count the number of digits in a string. (Since you didn't mention a programming language, I'll use Python in my example.)
def count_digits(s):
    return len([x for x in s if x.isdigit()])
Then, you can use it like this:
s = "My number is: 123456-7890 thanks"
if count_digits(s) == 10:
    print("looks okay")
else:
    print("doesn't contain 10 digits")

Regular expression for validating names and surnames?

Although this seems like a trivial question, I am quite sure it is not :)
I need to validate names and surnames of people from all over the world. Imagine a huge list of millions of names and surnames where I need to remove as well as possible any cruft I identify. How can I do that with a regular expression? If it were only English ones I think that this would cut it:
^[a-z -']+$
However, I need to support also these cases:
other punctuation symbols as they might be used in different countries (no idea which, but maybe you do!)
different Unicode letter sets (accented letter, greek, japanese, chinese, and so on)
no numbers or symbols or unnecessary punctuation or runes, etc..
titles, middle initials, suffixes are not part of this data
names are already separated by surnames.
we are prepared to force ultra rare names to be simplified (there's a person named '#' in existence, but it doesn't make sense to allow that character everywhere. Use pragmatism and good sense.)
note that many countries have laws about names so there are standards to follow
Is there a standard way of validating these fields I can implement to make sure that our website users have a great experience and can actually use their name when registering in the list?
I would be looking for something similar to the many "email address" regexes that you can find on google.
I sympathize with the need to constrain input in this situation, but I don't believe it is possible - Unicode is vast, expanding, and so is the subset used in names throughout the world.
Unlike email, there's no universally agreed-upon standard for the names people may use, or even which representations they may register as official with their respective governments. I suspect that any regex will eventually fail to pass a name considered valid by someone, somewhere in the world.
Of course, you do need to sanitize or escape input, to avoid the Little Bobby Tables problem. And there may be other constraints on which input you allow as well, such as the underlying systems used to store, render or manipulate names. As such, I recommend that you determine first the restrictions necessitated by the system your validation belongs to, and create a validation expression based on those alone. This may still cause inconvenience in some scenarios, but they should be rare.
I'll try to give a proper answer myself:
The only punctuations that should be allowed in a name are full stop, apostrophe and hyphen. I haven't seen any other case in the list of corner cases.
Regarding numbers, there's only one case with an 8. I think I can safely disallow that.
Regarding letters, any letter is valid.
I also want to include space.
This would sum up to this regex:
^[\p{L} \.'\-]+$
This presents one problem, i.e. the apostrophe can be used as an attack vector. It should be encoded.
So the validation code should be something like this (untested):
var name = nameParam.Trim();
if (!Regex.IsMatch(name, @"^[\p{L} \.'\-]+$"))
    throw new ArgumentException("nameParam");
name = name.Replace("'", "&#39;");  //&apos; does not work in IE
Can anyone think of a reason why a name should not pass this test or a XSS or SQL Injection that could pass?
complete tested solution
using System;
using System.Text.RegularExpressions;
namespace test
{
class MainClass
{
public static void Main(string[] args)
{
var names = new string[]{"Hello World",
"John",
"João",
"タロウ",
"やまだ",
"山田",
"先生",
"мыхаыл",
"Θεοκλεια",
"आकाङ्क्षा",
"علاء الدين",
"אַבְרָהָם",
"മലയാളം",
"상",
"D'Addario",
"John-Doe",
"P.A.M.",
"' --",
"<xss>",
"\""
};
foreach (var nameParam in names)
{
Console.Write(nameParam+" ");
var name = nameParam.Trim();
if (!Regex.IsMatch(name, @"^[\p{L}\p{M}' \.\-]+$"))
{
Console.WriteLine("fail");
continue;
}
name = name.Replace("'", "&#39;");
Console.WriteLine(name);
}
}
}
}
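For comparison, the same character-class idea can be sketched in Python. The standard re module does not support \p{L}, so this version checks Unicode general categories with unicodedata instead. The names ALLOWED_PUNCTUATION and name_chars_ok are made up for this example, and the allowed set (letters, combining marks, space, full stop, apostrophe, hyphen) simply mirrors the pattern in the C# program above.

```python
import unicodedata

ALLOWED_PUNCTUATION = set(" .'-")

def name_chars_ok(name: str) -> bool:
    """True if every character is a letter, a combining mark, or allowed punctuation."""
    name = name.strip()
    if not name:
        return False
    for ch in name:
        category = unicodedata.category(ch)   # e.g. "Lu", "Ll", "Lo", "Mn", "Mc"
        if category[0] in ("L", "M") or ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True
```

Like the regex it mirrors, this only restricts the character set: a string made up solely of allowed punctuation still passes, and escaping on output is still needed.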
I would just allow everything (except an empty string) and assume the user knows what his name is.
There are 2 common cases:
You care that the name is accurate and are validating against a real paper passport or other identity document, or against a credit card.
You don't care that much and the user will be able to register as "Fred Smith" (or "Jane Doe") anyway.
In case (1), you can allow all characters because you're checking against a paper document.
In case (2), you may as well allow all characters because "123 456" is really no worse a pseudonym than "Abc Def".
I would think you would be better off excluding the characters you don't want with a regex. Trying to get every umlaut, accented e, hyphen, etc. will be pretty insane. Just exclude digits (but then what about a guy named "George Forman the 4th") and symbols you know you don't want like @#$%^ or what have you. But even then, using a regex will only guarantee that the input matches the regex, it will not tell you that it is a valid name.
EDIT after clarifying that this is trying to prevent XSS: A regex on a name field is obviously not going to stop XSS on its own. However, this article has a section on filtering that is a starting point if you want to go that route:
s/[\<\>\"\'\%\;\(\)\&\+]//g;
"Secure Programming for Linux and Unix HOWTO" by David A. Wheeler, v3.010 Edition (2003)
v3.72, 2015-09-19 is a more recent version.
BTW, do you plan to only permit the Latin alphabet, or do you also plan to try to validate Chinese, Arabic, Hindi, etc.?
As others have said, don't even try to do this. Step back and ask yourself what you are actually trying to accomplish. Then try to accomplish it without making any assumptions about what people's names are, or what they mean.
I don’t think that’s a good idea. Even if you find an appropriate regular expression (maybe using Unicode character properties), this wouldn’t prevent users from entering pseudo-names like John Doe, Max Mustermann (there even is a person with that name), Abcde Fghijk or Ababa Bebebe.
You could use the following regex to validate two names separated by a space:
^[A-Za-zÀ-ú]+ [A-Za-zÀ-ú]+$
or just use:
[[:lower:]] = [a-zà-ú]
[[:upper:]] =[A-ZÀ-Ú]
[[:alpha:]] = [A-Za-zÀ-ú]
[[:alnum:]] = [A-Za-zÀ-ú0-9]
It's a very difficult problem to validate something like a name due to all the corner cases possible.
Corner Cases
Anything anything here
Sanitize the inputs and let them enter whatever they want for a name, because deciding what is a valid name and what is not is probably way outside the scope of whatever you're doing, given that the range of potential strange (and legal) names is nearly infinite.
If they want to call themselves Tricyclopltz^2-Glockenschpiel, that's their problem, not yours.
A very contentious subject that I seem to have stumbled upon here. However, sometimes it's nice to head dear little Bobby Tables off at the pass and send little Robert to the headmaster's office along with his semicolons and SQL comment lines --.
This REGEX in VB.NET includes regular alphabetic characters and various circumflexed European characters. However, poor old James Mc'Tristan-Smythe the 3rd will have to enter his pedigree as Jim the Third.
<asp:RegularExpressionValidator ID="RegExValid1" Runat="server"
ErrorMessage="ERROR: Please enter a valid surname<br/>" SetFocusOnError="true" Display="Dynamic"
ControlToValidate="txtSurname" ValidationGroup="MandatoryContent"
ValidationExpression="^[A-Za-z'\-\p{L}\p{Zs}\p{Lu}\p{Ll}\']+$" />
This one worked perfectly for me in JavaScript:
^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$
Here is the method:
function isValidName(name) {
var found = name.search(/^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$/);
return found > -1;
}
Steps:
first remove all accents
apply the regular expression
To strip the accents:
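// Requires: using System.Globalization; using System.Text;
private static string RemoveAccents(string s)
{
    // Decompose accented characters into base character + combining marks
    s = s.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.Length; i++)
    {
        // Keep everything except the combining (non-spacing) marks
        if (CharUnicodeInfo.GetUnicodeCategory(s[i]) != UnicodeCategory.NonSpacingMark) sb.Append(s[i]);
    }
    return sb.ToString();
}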
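The same accent-stripping step can be sketched in Python with the standard unicodedata module. The function name remove_accents is made up for this example, and the approach is the same: decompose to NFD, then drop the non-spacing marks (category "Mn").

```python
import unicodedata

def remove_accents(text: str) -> str:
    # NFD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the non-spacing marks and keep everything else
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Letters that are not built from a base letter plus a combining mark are left unchanged by this approach.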
This somewhat helps:
^[a-zA-Z]'?([a-zA-Z]|\.| |-)+$
This one should work
^([A-Z]{1}+[a-z\-\.\']*+[\s]?)*
Add some special characters if you need them.