What punctuation characters are necessary for a city field?


I'm considering a regex to restrict punctuation in city names (worldwide). What would be a fairly inclusive whitelist of these?
I'm thinking:
(space)
. period
- hyphen
' apostrophe
I'm also thinking of maybe a comma or a slash, but I don't have any examples. Are there others?

This is the most inclusive whitelist of punctuation to be found in city names. The ASCII apostrophe codepoint may not be the one used when someone is entering an apostrophe on their keyboard.
If you've discerned the encoding of the submitted text, you should be able to see if it falls under the Punctuation block:
/\p{InGeneral_Punctuation}/
If you are limiting yourself to Latin-Extended, just use those:
/\p{InLatin_Extended-A}/
Also, ask yourself: What are the consequences of someone putting a funny character into my city name? Is that worse than the consequences of someone not being able to enter their correct address, if I exclude too much?

USPS standard address formatting calls for stripping all special characters except 'necessary' hyphens and dashes used in the primary and/or secondary street address lines and hyphens in the ZIP.
So if an address is:
John O'Toole
456 N 4-1/2 St
San José, CA 99999-4545
The post office prefers envelopes be labeled:
John O Toole
456 N 4 1/2 St
San Jose CA 99999-4545
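A rough Python sketch of that kind of cleanup, applied to the name and city lines above. The function name usps_clean is hypothetical and this is not a USPS tool: it strips accents via Unicode decomposition, turns every character other than ASCII letters, digits, hyphens, slashes and spaces into a space, and collapses the result. It does not try to decide which hyphens are "necessary", so 4-1/2 keeps its hyphen, and letters with no ASCII base form (such as ß or ł) simply become spaces.

```python
import re
import unicodedata


def usps_clean(line: str) -> str:
    # Split accented letters into base letter + combining mark (é -> e + U+0301)
    # and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", line)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not an ASCII letter, digit, slash, space or hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9/ -]", " ", stripped)
    # Collapse the runs of spaces left behind and trim the ends.
    return re.sub(r" {2,}", " ", cleaned).strip()


print(usps_clean("John O'Toole"))             # John O Toole
print(usps_clean("San José, CA 99999-4545"))  # San Jose CA 99999-4545
```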

Related

How can a regex catch all parts before a keyword from a finite set, when they are sometimes separated from it by only a single space?

This question relates to PCRE regular expressions.
Part of my big dataset are address data like this:
12H MARKET ST. Canada
123 SW 4TH Street USA
ONE HOUSE USA
1234 Quantity Dr USA
123 Quality Court Canada
1234 W HWY 56A USA
12345 BERNARDO CNTR DRIVE Canada
12 VILLAGE PLAZA USA
1234 WEST SAND LAKE RD ?567 USA
1234 TELEGRAM BLVD SUITE D USA
1234-A SOUTHWEST FRWY USA
123 CHURCH STREET USA
123 S WASHINGTON USA
123 NW-SE BLVD USA
# USA
1234 E MAIN STREET USA
I would like to extract the street names including house numbers and additional information from these records. (Of course there are other things in those records and I already know how to extract them).
For the purpose of this question I just manually clipped the interesting part from the data for this example.
The number of words in the address parts is not known in advance. The only criterion I have found so far is the occurrence of a country name from a finite set, which of course is bigger than (USA|Canada). For brevity I limit my example to those two countries.
This regular expression
([a-zA-Z0-9?\-#.]+\s)
already isolates the words making up what I am after, including one space after each. Unfortunately, in some cases the country is separated from the preceding street information by only a single space, e.g. in the first and the last example.
Since I want to capture the matching parts glued together, I place a + sign behind my regular expression:
([a-zA-Z0-9?\-#.]+\s)+
but then in the two nasty cases with only one separating space before the country, the country is also caught!
Since I know the possible countries from looking at the data, I could try to exclude them by a look ahead-condition like this:
([a-zA-Z0-9?\-#.]+\s)(?!USA|Canada)
which excludes ST. from the match in the first line and STREET from the match in the last line. Of course the single capture groups are not yet glued together by this.
So I would add a plus sign to the group on the left:
([a-zA-Z0-9?\-#.]+\s)+(?!USA|Canada)
But then ST. and STREET and the Country, separated by only a single space, are caught again together with the country, which I want to exclude from my result!
How would you proceed in such a case?
If it were possible, using regular expressions, to replace each country name with the same name preceded by an additional space (or to do this only where there is a single space in front of a country name), my problem would be solved. But I want to avoid such a substitution over the whole database in a separate run, because a country name might appear in some other column too.
I am quite new to regular expressions and have no idea how to apply two processing steps to the same input in sequence. But maybe someone has a better idea how to cope with this problem.

If I understand correctly, you want all content before the country (excluding spaces before the country). The country will always be present at the end of the line and comes from a list.
So you should be able to set the 'global' and 'multiline' options and then use the following regex:
^(.*?)(?=\s+(USA|Canada)\s*$)
Explanation:
^(.*?) match as few characters as possible from the start of the line
(?=\s+(USA|Canada)\s*$) look ahead for one or more spaces, followed by one of the country names, followed by zero or more spaces and end of line.
That should give you a list with all addresses.
Edit:
I have changed the first part to: (.*?), making it non-greedy. That way the match will stop at the last letter before country instead of including some spaces.
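A Python sketch of this approach on a few of the sample lines (with extra spaces added before the country to show both cases). The country list is an assumption you would extend; the country group is made non-capturing so that group 1 holds only the street part. Note that \s+ can also match a newline in multiline mode, so [ \t]+ may be a slightly safer choice.

```python
import re

data = """12H MARKET ST. Canada
123 SW 4TH Street  USA
ONE HOUSE   USA
# USA
1234 E MAIN STREET USA"""

countries = ["USA", "Canada"]  # assumption: extend with the full set of country names
# Lazy prefix, then a lookahead for whitespace + a known country at the end of the line.
pattern = re.compile(
    r"^(.*?)(?=\s+(?:" + "|".join(map(re.escape, countries)) + r")\s*$)",
    re.MULTILINE,
)

streets = [m.group(1) for m in pattern.finditer(data)]
print(streets)
# ['12H MARKET ST.', '123 SW 4TH Street', 'ONE HOUSE', '#', '1234 E MAIN STREET']
```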

Regex for name with non-latin characters in python [duplicate]

For website validation purposes, I need first name and last name validation.
For the first name, it should only contain letters, can be several words separated by spaces, and must be at least three and at most 30 characters long. An empty string should not pass validation (valid examples: Jason, jason, jason smith, jason smith, JASON, Jason smith, jason Smith, and jason SMITH).
For the last name, it should be a single word containing only letters, at least three and at most 30 characters long. An empty string should not pass validation (valid examples: lazslo, Lazslo, and LAZSLO).
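Python's re module has no \p{L}, but in Python 3 string patterns the class [^\W\d_] (word characters minus digits and underscore) matches letters from any script. A minimal sketch of the stated rules under that assumption; the function names are hypothetical, and it expects precomposed (NFC) text, because combining marks are not matched by \w.

```python
import re

# [^\W\d_] = a word character that is neither a digit nor "_", i.e. a letter.
LETTER = r"[^\W\d_]"
FIRST_NAME = re.compile(rf"{LETTER}+(?: +{LETTER}+)*")  # one or more words separated by spaces
LAST_NAME = re.compile(rf"{LETTER}{{3,30}}")             # a single word of 3-30 letters


def is_valid_first_name(name: str) -> bool:
    # Assumption: the 3-30 length limit applies to the whole first name, spaces included.
    return 3 <= len(name) <= 30 and FIRST_NAME.fullmatch(name) is not None


def is_valid_last_name(name: str) -> bool:
    return LAST_NAME.fullmatch(name) is not None


print(is_valid_first_name("jason SMITH"), is_valid_first_name("Виктория"))  # True True
print(is_valid_last_name("LAZSLO"), is_valid_last_name("La"))             # True False
```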

Don't forget about names like:
Mathias d'Arras
Martin Luther King, Jr.
Hector Sausage-Hausen
This should do the trick for most things:
/^[a-z ,.'-]+$/i
OR Support international names with super sweet unicode:
/^[a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð ,.'-]+$/u

You make false assumptions on the format of first and last name. It is probably better not to validate the name at all, apart from checking that it is not empty.

After going through all of these answers I found a way to build a tiny regex that supports most languages and only allows for word characters. It even supports some special characters like hyphens, spaces and apostrophes. I've tested in python and it supports the characters below:
^[\w'\-,.][^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]]{2,}$
Characters supported:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
áéíóúäëïöüÄ'
陳大文
łŁőŐűŰZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųū
ÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁ
ŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ.-
ñÑâê都道府県Федерации
আবাসযোগ্য জমির걸쳐 있는

I have created a custom regex to deal with names:
I have tried these types of names and found that it works perfectly:
John Smith
John D'Largy
John Doe-Smith
John Doe Smith
Hector Sausage-Hausen
Mathias d'Arras
Martin Luther King
Ai Wong
Chao Chang
Alzbeta Bara
My RegEx looks like this:
^([a-zA-Z]{2,}\s[a-zA-Z]{1,}'?-?[a-zA-Z]{2,}\s?([a-zA-Z]{1,})?)
MVC4 Model:
[RegularExpression("^([a-zA-Z]{2,}\\s[a-zA-Z]{1,}'?-?[a-zA-Z]{2,}\\s?([a-zA-Z]{1,})?)", ErrorMessage = "Valid Characters include (A-Z) (a-z) (' space -)") ]
Please note the double \\ for escape characters.
For those of you who are new to RegEx, I thought I'd include an explanation.
^ // start of line
[a-zA-Z]{2,} // will accept a name with at least two characters
\s // will look for white space between name and surname
[a-zA-Z]{1,} // needs at least 1 character
\'?-? // possibility of **'** or **-** for double-barreled and hyphenated surnames
[a-zA-Z]{2,} // will accept a name with at least two characters
\s? // possibility of another whitespace
([a-zA-Z]{1,})? // possibility of a second surname
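A quick Python check of the pattern above (transcribed with single backslashes, as in the plain regex). Because it has no closing $ anchor, a prefix match succeeds on input with trailing junk; requiring a whole-string match, for example with re.fullmatch, closes that gap wherever the surrounding framework does not already do so. The helper names are hypothetical.

```python
import re

# The pattern from the answer; note there is no closing $ anchor.
NAME = re.compile(r"^([a-zA-Z]{2,}\s[a-zA-Z]{1,}'?-?[a-zA-Z]{2,}\s?([a-zA-Z]{1,})?)")


def prefix_ok(name: str) -> bool:
    # Succeeds if the START of the string matches.
    return NAME.match(name) is not None


def whole_ok(name: str) -> bool:
    # Succeeds only if the ENTIRE string matches.
    return NAME.fullmatch(name) is not None


print(prefix_ok("John Smith123"), whole_ok("John Smith123"))  # True False
print(whole_ok("Mathias d'Arras"))                            # True
```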

I have searched and searched and played and played with it, and although it is not perfect, it may help others attempting to validate first and last names that have been provided as one variable.
In my case, that variable is $name.
I used the following code for my PHP:
if (preg_match('/\b([A-Z]{1}[a-z]{1,30}[- ]{0,1}|[A-Z]{1}[- \']{1}[A-Z]{0,1}[a-z]{1,30}[- ]{0,1}|[a-z]{1,2}[ -\']{1}[A-Z]{1}[a-z]{1,30}){2,5}/', $name)) {
# pass - successful match - do something
} else {
# fail - unsuccessful match - do something
}
I am learning RegEx myself, but I do have the explanation of the code as provided by RegexBuddy.
Here it is:
Assert position at a word boundary «\b»
Match the regular expression below and capture its match into backreference number 1
«([A-Z]{1}[a-z]{1,30}[- ]{0,1}|[A-Z]{1}[- \']{1}[A-Z]{0,1}[a-z]{1,30}[- ]{0,1}|[a-z]{1,2}[ -\']{1}[A-Z]{1}[a-z]{1,30}){2,5}»
Between 2 and 5 times, as many times as possible, giving back as needed (greedy) «{2,5}»
* I NEED SOME HELP HERE WITH UNDERSTANDING THE RAMIFICATIONS OF THIS NOTE *
Note: I repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «{2,5}»
Match either the regular expression below (attempting the next alternative only if this one fails) «[A-Z]{1}[a-z]{1,30}[- ]{0,1}»
Match a single character in the range between “A” and “Z” «[A-Z]{1}»
Exactly 1 times «{1}»
Match a single character in the range between “a” and “z” «[a-z]{1,30}»
Between one and 30 times, as many times as possible, giving back as needed (greedy) «{1,30}»
Match a single character present in the list “- ” «[- ]{0,1}»
Between zero and one times, as many times as possible, giving back as needed (greedy) «{0,1}»
Or match regular expression number 2 below (attempting the next alternative only if this one fails) «[A-Z]{1}[- \']{1}[A-Z]{0,1}[a-z]{1,30}[- ]{0,1}»
Match a single character in the range between “A” and “Z” «[A-Z]{1}»
Exactly 1 times «{1}»
Match a single character present in the list below «[- \']{1}»
Exactly 1 times «{1}»
One of the characters “- ” «- »
A ' character «\'»
Match a single character in the range between “A” and “Z” «[A-Z]{0,1}»
Between zero and one times, as many times as possible, giving back as needed (greedy) «{0,1}»
Match a single character in the range between “a” and “z” «[a-z]{1,30}»
Between one and 30 times, as many times as possible, giving back as needed (greedy) «{1,30}»
Match a single character present in the list “- ” «[- ]{0,1}»
Between zero and one times, as many times as possible, giving back as needed (greedy) «{0,1}»
Or match regular expression number 3 below (the entire group fails if this one fails to match) «[a-z]{1,2}[ -\']{1}[A-Z]{1}[a-z]{1,30}»
Match a single character in the range between “a” and “z” «[a-z]{1,2}»
Between one and 2 times, as many times as possible, giving back as needed (greedy) «{1,2}»
Match a single character in the range between “ ” and “'” «[ -\']{1}»
Exactly 1 times «{1}»
Match a single character in the range between “A” and “Z” «[A-Z]{1}»
Exactly 1 times «{1}»
Match a single character in the range between “a” and “z” «[a-z]{1,30}»
Between one and 30 times, as many times as possible, giving back as needed (greedy) «{1,30}»
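About the RegexBuddy note above: when a capturing group is itself repeated, the engine keeps only the text from the last repetition in that group. For validation with preg_match this does not change whether a name matches; it only affects what ends up in backreference 1. A small Python illustration (Python's re behaves the same way; the sample name is made up):

```python
import re

name = "Steve Johnson Smith"

# Repeated capturing group: group 1 keeps only the LAST iteration.
repeated = re.match(r"([A-Z][a-z]+ ?){2,5}", name)
print(repeated.group(0))  # Steve Johnson Smith  (the whole match)
print(repeated.group(1))  # Smith                (last iteration only)

# Wrapping the repetition in an outer group captures all iterations as one string.
wrapped = re.match(r"((?:[A-Z][a-z]+ ?){2,5})", name)
print(wrapped.group(1))   # Steve Johnson Smith

# To get every part separately, find the single-iteration pattern repeatedly.
parts = re.findall(r"[A-Z][a-z]+", name)
print(parts)              # ['Steve', 'Johnson', 'Smith']
```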
I know this validation assumes that every person filling out the form has a Western name, which may exclude the vast majority of people in the world. However, I feel like this is a step in the right direction. Perhaps this regular expression is too basic for the gurus to address, or maybe there is some other reason I was unable to find the above code in my searches. I spent way too long trying to figure this bit out; you will probably notice just how foggy my mind is on all this if you look at my test names below.
I tested the code on the following names and the results are in parentheses to the right of each name.
STEVE SMITH (fail)
Stev3 Smith (fail)
STeve Smith (fail)
Steve SMith (fail)
Steve Sm1th (passed, matching only "Steve Sm")
d'Are to Beaware (passed, matching only "Are to Beaware")
Jo Blow (passed)
Hyoung Kyoung Wu (passed)
Mike O'Neal (passed)
Steve Johnson-Smith (passed)
Jozef-Schmozev Hiemdel (passed)
O Henry Smith (passed)
Mathais d'Arras (passed)
Martin Luther King Jr (passed)
Downtown-James Brown (passed)
Darren McCarty (passed)
George De FunkMaster (passed)
Kurtis B-Ball Basketball (passed)
Ahmad el Jeffe (passed)
If you have basic names similar to those I used during testing, this code might be for you. Note that a name must consist of two to five parts for the above code to work.
If you have any improvements, please let me know. I am just in the early stages (the first few months) of figuring out RegEx.
Thanks and good luck,
Steve

I've tried almost everything on this page, and then I decided to modify the most voted answer, which ended up working best. It simply matches letters from all languages and also allows the . , - ' characters.
Here it is:
/^[\p{L} ,.'-]+$/u

First name would be
"([a-zA-Z]{3,30}\s*)+"
If you need the whole first name part to be shorter than 30 letters, you need to check that separately, I think. The expression ".{3,30}" should do that.
Your last name requirements would translate into
"[a-zA-Z]{3,30}"
but you should check these. There are plenty of last names containing spaces.

As maček said:
Don't forget about names like:
Mathias d'Arras
Martin Luther King, Jr.
Hector Sausage-Hausen
and to reject cases like:
..Mathias
Martin king, Jr.-
This will cover more cases:
^([a-z]+[,.]?[ ]?|[a-z]+['-]?)+$
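A quick Python check of that pattern against the examples above. It assumes a case-insensitive flag (re.IGNORECASE), since the class only lists a-z while the examples are capitalized. Because [a-z]+ sits inside a repeated group with optional suffixes, failing inputs can backtrack heavily, so it is worth limiting the input length before running it.

```python
import re

# The answer's pattern; re.IGNORECASE is an added assumption.
NAME = re.compile(r"^([a-z]+[,.]?[ ]?|[a-z]+['-]?)+$", re.IGNORECASE)

accepted = ["Mathias d'Arras", "Martin Luther King, Jr.", "Hector Sausage-Hausen"]
rejected = ["..Mathias", "Martin king, Jr.-"]

for name in accepted + rejected:
    print(f"{name!r}: {NAME.fullmatch(name) is not None}")
```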

This regex works for me (I was using it in Angular 8):
([a-zA-Z',.-]+( [a-zA-Z',.-]+)*){2,30}
It will be invalid if there is:-
Any whitespace start or end of the name
Got symbols e.g. #
Less than 2 or more than 30

I'm working on an app that validates international passports (ICAO). We support only English characters. While most foreign national characters can be represented by a character of the Latin alphabet (e.g. è by e), several national characters require an extra letter to represent them, such as the German umlaut, which requires an ‘e’ to be added to the letter (e.g. ä by ae).
This is the JavaScript Regex for the first and last names we use:
/^[a-zA-Z '.-]*$/
A name on an international passport can be up to 31 characters long.
We enforce that limit with maxlength="31" instead of including it in the regex, so that we can show a better-worded error message.
Here is a snippet from our code in AngularJS 1.6 with form and error handling:
JavaScript:
class PassportController {
constructor() {
this.details = {};
// English letters, spaces and the following symbols ' - . are allowed
// Max length determined by ng-maxlength for better error messaging
this.nameRegex = /^[a-zA-Z '.-]*$/;
}
}
angular.module('akyc', ['ngMessages'])
.controller('PassportController', PassportController);
CSS:
.has-error p[ng-message] {
color: #bc111e;
}
.tip {
color: #535f67;
}
HTML:
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.6.6/angular.min.js"></script>
<script src="https://code.angularjs.org/1.6.6/angular-messages.min.js"></script>
<main ng-app="akyc" ng-controller="PassportController as $ctrl">
<form name="$ctrl.form">
<div name="lastName" ng-class="{ 'has-error': $ctrl.form.lastName.$invalid} ">
<label for="pp-last-name">Surname</label>
<div class="tip">Exactly as it appears on your passport</div>
<div ng-messages="$ctrl.form.lastName.$error" ng-if="$ctrl.form.$submitted" id="last-name-error">
<p ng-message="required">Please enter your last name</p>
<p ng-message="maxlength">This field can be at most 31 characters long</p>
<p ng-message="pattern">Only English letters, spaces and the following symbols ' - . are allowed</p>
</div>
<input type="text" id="pp-last-name" ng-model="$ctrl.details.lastName" name="lastName"
class="form-control" required ng-pattern="$ctrl.nameRegex" ng-maxlength="31" aria-describedby="last-name-error" />
</div>
<button type="submit" class="btn btn-primary">Test</button>
</form>
</main>

I read almost all of the highly voted posts (only some are good). After understanding the problem in detail and doing some research, here are the tight regexes:
1). ^[A-Z][a-z]*(([,.] |[ '-])[A-Za-z][a-z]*)*(\.?)$
A single-letter name such as Z is allowed, contrary to the assumption made by some in the thread.
No leading or trailing spaces are allowed, empty string is NOT allowed, string containing only spaces is NOT allowed
Supports English letters only
Supports hyphens (Some-Foobarbaz-name, Some foobarbaz-Name), apostrophes (David D'Costa, David D'costa, David D'costa R'Costa p'costa), periods (Dr. L. John, Robert Downey Jr., Md. K. P. Asif) and commas (Martin Luther, Jr.).
Only the first letter of the first word of a name MUST be capital.
NOT Allowed: John sTeWaRT, JOHN STEWART, Md. KP Asif, John Stewart PhD
Allowed: John Stewart, John stewart, Md. K P Asif
You can easily modify this condition.
If you also want to allow names like Queen Elizabeth 2 or Henry IV:
2). ^[A-Z][a-z]*(([,.] |[ '-])[A-Za-z][a-z]*)*([.]?| (-----)| [1-9][0-9]*)$
Replace ----- with a Roman numeral regex (which is itself long), OR use this simpler alternative based on the KISS philosophy: [IVXLCDM]+ (here I, V, X, ... in ANY order will satisfy the regex).
I personally suggest to use this regex:
3). ^[A-Z][a-z]*(([,.] |[ '-])[A-Za-z][a-z]*)*(\.?)( [IVXLCDM]+)?$
Feel free to try these regexes and make any modifications of your choice.
I have provided tight regexes that cover every possible name I found in my research, with no bugs. Modify them to relax any constraints you don't want.
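A short Python check of regex 3 against the allowed and not-allowed examples listed above, plus a regnal number (re.fullmatch requires the whole string to match):

```python
import re

NAME = re.compile(r"^[A-Z][a-z]*(([,.] |[ '-])[A-Za-z][a-z]*)*(\.?)( [IVXLCDM]+)?$")

allowed = ["John Stewart", "John stewart", "Md. K P Asif", "Martin Luther, Jr.", "Henry IV"]
not_allowed = ["John sTeWaRT", "JOHN STEWART", "Md. KP Asif", "John Stewart PhD"]

for name in allowed + not_allowed:
    print(f"{name!r}: {NAME.fullmatch(name) is not None}")
```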
[UPDATE - March, 2022]
Here are 4 more regexes:
^[A-Za-z]+(([,.] |[ '-])[A-Za-z]+)*([.,'-]?)$
^((([,.'-]| )(?<!( {2}|[,.'-]{2})))*[A-Za-z]+)+[,.'-]?$
^( ([A-Za-z,.'-]+|$))+|([A-Za-z,.'-]+( |$))+$
^(([ ,.'-](?<!( {2}|[,.'-]{2})))*[A-Za-z])+[ ,.'-]?$
It's been a while since I looked back at these 4 regexes, so I forgot their specifications. Unlike the previous ones, these 4 regexes are not tight, but they do the job very well. They distinguish 3 parts of a name: English letters, spaces and special characters. Which one you need depends on your answers (Yes/No) to these questions:
have at least 1 letter?
can start with a space or a special character?
can end with a space or a special character?
are 2 consecutive spaces allowed?
are 2 consecutive special characters allowed?
Note: name validation should ONLY serve as a warning, NOT as a requirement a name must fulfill, because there is no fixed naming pattern; if there is one, it can change overnight, and any tight regex you come across will become obsolete at some point in the future.

There is one issue with the top voted answer here which recommends this regex:
/^[a-z ,.'-]+$/i
It takes spaces only as a valid name!
The best solution, in my opinion, is to add a negative lookahead at the beginning:
/^(?!\s)([a-z ,.'-]+)$/i
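A Python comparison of the two patterns, using re.fullmatch with re.IGNORECASE in place of the /i flag. Note that the lookahead only rejects a leading whitespace character; a trailing space, as in "John ", is still accepted.

```python
import re

ORIGINAL = re.compile(r"^[a-z ,.'-]+$", re.IGNORECASE)
WITH_LOOKAHEAD = re.compile(r"^(?!\s)([a-z ,.'-]+)$", re.IGNORECASE)

for text in ["Mathias d'Arras", "   ", " John"]:
    # Columns: input, accepted by the original, accepted with the lookahead
    print(repr(text), bool(ORIGINAL.fullmatch(text)), bool(WITH_LOOKAHEAD.fullmatch(text)))
```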

I use:
/^(?:[\u00c0-\u01ffa-zA-Z'-]){2,}(?:\s[\u00c0-\u01ffa-zA-Z'-]{2,})+$/i
And test for maxlength using some other means

None of the answers helped me, simply because users can enter non-English names, and simple regexes don't handle them. In fact, it's very hard to find one expression that works for all languages.
Instead, I took a different approach and excluded all the characters that should not be in a valid name. The pattern below excludes numeric characters, symbols, control characters, and '\' and '/'.
Final regex
Without punctuation (["] ['] [,] [.], etc.):
^([^\p{N}\p{S}\p{C}\p{P}]{2,20})$
With punctuation:
^([^\p{N}\p{S}\p{C}\\\/]{2,20})$
With this, all these names are valid:
alex junior
沐宸
Nick
Sarah's Jane ---> with punctuation support
ביממה
حقیقت
Виктория
And following names become invalid:
🤣 Maria
k
١١١١١
123John
This means that all names between 2 and 20 characters long that contain no numeric characters, emojis, \ or / are allowed. You can edit the above regex if you want to add more characters to the exclusion list.
To get more information about the available patterns to include or exclude, check out this:
https://www.regular-expressions.info/unicode.html#prop
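Python's built-in re module does not support \p{...} classes, so here is a sketch of the same exclusion idea using unicodedata.category, whose first letter gives the general category group (N, S, C, P, ...). The function name and the allow_punctuation switch are hypothetical:

```python
import unicodedata


def is_valid_name(name: str, allow_punctuation: bool = True) -> bool:
    # Same length rule as {2,20}; len() counts code points.
    if not 2 <= len(name) <= 20:
        return False
    # N = numbers, S = symbols (emoji included), C = control/format/unassigned, P = punctuation.
    banned = {"N", "S", "C"} if allow_punctuation else {"N", "S", "C", "P"}
    for ch in name:
        if unicodedata.category(ch)[0] in banned:
            return False
        # The with-punctuation regex also excludes backslash and slash.
        if ch in "\\/":
            return False
    return True


print(is_valid_name("Виктория"), is_valid_name("🤣 Maria"))  # True False
```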

^\p{L}{2,}$
^ asserts position at start of a line.
\p{L} matches any kind of letter from any language
{2,} Quantifier — Matches between 2 and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
So it should be a name in any language containing at least 2 letters(or symbols) without numbers or other characters.

If you are looking for the simplest way, just check for at least 2 words.
/^[^\s]+( [^\s]+)+$/
Valid names
John Doe
pedro alberto ch
Ar. Gen
Mathias d'Arras
Martin Luther King, Jr.
No valid names
John
陳大文

For simplicity's sake, you can use:
(.*)\s(.*)
What I like about this is that the last name always comes after the first name, so if you're going to enter these matched groups into a database and the name is John M. Smith, the 1st group will be John M. and the 2nd group will be Smith.
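A Python illustration of the greedy split: the first (.*) runs to the last whitespace, so everything before the final word lands in group 1. A name without any whitespace does not match at all.

```python
import re

m = re.match(r"(.*)\s(.*)", "John M. Smith")
print(m.groups())                        # ('John M.', 'Smith')
print(re.match(r"(.*)\s(.*)", "Cher"))   # None: no whitespace, no match
```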

So, together with a customer, we created this crazy regex:
(^$)|(^([^\-!#\$%&\(\)\*,\./:;\?#\[\\\]_\{\|\}¨ˇ“”€\+<=>§°\d\s¤®™©]| )+$)

For first and last names there are really only 2 things you should be looking for:
Length
Content
Here is my regular expression:
var regex = /^[A-Za-z,.-]{3,20}$/;
1. Length
Here the {3,20} constrains the length of the string to be between 3 and 20 characters.
2. Content
The information between the square brackets [A-Za-z] allows uppercase and lowercase characters. All subsequent symbols (-,.) are also allowed.

The following expression will work on any language supported by UTF-16 and will ensure that there's a minimum of two components to the name (i.e. first + last), but will also allow any number of middle names.
/^(\S+ )+\S+$/u
At the time of this writing it seems none of the other answers meet all of that criteria. Even ^\p{L}{2,}$, which is the closest, falls short because it will also match "invisible" characters, such as U+FEFF (Zero Width No-Break Space).

Try these solutions, for maximum compatibility, as I have already posted here:
JavaScript:
var nm_re = /^(?:((([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-.\s])){1,}(['’,\-\.]){0,1}){2,}(([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-. ]))*(([ ]+){0,1}(((([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-\.\s])){1,})(['’\-,\.]){0,1}){2,}((([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-\.\s])){2,})?)*)$/;
HTML5:
<input type="text" name="full_name" id="full_name" pattern="^(?:((([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-.\s])){1,}(['’,\-\.]){0,1}){2,}(([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-. ]))*(([ ]+){0,1}(((([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-\.\s])){1,})(['’\-,\.]){0,1}){2,}((([^0-9_!¡?÷?¿/\\+=##$%ˆ&*(){}|~<>;:[\]'’,\-\.\s])){2,})?)*)$" required>

This is what I use.
This regex accepts only names with a minimum number of characters, made of A-Z, a-z, space and -.
Names example:
Ionut Ionete, Ionut-Ionete Cantemir, Ionete Ionut-Cantemir, Ionut-Cantemir Ionete-Second
The minimum number of characters for each name is 3. If you want to change this, modify {3,} (for example, to {6,}).
([a-zA-Z\-]+){3,}\s+([a-zA-Z\-]+){3,}

This seems to do the job for me:
[\S]{2,} [\S]{2,}( [\S]{2,})*

I usually write:
return /^[a-zA-Z\-\s\.\'\`\u00E0-\u00FC]+$/.test(firstName);

Full name with only single spaces between words (no two consecutive spaces):
^[a-zA-Z'\-\pL]+(?:(?! {2})[a-zA-Z'\-\pL ])*[a-zA-Z'\-\pL]+$

A simple function using preg_match in php
<?php
function name_validation($name) {
if (preg_match("/^[a-zA-Z ]*$/", $name) === 1) {
echo "$name is a valid name";
} else {
echo "$name is not a valid name";
}
}
//Test
name_validation('89name');
?>

If you want the whole first name to be between 3 and 30 characters with no restrictions on individual words, try this :
[a-zA-Z ]{3,30}
Beware that it excludes all foreign letters as é,è,à,ï.
If you want the limit of 3 to 30 characters to apply to each individual word, Jens regexp will do the job.

var name = document.getElementById('login_name').value;
if ( name.length < 4 || name.length > 30 )
{
alert ( 'Name must be between 4 and 30 characters long' ) ;
return false;
}
var pattern = new RegExp("^[a-z\.0-9 ]+$");
var return_value = pattern.exec(name);
if ( return_value == null )
{
alert ( "Please give valid Name");
return false;
}

Regular expression for address field validation

I am trying to write a regular expression that matches an address, for example 21-big walk way or 21 St.Elizabeth's drive. I came up with the following regular expression, but I am not sure how to incorporate all the characters (alphanumeric, space, dash, full stop, apostrophe):
"regexp=^[A-Za-z-0-99999999'

See the answer to this question on address validating with regex:
regex street address match
The problem is, street addresses vary so much in formatting that it's hard to code against them. If you are trying to validate addresses, finding if one isn't valid based on its format is mighty hard to do.
This would match the following address (253 N. Cherry St.) and anything with the same format:
\d{1,5}\s\w\.\s(\b\w*\b\s){1,2}\w*\.
This allows 1-5 digits for the house number, a space, a character followed by a period (for N. or S.), 1-2 words for the street name, finished with an abbreviation (like st. or rd.).
Because regex is used to see if things meet a standard or protocol (which you define), you probably wouldn't want to allow for the addresses provided above, especially the first one with the dash, since they aren't very standard. You can modify my code above to allow for them if you wish; you could add
(-?)
to allow for a dash but not require one.
In addition, http://rubular.com/ is a quick and interactive way to learn regex. Try it out with the addresses above.
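A Python check of the pattern, with the period after \w escaped as \. so that it must be a literal period, as the explanation describes (an unescaped . matches any character). The surrounding test text is made up:

```python
import re

# Street pattern from the answer, with the period after \w escaped (\.) so it must be literal.
STREET = re.compile(r"\d{1,5}\s\w\.\s(\b\w*\b\s){1,2}\w*\.")

m = STREET.search("Ship to 253 N. Cherry St. please")
print(m.group(0) if m else None)         # 253 N. Cherry St.
print(STREET.search("21-big walk way"))  # None
```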

If you don't have a fixed format for the address as mentioned above, I would use a regex just to eliminate the symbols that are not used in addresses (special symbols like &(%#$^). The result would be:
[A-Za-z0-9'\.\-\s\,]
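That class matches a single character, so to validate a whole field it needs a repetition and a whole-string match. A minimal Python sketch under that assumption, checked against the two addresses from the question (the helper name is hypothetical):

```python
import re

# The answer's character class, repeated with + and applied to the whole string.
ADDRESS_CHARS = re.compile(r"[A-Za-z0-9'\.\-\s\,]+")


def looks_like_address(text: str) -> bool:
    return ADDRESS_CHARS.fullmatch(text) is not None


print(looks_like_address("21-big walk way"))          # True
print(looks_like_address("21 St.Elizabeth's drive"))  # True
print(looks_like_address("21 Main St & 5th"))         # False, '&' is not allowed
```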

Just to add to Serzas' answer (since I don't have enough reputation to comment):
letters and numbers can effectively be replaced by \w for word characters.
Additionally, apostrophe, comma, period and hyphen don't necessarily need a backslash (the hyphen only when it is the first or last character in the class).
My requirement also involved forward and back slashes, so \\ and \/, and finally whitespace with \s. The working regex for me, as such, was:
pattern: "[\w',\\/.\s-]"

Regular expression for simple address validation
^[#.0-9a-zA-Z\s,-]+$
E.g. for Address match case
#1, North Street, Chennai - 11
E.g. for Address not match case
$1, North Street, Chennai # 11

I have successfully used:
Dim regexString = New stringbuilder
With regexString
.Append("(?<h>^[\d]+[ ])(?<s>.+$)|") 'find the 2013 1st ambonstreet
.Append("(?<s>^.*?)(?<h>[ ][\d]+[ ])(?<e>[\D]+$)|") 'find the 1-7-4 Dual Ampstreet 130 A
.Append("(?<s>^[\D]+[ ])(?<h>[\d]+)(?<e>.*?$)|") 'find the Terheydenlaan 320 B3
.Append("(?<s>^.*?)(?<h>\d*?$)") 'find the 245e oosterkade 9
End With
Dim Address As Match = Regex.Match(DataRow("customerAddressLine1"), regexString.ToString(), RegexOptions.Multiline)
If Not String.IsNullOrEmpty(Address.Groups("s").Value) Then StreetName = Address.Groups("s").Value
If Not String.IsNullOrEmpty(Address.Groups("h").Value) Then HouseNumber = Address.Groups("h").Value
If Not String.IsNullOrEmpty(Address.Groups("e").Value) Then Extension = Address.Groups("e").Value
The regex attempts to find a result; if there is none, it moves on to the next alternative. If no result is found, none of the 4 formats were present.
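Python's re module does not allow the same group name to appear twice in one pattern, so a port of this idea has to try the four alternatives one after another. For single-line input that is effectively what the alternation does anyway, since each alternative is anchored at the start. A sketch, with the hypothetical function split_address returning (street, house number, extension):

```python
import re

PATTERNS = [
    re.compile(r"(?P<h>^\d+ )(?P<s>.+$)"),              # e.g. 2013 1st ambonstreet
    re.compile(r"(?P<s>^.*?)(?P<h> \d+ )(?P<e>\D+$)"),  # e.g. 1-7-4 Dual Ampstreet 130 A
    re.compile(r"(?P<s>^\D+ )(?P<h>\d+)(?P<e>.*?$)"),   # e.g. Terheydenlaan 320 B3
    re.compile(r"(?P<s>^.*?)(?P<h>\d*?$)"),             # e.g. 245e oosterkade 9
]


def split_address(line: str):
    """Return (street, house_number, extension) from the first pattern that matches."""
    for pattern in PATTERNS:
        m = pattern.search(line)
        if m:
            parts = m.groupdict(default="")
            return parts.get("s", ""), parts.get("h", ""), parts.get("e", "")
    return None


print(split_address("Terheydenlaan 320 B3"))  # ('Terheydenlaan ', '320', ' B3')
```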

This one worked for me:
\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?
The source: https://www.codeproject.com/Tips/989012/Validate-and-Find-Addresses-with-RegEx

Regex is a very bad choice for this kind of task. Try to find a web service or an address database or a product which can clean address data instead.
Related:
Address validation using Google Maps API

As a simple one-line expression, I recommend this:
^([a-zA-Z0-9/\\'(),\s-]{2,255})$

I needed
STREET # | STREET | CITY | STATE | ZIP
So I wrote the following regex
[0-9]{1,5}( [a-zA-Z.]*){1,4},?( [a-zA-Z]*){1,3},? [a-zA-Z]{2},? [0-9]{5}
This allows
1-5 Street #s
1-4 Street description words
1-3 City words
2 Char State
5 Char Zip code
I also made the comma optional for separating street, city, state and zip.
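A quick Python check of the pattern with a made-up address, written once with commas and once without; a spelled-out state name fails because the state must be exactly 2 letters.

```python
import re

# The answer's pattern: number, 1-4 street words, 1-3 city words, 2-letter state, 5-digit ZIP.
ADDRESS = re.compile(r"[0-9]{1,5}( [a-zA-Z.]*){1,4},?( [a-zA-Z]*){1,3},? [a-zA-Z]{2},? [0-9]{5}")

for text in [
    "123 Main St, Springfield, IL 62704",
    "123 Main St Springfield IL 62704",
    "123 Main St, Springfield, Illinois 62704",
]:
    print(text, "->", bool(ADDRESS.fullmatch(text)))
```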

Here is the approach I have taken to finding addresses using regular expressions:
A set of patterns is useful to find many forms that we might expect from an address starting with simply a number followed by set of strings (ex. 1 Basic Road) and then getting more specific such as looking for "P.O. Box", "c/o", "attn:", etc.
Below is a simple test in python. The test will find all the addresses but not the last 4 items which are company names. This example is not comprehensive, but can be altered to suit your needs and catch examples you find in your data.
import re

strings = [
    '701 FIFTH AVE',
    '2157 Henderson Highway',
    'Attn: Patent Docketing',
    'HOLLYWOOD, FL 33022-2480',
    '1940 DUKE STREET',
    '111 MONUMENT CIRCLE, SUITE 3700',
    'c/o Armstrong Teasdale LLP',
    '1 Almaden Boulevard',
    '999 Peachtree Street NE',
    'P.O. BOX 2903',
    '2040 MAIN STREET',
    '300 North Meridian Street',
    '465 Columbus Avenue',
    '1441 SEAMIST DR.',
    '2000 PENNSYLVANIA AVENUE, N.W.',
    '465 Columbus Avenue',
    '28 STATE STREET',
    'P.O, Drawer 800889.',
    '2200 CLARENDON BLVD.',
    '840 NORTH PLANKINTON AVENUE',
    '1025 Connecticut Avenue, NW',
    '340 Commercial Street',
    '799 Ninth Street, NW',
    '11318 Lazarro Ln',
    'P.O, Box 65745',
    'c/o Ballard Spahr LLP',
    '8210 SOUTHPARK TERRACE',
    '1130 Connecticut Ave., NW, Suite 420',
    '465 Columbus Avenue',
    "BANNER & WITCOFF , LTD",
    "CHIP LAW GROUP",
    "HAMMER & ASSOCIATES, P.C.",
    "MH2 TECHNOLOGY LAW GROUP, LLP",
]

# Raw strings so that \w, \d, \. and \/ reach the regex engine unchanged.
patterns = [
    r"c\/o [\w ]{2,}",
    r"C\/O [\w ]{2,}",
    r"P.O\. [\w ]{2,}",
    r"P.O\, [\w ]{2,}",
    r"[\w\.]{2,5} BOX [\d]{2,8}",
    r"^[#\d]{1,7} [\w ]{2,}",
    r"[A-Z]{2,2} [\d]{5,5}",
    r"Attn: [\w]{2,}",
    r"ATTN: [\w]{2,}",
    r"Attention: [\w]{2,}",
    r"ATTENTION: [\w]{2,}",
]

contact_list = []
total_count = len(strings)
found_count = 0
for string in strings:
    for pat_no, pattern in enumerate(patterns, start=1):
        match = re.search(pattern, string.strip())
        if match:
            print("Item found: " + match.group(0) + " | Pattern no: " + str(pat_no))
            found_count += 1
            break  # count each item once, even if several patterns match it
print("-- Total: " + str(total_count) + " Found: " + str(found_count))

UiPath Academy training video lists this RegEx for US addresses (and it works fine for me):
\b\d{1,8}(-)?[a-z]?\W[a-z|\W|\.]{1,}\W(road|drive|avenue|boulevard|circle|street|lane|way|rd\.|st\.|dr\.|ave\.|blvd\.|cir\.|ln\.|rd|dr|ave|blvd|cir|ln)

I had a different use case: find any addresses in logs and scold application developers (my favourite part of a devops job). I had the advantage of having the word "address" in the pattern, but it should work without that if you have a specific field to scan.
\baddress.[0-9\\\/# ,a-zA-Z]+[ ,]+[0-9\\\/#, a-zA-Z]{1,}
Look for the word "address" - skip this if not applicable
Look for first part numbers, letters, #, space - Unit Number / street number/suite number/door number
Separated by a space or comma
Look for one or more of rest of address numbers, letters, #, space
Tested against :
1 Sleepy Boulevard PO, Box 65745
Suite #100 /98,North St,Snoozepura
Ave., New Jersey,
Suite 420 1130 Connect Ave., NW,
Suite 420 19 / 21 Old Avenue,
Suite 12, Springfield, VIC 3001
Suite#100/98 North St Snoozepura
This worked for me when there were street addresses with unit/suite numbers, zip codes, only street. It also didn't match IP addresses or mac addresses. Worked with extra spaces.
This assumes users are normal people who separate elements of a street address with a comma, hash sign, or space, and not psychopaths who use characters like "|" or ":"!

For French addresses, and some international addresses too, I use this:
[\\D+ || \\d]+\\d+[ ||,||[A-Za-z0-9.-]]+(?:[Rue|Avenue|Lane|... etcd|Ln|St]+[ ]?)+(?:[A-Za-z0-9.-](.*)]?)

I was inspired by the responses given here and came up with these 2 solutions:
support optional uppercase
support french also
regex structure
numbers (required)
letters, chars and spaces
at least one common address keyword (required)
as many chars as you want before the line break
definitions:
accuracy
capacity of detecting addresses and not something that looks like an address which is not.
range
capacity to detect uncommon addresses.
Regex 1:
high accuracy
low range
/[0-9]+[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.)|(cir\.)|(blvd\.)|(hway\.)|(st\.)|(aut\.)|(ave\.)|(ln\.)|(rd\.)|(hw\.)|(dr\.)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i
Regex 2:
low accuracy
high range
/[0-9]*[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.?)|(cir\.?)|(blvd\.?)|(hway\.?)|(st\.?)|(aut\.?)|(ave\.?)|(ln\.?)|(rd\.?)|(hw\.?)|(dr\.?)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i
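To compare the two, here is a small Python sketch. Both patterns are copied from above without the `/.../i` delimiters, with `re.IGNORECASE` standing in for the `i` flag; `first_match` and the sample strings are made up for illustration.

```python
import re

# Regex 1: digits required, so higher accuracy but lower range.
REGEX_1 = re.compile(
    r"[0-9]+[ |[a-zà-ú.,-]* "
    r"((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|"
    r"(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|"
    r"(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.)|(cir\.)|"
    r"(blvd\.)|(hway\.)|(st\.)|(aut\.)|(ave\.)|(ln\.)|(rd\.)|(hw\.)|(dr\.)|(a\.))"
    r"([ .,-]*[a-zà-ú0-9]*)*",
    re.IGNORECASE,
)

# Regex 2: digits optional and most abbreviation dots optional,
# so wider range but more false positives.
REGEX_2 = re.compile(
    r"[0-9]*[ |[a-zà-ú.,-]* "
    r"((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|"
    r"(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|"
    r"(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.?)|(cir\.?)|"
    r"(blvd\.?)|(hway\.?)|(st\.?)|(aut\.?)|(ave\.?)|(ln\.?)|(rd\.?)|(hw\.?)|(dr\.?)|(a\.))"
    r"([ .,-]*[a-zà-ú0-9]*)*",
    re.IGNORECASE,
)

def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

for sample in ["Meet me at 123 Main Street, Springfield",
               "12 rue de la Paix",
               "Take Main Street north"]:
    print(sample, "|", first_match(REGEX_1, sample), "|", first_match(REGEX_2, sample))
```

Regex 1 needs a number, so it finds nothing in "Take Main Street north", while regex 2 matches the whole sentence because `north` counts as an address keyword: that is the low-accuracy side of its wider range.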

This one works well for me:
^(\d+) ?([A-Za-z](?= ))? (.*?) ([^ ]+?) ?((?<= )APT)? ?((?<= )\d*)?$
Source : https://community.alteryx.com/t5/Alteryx-Designer-Discussions/RegEx-Addresses-different-formats-and-headaches/td-p/360147
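A small Python check of this pattern. The group meanings in the comments (house number, one-letter direction, street name, suffix, `APT`, unit number) are my reading of the pattern, not something stated in the linked thread, and the sample addresses are invented. Note that only the literal uppercase `APT` is recognized.

```python
import re

# 1: house number, 2: optional one-letter direction (must be followed by a space),
# 3: street name (lazy), 4: last word before the optional unit (e.g. the suffix),
# 5: optional literal "APT", 6: optional unit digits.
ADDRESS_PARTS = re.compile(
    r"^(\d+) ?([A-Za-z](?= ))? (.*?) ([^ ]+?) ?((?<= )APT)? ?((?<= )\d*)?$"
)

for address in ["123 N Main St APT 4", "456 Elm Street"]:
    m = ADDRESS_PARTS.match(address)
    print(address, "->", m.groups() if m else None)
```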

Here are my regexes for address, city and postal code validation, with the rules they implement.
Validation rules:
address -
1 - 40 characters length.
Letters, numbers, space and . , # ; : ' -
city -
1 - 19 characters length
Only Alpha characters are allowed
Spaces are allowed
postalCode -
The USA zip must meet the following criteria and is required:
Minimum of 5 digits (9 digits if zip + 4 is provided)
Numeric only
A Canadian postal code is a six-character string in the format A1A 1A1, where A is a letter and 1 is a digit.
a space separates the third and fourth characters.
do not include the letters D, F, I, O, Q or U.
the first position does not make use of the letters W or Z.
address: ^[a-zA-Z0-9 .,#;:'-]{1,40}$
city: ^[a-zA-Z ]{1,19}$
usaPostal: ^([0-9]{5})(?:[-]?([0-9]{4}))?$
canadaPostal : ^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$
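A minimal Python sketch that applies the four patterns above with `re.fullmatch`. The `PATTERNS` dict and the `is_valid` helper are hypothetical names; the patterns themselves are copied unchanged.

```python
import re

PATTERNS = {
    "address": r"^[a-zA-Z0-9 .,#;:'-]{1,40}$",
    "city": r"^[a-zA-Z ]{1,19}$",
    "usaPostal": r"^([0-9]{5})(?:[-]?([0-9]{4}))?$",
    "canadaPostal": r"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$",
}

def is_valid(field, value):
    # fullmatch also rejects a trailing newline, which a bare "$" would allow.
    return re.fullmatch(PATTERNS[field], value) is not None

print(is_valid("usaPostal", "12345-6789"))
print(is_valid("canadaPostal", "K1A 0B1"))
print(is_valid("city", "St. John's"))
```

The Canadian pattern makes the space optional (` ?`), so `K1A0B1` passes too, and the city pattern rejects names with periods or apostrophes such as `St. John's`.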

\b(\d{1,8}[a-z]?[0-9\/#\- ,a-zA-Z]+[ ,]+[.0-9\/#, a-zA-Z]{1,})\n

A more dynamic alternative to @micah's answer would be the following:
(?'Address'(?'Street'[0-9][a-zA-Z\s]),?\s*(?'City'[A-Za-z\s]),?\s(?'Country'[A-Za-z])\s(?'Zipcode'[0-9]-?[0-9]))
It doesn't care about the individual lengths of the address segments.
https://regex101.com/r/nuy7hB/1

Custom RegEx expression for validating different possibilities of phone number entries?

I'm looking for a custom RegEx (that works!) that will validate common phone number entries with an area code (no country code), such as:
111-111-1111
(111) 111-1111
(111)111-1111
111 111 1111
111.111.1111
1111111111
And combinations of these, or anything else I may have forgotten.
Also, is it possible to have the regex itself reformat the entry? For example, take 1111111111 and turn it into the 111-111-1111 format. The regex will most likely be entered in Joomla or some other CMS module, so I can't add any code beyond the expression itself.

\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})
will match all your examples; after a match, backreference 1 will contain the area code, and backreferences 2 and 3 will contain the rest of the phone number.
I hope you don't need to handle international phone numbers, too.
If the phone number is in a string by itself, you could also use
^\s*\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})\s*$
allowing for leading/trailing whitespace and nothing else.
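On the reformatting question: that needs a replace step rather than a plain match, but where a replacement string with backreferences is available, `\1-\2-\3` produces the 111-111-1111 form. A Python sketch using the anchored pattern; `normalize_phone` is a hypothetical helper.

```python
import re

# Anchored version from the answer: only whitespace may surround the number.
PHONE = re.compile(r"^\s*\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})\s*$")

def normalize_phone(entry):
    """Return the entry as 111-111-1111, or None if it doesn't match."""
    m = PHONE.match(entry)
    # Group 1 is the area code; groups 2 and 3 hold the rest of the number.
    return m.expand(r"\1-\2-\3") if m else None

for entry in ["111-111-1111", "(111) 111-1111", "(111)111-1111",
              "111 111 1111", "111.111.1111", "1111111111"]:
    print(entry, "->", normalize_phone(entry))
```

Because `\(?` and `\)?` are independent, an unbalanced entry such as `(555 123-4567` is accepted as well.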

Why not just remove spaces, parentheses, dashes and periods, then check that what remains is a 10-digit number?

Depending on the language in question, you might be better off using a replace-like statement to replace the non-numeric characters ()-/. with nothing, and then just check whether what is left is a 10-digit number.
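A minimal Python sketch of the strip-then-check idea from the last two answers. Which characters to strip is a choice; this version removes whitespace, parentheses, periods, slashes and hyphens, and `ten_digits` is a hypothetical name.

```python
import re

def ten_digits(entry):
    """Strip common phone punctuation; return the digits if exactly 10 remain."""
    digits = re.sub(r"[\s()./-]", "", entry)
    # [0-9] rather than \d so that only ASCII digits are accepted.
    return digits if re.fullmatch(r"[0-9]{10}", digits) else None

print(ten_digits("(555) 123-4567"))
print(ten_digits("555-1234"))
```

This is more permissive than the pattern-based answer: oddly grouped input such as `55-51-23-4567` also reduces to 10 digits and passes.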

Extract a portion of text using RegEx

I would like to extract a portion of text using a regular expression. For example, I have an address and want to return just the number and street and exclude the rest:
2222 Main at King Edward Vancouver BC CA
But the addresses vary in format most of the time. I tried using a lookahead and came up with this expression:
.*?(?=\w* \w* \w{2}$)
The above expression handles this example nicely, but it gets far too messy as soon as commas appear in the text, or postal codes, which can be a 6-character string or two 3-character strings with a space in the middle, etc.
Is there a more elegant way of extracting a portion of text than a lookahead regex?
Any suggestion or a point in another direction is greatly appreciated.
Thanks!

Regular expressions are for data that is REGULAR, that follows a pattern. So if your data is completely random, no, there's no elegant way to do this with regex.
On the other hand, if you know what values you want, you can probably write a few simple regexes, and then just test them all on each string.
For example:
regex1 = address number grabber, regex2 = street type grabber, regex3 = name grabber.
Attempt a match on string1 with regex1, regex2, and finally regex3. Move on to the next string.
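A rough Python sketch of the several-small-regexes idea. The three patterns and the street-type list are hypothetical examples, not taken from the answer, and they only cover a handful of common suffixes.

```python
import re

SUFFIX = r"(St|Street|Ave|Avenue|Rd|Road|Blvd|Boulevard|Dr|Drive)"

# Hypothetical grabbers: each one pulls a single piece out of the string
# into its first capturing group.
GRABBERS = {
    "number": re.compile(r"^\s*(\d+[A-Za-z]?)\b"),
    "street_type": re.compile(r"\b" + SUFFIX + r"\b", re.IGNORECASE),
    "name": re.compile(r"^\s*\d+[A-Za-z]?\s+(.+?)\s+" + SUFFIX + r"\b", re.IGNORECASE),
}

def grab_parts(text):
    """Try each grabber in turn; missing pieces come back as None."""
    parts = {}
    for key, pattern in GRABBERS.items():
        m = pattern.search(text)
        parts[key] = m.group(1) if m else None
    return parts

for line in ["2222 Main St Vancouver BC CA",
             "2222 Main at King Edward Vancouver BC CA"]:
    print(grab_parts(line))
```

The second line shows the limitation the answer describes: with no recognizable street type, only the number is found.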

Well, I thought I'd throw my hat into the ring:
.*(?=,? ([a-zA-Z]+,?\s){3}([\d-]*\s)?)
You might want ^ or \d+ at the front for good measure.
I didn't bother specifying lengths for the postal codes; this one just accepts any run of digits and hyphens.
It works for these inputs so far, including variations with commas in the city/state/country part:
2222 Main at King Edward Vancouver, BC, CA, 333-333
555 road and street place CA US 95000
2222 Main at King Edward Vancouver BC CA 333
555 road and street place CA US
It counts on there being three words at the end for the city, state and country; other than that, it's as ryansstack said: if the data is random, it won't work. If the city is two words, like New York, it won't work. Yeah... regex isn't the tool for this one.
BTW: tested on regexhero.net
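A Python check of this lookahead pattern on the four inputs, each tested as its own string with no trailing newline (an assumption; a multi-line tester box may behave differently). `street_part` is a hypothetical helper.

```python
import re

# Greedy .* backs off from the end and stops at the last point where three
# words (each followed by whitespace) and an optional digits/hyphens token follow.
STREET_PART = re.compile(r".*(?=,? ([a-zA-Z]+,?\s){3}([\d-]*\s)?)")

def street_part(line):
    """Return the text before the trailing city/state/country words, or None."""
    m = STREET_PART.search(line)
    return m.group(0) if m else None

for line in ["2222 Main at King Edward Vancouver, BC, CA, 333-333",
             "555 road and street place CA US 95000",
             "2222 Main at King Edward Vancouver BC CA 333",
             "555 road and street place CA US"]:
    print(repr(street_part(line)))
```

Because each of the three words needs trailing whitespace, the last input on its own yields `555 road and`; with a trailing newline (`"...CA US\n"`), as you would get in a multi-line test box, it yields `555 road and street`.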

I can think of two ways to do this:
1) If you know that "the rest" of your data after the address is exactly two fields, i.e. BC and CA, you can split your string using a space as the delimiter and remove the last two items.
2) Split on the delimiter /[A-Z][A-Z]/, store the result in an array, then print out the array (this assumes the address doesn't contain two consecutive capital letters).
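Both split ideas in Python, using the example address from the question. Keeping only the first piece of the second split and trimming its trailing space is my addition; the answer itself just prints the array. The function names are hypothetical.

```python
import re

def drop_last_two_fields(address):
    # Approach 1: split on spaces and drop the last two items (e.g. "BC" and "CA").
    return " ".join(address.split(" ")[:-2])

def before_two_capitals(address):
    # Approach 2: split on any two consecutive capital letters; the first
    # piece is whatever precedes the first such pair.
    return re.split(r"[A-Z][A-Z]", address)[0].rstrip()

example = "2222 Main at King Edward Vancouver BC CA"
print(drop_last_two_fields(example))
print(before_two_capitals(example))
```

Both keep the city (Vancouver), since only the last two fields are treated as "the rest".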
