Blank space in UK postal code - postal-code

Is there a way to know where the blank space has to be in a UK postal code if the user enters the postal code without it?
For example, if the input is: EC1A1BB, W1A0AX, M11AE, B338TH, CR26XH, DN551PT then the output must be: EC1A 1BB, W1A 0AX, M1 1AE, B33 8TH, CR2 6XH, DN55 1PT.
Thank you

The space is always before the final three characters.
This graphic shows the format:
Source: GetTheData

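I've created a C# method that inserts the space when it is missing:
private string ValidatePostcode(string postcode)
{
    // Trim first so stray whitespace does not shift the position of the space.
    postcode = postcode.Trim();
    // Guard the length so the index below cannot go out of range.
    if (postcode.Length > 3 && postcode[postcode.Length - 4] != ' ')
    {
        postcode = postcode.Insert(postcode.Length - 3, " ");
    }
    return postcode;
}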

The space seems to fall before the last three characters. Postcodes always end with Digit-Letter-Letter.

The space is always before the NLL suffix.
The prefix can be 3 or 4 characters before the space. LLNA_NLL.
The above graphic is wrong! Should end NLL.
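
The rule from the answers above (the space always goes before the final three characters) can be sketched in Python as follows. This is a minimal sketch rather than a validator: the function name format_uk_postcode is made up for illustration, and it assumes the input is otherwise a well-formed postcode.

```python
def format_uk_postcode(raw: str) -> str:
    """Return the postcode with a single space before the last three characters."""
    # Drop any existing whitespace and upper-case the input
    # (both are assumptions of this sketch, not part of the original answers).
    compact = "".join(raw.split()).upper()
    # Too short to hold an outward part plus the three-character suffix: leave it alone.
    if len(compact) <= 3:
        return compact
    return compact[:-3] + " " + compact[-3:]


for code in ["EC1A1BB", "W1A0AX", "M11AE", "B338TH", "CR26XH", "DN551PT"]:
    print(format_uk_postcode(code))
```

Because existing whitespace is removed first, input that already contains the space comes back unchanged.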

Related

How to build regex for complex Indonesian phone number format?

I am using regexpal to build a custom regex.
I am working with several test cases for Indonesian phone numbers.
The simple forms are 08xx-3456-7890 or 08xx34567890,
but it gets confusing when I also have to handle the following formats:
here is my phone (08xx)34567890
08xx.3456.7890
08xx 3456 7890
(+62) 8xx34567890
(+62) 8xx-3456-7890
+628xx34567890
+62-8xx-3456-7890
+628xx 3456 7890
Here is the regex I have so far: (08|628|62)[\s\)\-]*(\s|(\d){3,})
but it does not cover all of those samples.
+62 is the country code.
Can you please help me with a solution to validate those formats?
The phone number may be surrounded by other text rather than appear on its own, because it is part of a sentence.
You can first remove the separator tokens by replacing every match of [-(). ] with an empty string (the hyphen goes first so that it is not read as a range). Then you can use (?:\+62)?0?8\d{2}(\d{8}) to match a phone number. This matches an optional +62, an optional 0, the digit 8, two digits (xx) and then the 8-digit phone number, which is captured in group 1.
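
As a rough Python sketch of this strip-then-match idea: the helper name find_subscriber_number is invented for illustration, and the separator class is written as [-(). ] with the hyphen first so that it is taken literally. Note that removing spaces from a whole sentence also glues the surrounding words together, which is harmless here because the pattern only looks at digits and the + sign.

```python
import re
from typing import Optional

# Separators to drop before matching; the hyphen is first so it is a literal.
SEPARATORS = re.compile(r"[-(). ]")
# Optional +62, optional 0, then 8, two digits, and the 8-digit number (group 1).
PHONE = re.compile(r"(?:\+62)?0?8\d{2}(\d{8})")


def find_subscriber_number(text: str) -> Optional[str]:
    """Return the 8 digits captured in group 1, or None if nothing matches."""
    cleaned = SEPARATORS.sub("", text)
    match = PHONE.search(cleaned)
    return match.group(1) if match else None


for sample in [
    "here is my phone (0812)34567890",
    "0812.3456.7890",
    "(+62) 812-3456-7890",
    "+62-812-3456-7890",
]:
    print(sample, "->", find_subscriber_number(sample))
```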
For Indonesian phone numbers, this roughly should work.
(\()?(\+62|62|0)(\d{2,3})?\)?[ .-]?\d{2,4}[ .-]?\d{2,4}[ .-]?\d{2,4}
To see it in action:
https://regex101.com/r/qtEg6H/3
Also see the answer in the comment below for a more efficient way.
// MARK: - Validate Indonesian Mobile Number Without Country Code
// The number must start with the digit 8, followed by 10 to 14 more digits.
class func validateMobileNumber(_ number: String) -> Bool {
    let numberRegEx = "[8][0-9]{10,14}"
    // %@ is the NSPredicate placeholder for the regex argument.
    let numberTest = NSPredicate(format: "SELF MATCHES %@", numberRegEx)
    return numberTest.evaluate(with: number)
}

Issue with regex for validation

I was just wondering if you lovely folk would be able to give me some pointers as to where I may be going wrong.
I have implemented a regex checker for UK landline phone numbers and it all seems to work except for one set of numbers. This is my first time using regular expressions.
Below is the regex that I am using for it:
((01([2-69][1]\s?\d{3}\s?\d{4}$|[2-9][02-9][0-9]\s?\d{3}\s?\d{3}$)))|((02((0\s?[378]\s?(\d{3}|\d{4})\s?\d{4})|([3489]{1}\d{2}\s?\d{3})\s?\d{3})))
The group that is giving me trouble is
((02(((0\s?[378]\s?(\d{3}|\d{4})\s?\d{4})$)... (in the interests of brevity, I have cut out the remaining portion of it; the parentheses are all present and correct in the full regex).
I have checked it against regexpal and it seems to validate properly.
I used the following test numbers:
02031111111 <- Valid
0203 111 1111 <- Valid
020 3111 1111 <- Valid
020311111111 <- Invalid (passes validation)
0203 11111111 <- Invalid (passes validation)
020 3111 11111 <- Invalid (fails validation - which is what I want)
This is my code block where the regex function is performed
function valid_phone(landline, country)
{
    var homep = '';
    switch (country)
    {
        case 'England':
            homep = /^((01([2-69][1]\s?\d{3}\s?\d{4}$|[2-9][02-9][0-9]\s?\d{3}\s?\d{3}$)))|((02((0\s?[378]\s?(\d{3}|\d{4})\s?\d{4})|([3489]{1}\d{2}\s?\d{3})\s?\d{3})))$/;
            break;
        case 'USA':
            homep = /^((((\(\d{3}\))|(\d{3}(-| )))\d{3}(-| )\d{4})|(\+?\d{2}((-| )\d{1,8}){1,5}))$/;
            break;
        default:
            homep = /^((01([2-69][1]\s?\d{3}\s?\d{4}$|[2-9][02-9][0-9]\s?\d{3}\s?\d{3}$)))|((02(((0\s?[378]\s?(\d{3}|\d{4})\s?\d{4}))|([3489]{1}\d{2}\s?\d{3})\s?\d{3}$)))$/;
    }
    return homep.test(landline);
}
This is where it is called in jQuery
$('#landline').blur(function()
{
    var $lp = $('#landline').val();
    var $co = $('#country').val();
    if (!valid_phone($lp, $co))
    {
        $('#error').slideDown();
        $('#error').append('Wrong phone number format');
    }
    else
    {
        $('#error').slideUp();
        $('#error').html('');
    }
});
As I say, any pointers would be appreciated. I am sure it is something simple. I have also tried without the $ before the last ) and it is still the same.
Thanks in advance.
As far as I can understand, the spaces are allowed only in some places in the number, but the number of digits is fixed. In that case, the problem is here:
(\d{3}|\d{4})
This matches either 3 or 4 digits, so it can accept an overlong number. If you want to allow a space after either 3 or 4 digits while keeping the total at 7 digits, you need to write it like this:
(\d{3}\s\d{4}|\d{4}\s\d{3}|\d{7})
Complete sub-regex for this type of number:
020\s?[378]\s?(\d{3}\s\d{4}|\d{4}\s\d{3}|\d{7})
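
To check the sub-regex against the test numbers from the question, here is a small Python sketch. The function name is_london_landline is made up for illustration, and fullmatch is used in place of the ^ and $ anchors.

```python
import re

# Sub-regex from the answer above; fullmatch() requires the whole string to match.
LONDON_LANDLINE = re.compile(r"020\s?[378]\s?(\d{3}\s\d{4}|\d{4}\s\d{3}|\d{7})")


def is_london_landline(number: str) -> bool:
    return LONDON_LANDLINE.fullmatch(number) is not None


for sample in [
    "02031111111",
    "0203 111 1111",
    "020 3111 1111",
    "020311111111",
    "0203 11111111",
    "020 3111 11111",
]:
    print(sample, is_london_landline(sample))
```

The first three samples are accepted and the last three, which contain one digit too many, are rejected.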

Regular expression for address field validation

I am trying to write a regular expression that accepts an address, for example 21-big walk way or 21 St.Elizabeth's drive. I came up with the following regular expression, but I am not sure how to incorporate all the characters (alphanumeric, space, dash, full stop, apostrophe):
"regexp=^[A-Za-z-0-99999999'
See the answer to this question on address validating with regex:
regex street address match
The problem is, street addresses vary so much in formatting that it's hard to code against them. If you are trying to validate addresses, finding if one isn't valid based on its format is mighty hard to do.
This would return the following address (253 N. Cherry St. ), anything with its same format:
\d{1,5}\s\w\.\s(\b\w*\b\s){1,2}\w*\.
This allows 1-5 digits for the house number, a space, a character followed by a period (for N. or S.), 1-2 words for the street name, finished with an abbreviation (like st. or rd.).
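
A quick Python sketch of that pattern in use. The function name looks_like_street_address is invented for illustration, and the period after the direction letter is escaped (\w\.) so that it matches a literal period, as the description says.

```python
import re

# House number, direction letter plus period, 1-2 street-name words, abbreviation plus period.
STREET = re.compile(r"\d{1,5}\s\w\.\s(\b\w*\b\s){1,2}\w*\.")


def looks_like_street_address(text: str) -> bool:
    return STREET.search(text) is not None


print(looks_like_street_address("253 N. Cherry St."))        # same format as the example
print(looks_like_street_address("21-big walk way"))          # address from the question
print(looks_like_street_address("21 St.Elizabeth's drive"))  # address from the question
```

Only the first call matches; the two addresses from the question do not follow this format.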
Because a regex is used to see whether things meet a standard or protocol (which you define), you probably wouldn't want to allow for the addresses provided above, especially the first one with the dash, since they aren't very standard. You can modify my regex above to allow for them if you wish. For example, you could add
(-?)
to allow for a dash but not require one.
In addition, http://rubular.com/ is a quick and interactive way to learn regex. Try it out with the addresses above.
If you don't have a fixed format for the address as mentioned above, I would use a regex just to eliminate the symbols which are not used in addresses (special symbols such as &(%#$^). The allowed set would be:
[A-Za-z0-9'\.\-\s\,]
Just to add to Serzas' answer (since I don't have enough reputation to comment):
letters and digits can effectively be replaced by \w (which also matches the underscore).
Additionally, the apostrophe, comma and period don't need a backslash inside a character class, and neither does the hyphen as long as it is placed last.
My requirement also involved forward and back slashes, so / and \\, and finally whitespace with \s. The working regex for me was:
pattern: "[\w',\\/.\s-]"
Regular expression for simple address validation
^[#.0-9a-zA-Z\s,-]+$
E.g. for Address match case
#1, North Street, Chennai - 11
E.g. for Address not match case
$1, North Street, Chennai # 11
I have successfully used:
Dim regexString = New stringbuilder
With regexString
.Append("(?<h>^[\d]+[ ])(?<s>.+$)|") 'find the 2013 1st ambonstreet
.Append("(?<s>^.*?)(?<h>[ ][\d]+[ ])(?<e>[\D]+$)|") 'find the 1-7-4 Dual Ampstreet 130 A
.Append("(?<s>^[\D]+[ ])(?<h>[\d]+)(?<e>.*?$)|") 'find the Terheydenlaan 320 B3
.Append("(?<s>^.*?)(?<h>\d*?$)") 'find the 245e oosterkade 9
End With
Dim Address As Match = Regex.Match(DataRow("customerAddressLine1"), regexString.ToString(), RegexOptions.Multiline)
If Not String.IsNullOrEmpty(Address.Groups("s").Value) Then StreetName = Address.Groups("s").Value
If Not String.IsNullOrEmpty(Address.Groups("h").Value) Then HouseNumber = Address.Groups("h").Value
If Not String.IsNullOrEmpty(Address.Groups("e").Value) Then Extension = Address.Groups("e").Value
The regex attempts each alternative in turn: if one finds no result, it moves on to the next. If no result is found at all, none of the 4 formats were present.
This one worked for me:
\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?
The source: https://www.codeproject.com/Tips/989012/Validate-and-Find-Addresses-with-RegEx
Regex is a very bad choice for this kind of task. Try to find a web service or an address database or a product which can clean address data instead.
Related:
Address validation using Google Maps API
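As a simple one-line expression, I recommend this (the hyphen is placed last in the class so that it is not read as a range, and the letter range is A-Z rather than A-z):
^([a-zA-Z0-9/\\'(),\s-]{2,255})$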
I needed
STREET # | STREET | CITY | STATE | ZIP
So I wrote the following regex
[0-9]{1,5}( [a-zA-Z.]*){1,4},?( [a-zA-Z]*){1,3},? [a-zA-Z]{2},? [0-9]{5}
This allows:
a street number of 1-5 digits
1-4 street description words
1-3 city words
a 2-character state
a 5-character zip code
I also made the comma optional as a separator between street, city, state and zip.
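
A short Python sketch exercising that regex. The sample addresses and the function name is_us_address_line are made up for illustration, and fullmatch is used so that the whole line has to fit the STREET # | STREET | CITY | STATE | ZIP layout.

```python
import re

# Regex from the answer above, unchanged.
US_ADDRESS = re.compile(
    r"[0-9]{1,5}( [a-zA-Z.]*){1,4},?( [a-zA-Z]*){1,3},? [a-zA-Z]{2},? [0-9]{5}"
)


def is_us_address_line(line: str) -> bool:
    return US_ADDRESS.fullmatch(line) is not None


print(is_us_address_line("123 Main St., Springfield, IL 62704"))  # commas present
print(is_us_address_line("123 Main St Springfield IL 62704"))     # commas omitted
print(is_us_address_line("123 Main St, Springfield, IL 6270"))    # zip too short
```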
Here is the approach I have taken to finding addresses using regular expressions:
A set of patterns is useful to find many forms that we might expect from an address starting with simply a number followed by set of strings (ex. 1 Basic Road) and then getting more specific such as looking for "P.O. Box", "c/o", "attn:", etc.
Below is a simple test in Python. The patterns are meant to find all the addresses but none of the last 4 items, which are company names. This example is not comprehensive, but it can be altered to suit your needs and to catch examples you find in your data.
import re
strings = [
'701 FIFTH AVE',
'2157 Henderson Highway',
'Attn: Patent Docketing',
'HOLLYWOOD, FL 33022-2480',
'1940 DUKE STREET',
'111 MONUMENT CIRCLE, SUITE 3700',
'c/o Armstrong Teasdale LLP',
'1 Almaden Boulevard',
'999 Peachtree Street NE',
'P.O. BOX 2903',
'2040 MAIN STREET',
'300 North Meridian Street',
'465 Columbus Avenue',
'1441 SEAMIST DR.',
'2000 PENNSYLVANIA AVENUE, N.W.',
'465 Columbus Avenue',
'28 STATE STREET',
'P.O, Drawer 800889.',
'2200 CLARENDON BLVD.',
'840 NORTH PLANKINTON AVENUE',
'1025 Connecticut Avenue, NW',
'340 Commercial Street',
'799 Ninth Street, NW',
'11318 Lazarro Ln',
'P.O, Box 65745',
'c/o Ballard Spahr LLP',
'8210 SOUTHPARK TERRACE',
'1130 Connecticut Ave., NW, Suite 420',
'465 Columbus Avenue',
"BANNER & WITCOFF , LTD",
"CHIP LAW GROUP",
"HAMMER & ASSOCIATES, P.C.",
"MH2 TECHNOLOGY LAW GROUP, LLP",
]
# Raw strings keep the backslashes intact for the regex engine.
patterns = [
    r"c\/o [\w ]{2,}",
    r"C\/O [\w ]{2,}",
    r"P.O\. [\w ]{2,}",
    r"P.O\, [\w ]{2,}",
    r"[\w\.]{2,5} BOX [\d]{2,8}",
    r"^[#\d]{1,7} [\w ]{2,}",
    r"[A-Z]{2,2} [\d]{5,5}",
    r"Attn: [\w]{2,}",
    r"ATTN: [\w]{2,}",
    r"Attention: [\w]{2,}",
    r"ATTENTION: [\w]{2,}",
]
contact_list = []
total_count = len(strings)
found_count = 0
for string in strings:
    pat_no = 1
    for pattern in patterns:
        match = re.search(pattern, string.strip())
        if match:
            print("Item found: " + match.group(0) + " | Pattern no: " + str(pat_no))
            # An item that matches several patterns is counted once per pattern.
            found_count += 1
        pat_no += 1
print("-- Total: " + str(total_count) + " Found: " + str(found_count))
UiPath Academy training video lists this RegEx for US addresses (and it works fine for me):
\b\d{1,8}(-)?[a-z]?\W[a-z|\W|\.]{1,}\W(road|drive|avenue|boulevard|circle|street|lane|way|rd\.|st\.|dr\.|ave\.|blvd\.|cir\.|ln\.|rd|dr|ave|blvd|cir|ln)
I had a different use case: find any addresses in logs and scold application developers (favourite part of a devops job). I had the advantage of having the word "address" in the pattern, but it should work without that if you have a specific field to scan.
\baddress.[0-9\\\/# ,a-zA-Z]+[ ,]+[0-9\\\/#, a-zA-Z]{1,}
Look for the word "address" - skip this if not applicable
Look for first part numbers, letters, #, space - Unit Number / street number/suite number/door number
Separated by a space or comma
Look for one or more of rest of address numbers, letters, #, space
Tested against :
1 Sleepy Boulevard PO, Box 65745
Suite #100 /98,North St,Snoozepura
Ave., New Jersey,
Suite 420 1130 Connect Ave., NW,
Suite 420 19 / 21 Old Avenue,
Suite 12, Springfield, VIC 3001
Suite#100/98 North St Snoozepura
This worked for me when there were street addresses with unit/suite numbers, zip codes, or only a street. It also didn't match IP addresses or MAC addresses, and it worked with extra spaces.
This assumes users are normal people who separate elements of a street address with a comma, hash sign, or space and not psychopaths who use characters like "|" or ":"!
I use this for French addresses and some international addresses too:
[\\D+ || \\d]+\\d+[ ||,||[A-Za-z0-9.-]]+(?:[Rue|Avenue|Lane|... etcd|Ln|St]+[ ]?)+(?:[A-Za-z0-9.-](.*)]?)
Inspired by the responses given here, I came up with these 2 solutions. Both:
support optional uppercase
support French as well
Regex structure:
numbers (required)
letters, chars and spaces
at least one common address keyword (required)
as many chars as you want before the line break
Definitions:
accuracy: the ability to detect addresses and not something that merely looks like an address.
range: the ability to detect uncommon addresses.
Regex 1:
high accuracy
low range
/[0-9]+[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.)|(cir\.)|(blvd\.)|(hway\.)|(st\.)|(aut\.)|(ave\.)|(ln\.)|(rd\.)|(hw\.)|(dr\.)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i
regex 2:
low accuracy
high range
/[0-9]*[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.?)|(cir\.?)|(blvd\.?)|(hway\.?)|(st\.?)|(aut\.?)|(ave\.?)|(ln\.?)|(rd\.?)|(hw\.?)|(dr\.?)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i
This one works well for me
^(\d+) ?([A-Za-z](?= ))? (.*?) ([^ ]+?) ?((?<= )APT)? ?((?<= )\d*)?$
Source : https://community.alteryx.com/t5/Alteryx-Designer-Discussions/RegEx-Addresses-different-formats-and-headaches/td-p/360147
Here is my RegEx for address, city & postal validation rules
validation rules:
address -
1 - 40 characters length.
Letters, numbers, space and . , : ' #
city -
1 - 19 characters length
Only Alpha characters are allowed
Spaces are allowed
postalCode -
The USA zip must meet the following criteria and is required:
Minimum of 5 digits (9 digits if zip + 4 is provided)
Numeric only
A Canadian postal code is a six-character string.
in the format A1A 1A1, where A is a letter and 1 is a digit.
a space separates the third and fourth characters.
do not include the letters D, F, I, O, Q or U.
the first position does not make use of the letters W or Z.
address: ^[a-zA-Z0-9 .,#;:'-]{1,40}$
city: ^[a-zA-Z ]{1,19}$
usaPostal: ^([0-9]{5})(?:[-]?([0-9]{4}))?$
canadaPostal : ^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$
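
The four rules can be wired together in Python roughly like this. It is a sketch only: the RULES dictionary and the is_valid helper are illustrative names, and the patterns are copied unchanged from the list above.

```python
import re

# Patterns copied from the rules above; the keys mirror the field names.
RULES = {
    "address": re.compile(r"^[a-zA-Z0-9 .,#;:'-]{1,40}$"),
    "city": re.compile(r"^[a-zA-Z ]{1,19}$"),
    "usaPostal": re.compile(r"^([0-9]{5})(?:[-]?([0-9]{4}))?$"),
    "canadaPostal": re.compile(r"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$"),
}


def is_valid(field: str, value: str) -> bool:
    """Check one value against the rule for the given field name."""
    return RULES[field].fullmatch(value) is not None


print(is_valid("usaPostal", "12345-6789"))   # zip + 4
print(is_valid("canadaPostal", "K1A 0B1"))   # A1A 1A1 format
print(is_valid("canadaPostal", "W1A 0B1"))   # W is not allowed in first position
print(is_valid("city", "St. Louis"))         # the period is not an alpha character
```

The Canadian pattern is case-sensitive as written, so lower-case input would need upper-casing first.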
\b(\d{1,8}[a-z]?[0-9\/#- ,a-zA-Z]+[ ,]+[.0-9\/#, a-zA-Z]{1,})\n
A more dynamic approach than @micah's would be the following:
(?'Address'(?'Street'[0-9][a-zA-Z\s]),?\s*(?'City'[A-Za-z\s]),?\s(?'Country'[A-Za-z])\s(?'Zipcode'[0-9]-?[0-9]))
It won't care about individual lengths of segments of code.
https://regex101.com/r/nuy7hB/1

Validation for a 10 digit phone number

I'm looking for a simple regex that will validate a 10 digit phone number. I'd like to make sure that the number is exactly 10 digits, no letters, hyphens or parens and that the first two digits do not start with 0 or 1. Can someone help out?
/[2-9]{2}\d{8}/
^[2-9]{2}[0-9]{8}$
I consider [0-9] to be better to read than \d, especially considering the preceding [2-9]
The ^ and $ ensure that the input string consists ONLY of those 10 characters. Without them the input could be longer: "12345678901" would match the unanchored regex even though it is 11 characters long and starts with a 1!
As Randal pointed out, this question is not consistent with the way phone numbers are formatted in North America (even though the OP stated 'first two digits do not start with 0 or 1'). A better regex for North American phone numbers would be:
^[2-9]{1}[0-9]{9}$
For example, Washington DC's area code is (202). NYC has area code (212). Northern New Jersey has (201).
But more accurately, the NANP has a lot of rules as it relates to what is allowed in area code and exchange (first six digits). This regex should still cover most cases. https://en.wikipedia.org/wiki/North_American_Numbering_Plan
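
The effect of the anchors, and of the relaxed first-digit rule, can be seen in a few lines of Python. This is only a sketch: the sample numbers are made up, is_ten_digit_number is an illustrative name, and fullmatch plays the role of ^ and $.

```python
import re

UNANCHORED = re.compile(r"[2-9]{2}\d{8}")
# Relaxed rule from above: only the first digit is restricted to 2-9.
NANP_LIKE = re.compile(r"[2-9][0-9]{9}")


def is_ten_digit_number(number: str) -> bool:
    return NANP_LIKE.fullmatch(number) is not None


# Without anchors, an 11-character string starting with 1 still contains a match.
print(UNANCHORED.search("12345678901") is not None)
print(is_ten_digit_number("12345678901"))  # too long, starts with 1
print(is_ten_digit_number("2025550123"))   # area code 202
print(is_ten_digit_number("0205550123"))   # starts with 0
```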
This regex script might help out. It essentially strips any "punctuation" characters, including a leading 1-, then checks that what remains is 10 characters long.
The extra part you probably don't need is the formatting to 000-000-0000
var formatPhone = function() {
    var phone = this.value;
    // Remove a leading 1 plus any separators, then all remaining separators.
    phone = phone.replace(/^1(|-|\(|\)|\.| )*|-|\(|\)|\.| /g, '');
    if (phone.length === 10) {
        this.value = phone.slice(0,3) + '-' + phone.slice(3,6) + '-' + phone.slice(6,10);
    }
};
The phone numbers will have 10 digits and will start with 7, 8 or 9:
[RegularExpression("^[789][0-9]{9}$", ErrorMessage = "Enter Valid Mobile Number")]
reference : http://www.regular-expressions.info/numericranges.html

Extract a portion of text using RegEx

I would like to extract portion of a text using a regular expression. So for example, I have an address and want to return just the number and streets and exclude the rest:
2222 Main at King Edward Vancouver BC CA
But the addresses vary in format most of the time. I tried using a lookahead regex and came up with this expression:
.*?(?=\w* \w* \w{2}$)
The above expression handles the above example nicely, but it gets way too messy as soon as commas come into the text, or postal codes, which can be a 6-character string or two 3-character strings with a space in the middle, etc.
Is there a more elegant way of extracting a portion of text than a lookahead regex?
Any suggestion or a point in another direction is greatly appreciated.
Thanks!
Regular expressions are for data that is REGULAR, that follows a pattern. So if your data is completely random, no, there's no elegant way to do this with regex.
On the other hand, if you know what values you want, you can probably write a few simple regexes, and then just test them all on each string.
Ex.
regex1= address # grabber, regex2 = street type grabber, regex3 = name grabber.
Attempt a match on string1 with regex1, regex2, and finally regex3. Move on to the next string.
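
A minimal Python sketch of this "several small regexes" approach. Everything here is hypothetical: the two patterns, the short list of street types, and the extract_parts helper are placeholders you would replace with grabbers suited to your own data.

```python
import re
from typing import Dict, Optional

# Hypothetical grabbers: a leading street number and a known street-type word.
NUMBER = re.compile(r"^\d+")
STREET_TYPE = re.compile(
    r"\b(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\b\.?", re.IGNORECASE
)


def extract_parts(address: str) -> Dict[str, Optional[str]]:
    """Try each small regex in turn and collect whatever it finds."""
    number = NUMBER.search(address)
    street_type = STREET_TYPE.search(address)
    return {
        "number": number.group(0) if number else None,
        "street_type": street_type.group(0) if street_type else None,
    }


print(extract_parts("555 Oak Street Vancouver BC CA"))
print(extract_parts("2222 Main at King Edward Vancouver BC CA"))
```

Each regex can fail independently, so a string that lacks a street type still yields its number.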
Well, I thought I'd throw my hat into the ring:
.*(?=,? ([a-zA-Z]+,?\s){3}([\d-]*\s)?)
You might want ^ or \d+ at the front for good measure.
I didn't bother specifying lengths for the postal codes; this one accepts any number of digits and hyphens.
It works for these inputs so far, and for variations on commas within the city/state/country area:
2222 Main at King Edward Vancouver, BC, CA, 333-333
555 road and street place CA US 95000
2222 Main at King Edward Vancouver BC CA 333
555 road and street place CA US
It counts on there being three words at the end for the city, state and country. Other than that, it's like ryansstack said: if the data is random it won't work, and if the city is two words like New York it won't work either. So yes, regex isn't the tool for this one.
btw: tested on regexhero.net
I can think of 2 ways you can do this:
1) If you know that "the rest" of your data after the address is exactly 2 fields, i.e. BC and CA, you can split your string using a space as the delimiter and remove the last 2 items.
2) Split on the delimiter /[A-Z][A-Z]/ and store the result in an array, then print out the array (provided that the address doesn't contain 2 or more consecutive capital letters).
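
Option 1 can be sketched in Python as follows. The function name strip_trailing_fields is made up for illustration, and the sketch assumes the trailing fields are single words separated by whitespace.

```python
def strip_trailing_fields(address: str, trailing_fields: int = 2) -> str:
    """Split on whitespace and drop the last `trailing_fields` items."""
    tokens = address.split()
    if trailing_fields <= 0:
        return " ".join(tokens)
    return " ".join(tokens[:-trailing_fields])


print(strip_trailing_fields("2222 Main at King Edward Vancouver BC CA"))
```

This still leaves the city in the result, so it only helps when the number of trailing fields is known in advance.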