Regular Expression for England only Postcode - regex

I have an Asp.Net website and I want to use a RegularExpressionValidator to check if a UK postcode is English (i.e. it's not Scottish, Welsh or N.Irish).
It should be possible to see if the postcode is English by using just the letters from the first segmant (called the Postcode Area). In total there are 124 postcode areas and this is a list of them.
From that list, the following postcode areas are not in England.
ZE,KW,IV,HS,PH,AB,DD,PA,FK,G,KY,KA,DG,TD,EH,ML (Scotland)
LL,SY,LD,HR,NP,CF,SA (Wales)
BT (N.Ireland)
The input to the regex may be the whole postcode, or it might just be the postcode area.
Can anyone help me create a regular expression that will match only if a given postcode is English?
EDIT - Solution
With help from several posters I was able to create the following regex which i've tested against over 1500 testcases successfully.
^(AL|B|B[ABDHLNRS]|C[ABHMORTVW]|D[AEHLNTY]|E|E[CNX]|FY|G[LUY]|H[ADGPUX]|I[GM‌​P]‌​
|JE|KT|L|L[AENSU]|M|ME|N|N[EGNRW]|O[LX]|P[ELOR]|R[GHM]|S|S[EGKLMNOPRSTW]|T[AFNQ‌​‌​
RSW]|UB|W|W[ACDFNRSV]|YO)\d{1,2}\s?(\d[\w]{2})?

I've already answered once, making the point that it's not possible to come up with a 100% correct England-only regex (since the postcode areas don't lie along political boundaries).
However I've dug a bit deeper into this, and ... well it is possible, but it's a lot of work.
To verify an England-only postcode, you need to exclude the non-English postcodes. The easy ones are:
BT (Northern Ireland)
IM (Isle of Man)
JE (Jersey)
GG (Guernsey)
BF (British Forces)
BX (non-geographic UK postcodes)
GIR (Girobank, which is also non-geographic)
(I'm not going to mention UK-style postcodes for territories outside the UK, like St Helena, Gibraltar etc. Technically speaking, the Isle of Man and Channel Islands aren't part of the UK either, but they're much nearer by, and more closely tied into the Royal Mail system in the UK.)
The purely Scottish postcode areas are (as you mentioned):
ZE,KW,IV,HS,PH,AB,DD,PA,FK,G,KY,KA,EH,ML
DG and TD are nominally Scottish, and are for the most part in Scotland. However some areas extend over the Scotland-England border as follows:
DG16 - a tiny bit in England
TD9 - a tiny bit in England
TD12 - half in England
TD15 - mostly in England
The breakdown is as follows:
DG16 is in Scotland except for the following English postcodes:
DG16 5H[TUZ]
DG16 5J[AB]
TD9 is in Scotland except for TD9 0T[JPRSTUW]
TD12 has only one sector (TD12 4), which is spread roughly half and half across England and Scotland:
TD12 4[ABDEHJLN] are in Scotland
TD12 4[QRSTUWX] are in England
TD15 is the most complicated. There are 3 sectors, of which TD15 2 and TD15 9 are entirely in England.
TD15 1 is split across England and Scotland.
Postcodes beginning as follows are in Scotland:
TD15 1T
TD15 1X
... except for these English postcodes:
TD15 1T[ABQUX]
TD15 1XX
All other postcodes in TD15 1 are in England, except for those beginning as follows:
TD15 1B
TD15 1S (i.e. TD15 1S[ABEJLNPWXY])
TD15 1U (i.e. TD15 1U[BDENPQRTUXY])
... which are all in England, with the exception of the following postcodes which are in Scotland:
TD15 1BT
TD15 1S[UZ]
TD15 1U[FGHJLSZ]
The English postcode areas CA and NE lie on the other side of the England-Scotland border, however they never extend into Scotland.
In fact, the last two letters of a UK postcode is based on how the postman actually delivers post (as far as I'm aware), so it's not given for granted that it will fall inside a political boundary. Thus if there's a group of houses which straddle the border, then it's possible that the entire postcode (i.e. at the most fine-grained level) does not lie entirely within either England or Scotland. E.g. TD9 0TJ and TD15 1UZ are very close to the border, and I don't really know for sure if they're entirely on one side or not.
The England-Wales border is also complicated, however I'll leave this as an exercise for the reader.

There are 124 Postcode Areas in the UK.
-- PAF®
statistics August 2012, via List of postcodes in the United Kingdom (Wikipedia).
I recommend breaking your problem down into two parts (think functions):
Is the postcode valid?
UK Postcode Regex (Comprehensive)
Is the postcode English?
This can be broken down further:
Not Scottish:
! /^(ZE|KW|IV|HS|PH|AB|DD|PA|FK|G|KY|KA|DG|TD|EH|ML)[0-9]/
Not Welsh:
! /^(LL|SY|LD|HR|NP|CF|SA)[0-9]/
Not Northern Irish, Manx, from the Channel Islands, ...
et cetera...
or you could just check that the Postcode Area is among the hundred or so English ones, depending on how you want to optimise ☻
Note that the syntax will vary according to your programming language.
Doing all this in one regular expression would soon become unmanageable.

It's not possible to come up with an England-only regex, because the postcode areas don't lie along political boundaries, at least not at the postcode area or district level.
For example, CH1 is in England, and CH5 is in Wales.
At the postcode district level there are still problems, for example TD12 is half in England, half in Scotland.
The only area which you can rely on is BT (Northern Ireland)

Use ^(AB|AL|B| ... )$, where the ... is where you fill the rest of the valid ones in, separated by pipes (|).
EDIT: There's a boatload of information here: http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom
If you were to include the in/out codes, it would be something like ^(AB|AL|B| ... )([\d\w]{3})\s([\d\w]{3})$, which would get the rest of the code.
EDIT
^(A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CNX]?|F[KY]|G[LUY]|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[AFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE)([\w\d]{1,2})\s?([\w\d]{3})$
Part of this regex is taken from another one of the answers. It matches the valid postcodes, then 1 to 2 {1,2} letters \w or numbers \d, an optional space \s?, then 3 letters or numbers. Hope that helps.

These are the RegEx i put together that follows the Royal Mail defined standards for all UK postcode types:
Standard UK PostCodes:
/^([A-PR-UWYZ](?:[0-9]{1,2}|[0-9][A-HJKMNPR-Y]|[A-HK-Y][0-9]{1,2}|[A-HK-Y][0-9][ABEHMNPRVWXY]))\s*([0-9][ABD-HJLNP-UW-Z]{2})$/i
GiroBank PostCodes:
/^(GIR)\s*(0AA)$/i
UK Overseas Territories:
/^([A-Z]{4})\s*(1ZZ)$/i
British Forces Post Office:
/^(BFPO)\s*(?:(c\/o)\s*)?((?(2)[0-9]{1,3}|[0-9]{1,4}))$/i
And this is the function I wrote which validates a postcode against these four types and allows type detection:
public function UKPostCode(&$strPostCode, &$strError = null, &$strType = null, $ReturnFormatted = true) {
$strStrippedPostCode = preg_replace("/[\s\-]/i", "", $strPostCode);
if (empty($strStrippedPostCode)) {
$strError = $this->__getErrorMessage("Post", "EMPTY_POST");
return false;
}
$arrRegExp = array(
"STD" => "/^([A-PR-UWYZ](?:[0-9]{1,2}|[0-9][A-HJKMNPR-Y]|[A-HK-Y][0-9]{1,2}|[A-HK-Y][0-9][ABEHMNPRVWXY]))\s*([0-9][ABD-HJLNP-UW-Z]{2})$/i",
"GIR" => "/^(GIR)\s*(0AA)$/i",
"OST" => "/^([A-Z]{4})\s*(1ZZ)$/i",
"BFPO" => "/^(BFPO)\s*(?:(c\/o)\s*)?((?(2)[0-9]{1,3}|[0-9]{1,4}))$/i"
);
foreach ($arrRegExp as $strPostCodeType => $strExpression) {
if (preg_match($strExpression, $strPostCode, $arrMatches)) {
if ($ReturnFormatted !== null) {
array_shift($arrMatches);
$strPostCode = implode(" ", array_filter($arrMatches));
$strPostCode = ((bool)$ReturnFormatted === true) ? strtoupper($strPostCode) : strtolower($strPostCode);
}
$strType = $strPostCodeType;
return true;
}
}
$strError = $this->__getErrorMessage("Post", "INVALID_POST");
return false;
}
Hope this helps

'A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CNX]?|F[KY]|G[LUY]|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[AFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE'

Related

How can a regex catch all parts before a keyword from a finite set, but sometimes separated only by a single space

This question relates to PCRE regular expressions.
Part of my big dataset are address data like this:
12H MARKET ST. Canada
123 SW 4TH Street USA
ONE HOUSE USA
1234 Quantity Dr USA
123 Quality Court Canada
1234 W HWY 56A USA
12345 BERNARDO CNTR DRIVE Canada
12 VILLAGE PLAZA USA
1234 WEST SAND LAKE RD ?567 USA
1234 TELEGRAM BLVD SUITE D USA
1234-A SOUTHWEST FRWY USA
123 CHURCH STREET USA
123 S WASHINGTON USA
123 NW-SE BLVD USA
# USA
1234 E MAIN STREET USA
I would like to extract the street names including house numbers and additional information from these records. (Of course there are other things in those records and I already know how to extract them).
For the purpose of this question I just manually clipped the interesting part from the data for this example.
The number of words in the address parts is not known before. The only criterion I have found so far is to find the occurrence of country names belonging to some finite set, which of course is bigger than (USA|Canada). For brevity I limit my example just to those two countries.
This regular expression
([a-zA-Z0-9?\-#.]+\s)
already isolates the words making up what I am after, including one space after them. Unfortunately there are cases, where the country after the to-be-extracted street information is only separated by a single space from the country, like e.g. in the first and in the last example.
Since I want to capture the matching parts glued together, I place a + sign behind my regular expression:
([a-zA-Z0-9?\-#.]+\s)+
but then in the two nasty cases with only one separating space before the country, the country is also caught!
Since I know the possible countries from looking at the data, I could try to exclude them by a look ahead-condition like this:
([a-zA-Z0-9?\-#.]+\s)(?!USA|Canada)
which excludes ST. from the match in the first line and STREET from the match in the last line. Of course the single capture groups are not yet glued together by this.
So I would add a plus sign to the group on the left:
([a-zA-Z0-9?\-#.]+\s)+(?!USA|Canada)
But then ST. and STREET and the Country, separated by only a single space, are caught again together with the country, which I want to exclude from my result!
How would you proceed in such a case?
If it would be possible by properly using regular expressions to replace each country name by the same one preceded by an additional space (or even to do this only for cases, where there is only a single space in front of one of the country-names), my problem would be solved. But I want to avoid such a substitution for the whole database in a separate run because a country name might appear in some other column too.
I am quite new to regular expressions and I have no idea how to do two processing steps onto the same input in sequence. - But maybe, someone has a better idea how to cope with this problem.
If I understand correctly, you want all content before the country (excluding spaces before the country). The country will always be present at the end of the line and comes from a list.
So you should be able to set the 'global' and 'multiline' options and then use the following regex:
^(.*?)(?=\s+(USA|Canada)\s*$)
Explanation:
^(.*) match all characters from start of line
(?=\s+(USA|Canada)\s*$) look ahead for one or more spaces, followed by one of the country names, followed by zero or more spaces and end of line.
That should give you a list with all addresses.
Edit:
I have changed the first part to: (.*?), making it non-greedy. That way the match will stop at the last letter before country instead of including some spaces.

Regex (Posix) to get first word only, not including numbers

New to Regex (which was recently added to SQL in DB2 for i). I don't know anything about the different engines but research indicates that it is "based on POSIX extended regular expressions".
I would like to get the street name (first non-numeric word) from an address.
e.g.
101 Main Street = Main
2/b Pleasant Ave = Pleasant
5H Unpleasant Crescent = Unpleasant
I'm sorry I don't have a string that isn't working, as suggested by the forum software. I don't even know where to start. I tried a few things I found in search but they either yielded nothing or the first "word" - i.e. the number (101, 2/b, 5H).
Thanks
Edit: Although it's looking as if IBM's implementation of regex on the DB2 family of databases may be too alien for many of the resident experts, I'll press ahead with some more detail in case it helps.
A plain English statement of the requirement would be:
Basic/acceptable: Find the first word/unbroken string that contains no numbers or special characters
Advanced/ideal: Find the first word that contains three or more characters, being only letters and zero or one embedded dash/hyphen, but no numbers or other characters.
Additional examples (original ones at top are still valid)
190 - 192 Tweety-bird avenue = Tweety-bird
190-192 Tweety-bird avenue = Tweety-bird
Charles Bronson Place = Charles
190H Charles-Bronson Place = Charles-Bronson
190 to 192 Charles Bronson Place = Charles
Second Edit:
Mooching around on the internet and trying every vaguely connected expression that I could find, I stumbled on this one:
[a-zA-Z]+(?:[\s-][a-zA-Z]+)*
which actually works pretty well - it gives the street name and street type, which on reflection would actually suit my purpose as well as the street name alone (I can easily expand common abbreviations - e.g. RD to ROAD - on the fly).
Sample SQL:
select HAD1,
regexp_substr(HAD1, '[a-zA-Z]+(?:[\s-][a-zA-Z]+)*')
from ECH
where HEDTE > 20190601
Sample output
Ship To REGEXP_SUBSTR
Address
Line 1
32 CHRISTOPHER STREET CHRISTOPHER STREET
250 - 270 FEATHERSTON STREET FEATHERSTON STREET
118 MONTREAL STREET MONTREAL STREET
7 BIRMINGHAM STREET BIRMINGHAM STREET
59 MORRISON DRIVE MORRISON DRIVE
118 MONTREAL STREET MONTREAL STREET
MASON ROAD MASON ROAD
I know this wasn't exactly the question I asked, so apologies to anyone who could have done this but was following the original request faithfully.
Not sure if this is Posix compliant, but something like this could work: ^[\w\/]+?\s((\w+\s)+?)\s*\w+?$, example here.
The script assumes that the first chunk is the number of the building, the second chunk, is the name of the street, and the last chunk is Road/Ave/Blvd/etc.
This should also cater for street names which have white spaces in them.
Using the following regex matches your examples :
(?<=[^ ]+ )[^ ]*[ ]

Regular expressions inside Angular Validators [duplicate]

I'm after a regex that will validate a full complex UK postcode only within an input string. All of the uncommon postcode forms must be covered as well as the usual. For instance:
Matches
CW3 9SS
SE5 0EG
SE50EG
se5 0eg
WC2H 7LT
No Match
aWC2H 7LT
WC2H 7LTa
WC2H
How do I solve this problem?
I'd recommend taking a look at the UK Government Data Standard for postcodes [link now dead; archive of XML, see Wikipedia for discussion]. There is a brief description about the data and the attached xml schema provides a regular expression. It may not be exactly what you want but would be a good starting point. The RegEx differs from the XML slightly, as a P character in third position in format A9A 9AA is allowed by the definition given.
The RegEx supplied by the UK Government was:
([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})
As pointed out on the Wikipedia discussion, this will allow some non-real postcodes (e.g. those starting AA, ZY) and they do provide a more rigorous test that you could try.
I recently posted an answer to this question on UK postcodes for the R language. I discovered that the UK Government's regex pattern is incorrect and fails to properly validate some postcodes. Unfortunately, many of the answers here are based on this incorrect pattern.
I'll outline some of these issues below and provide a revised regular expression that actually works.
Note
My answer (and regular expressions in general):
Only validates postcode formats.
Does not ensure that a postcode legitimately exists.
For this, use an appropriate API! See Ben's answer for more info.
If you don't care about the bad regex and just want to skip to the answer, scroll down to the Answer section.
The Bad Regex
The regular expressions in this section should not be used.
This is the failing regex that the UK government has provided developers (not sure how long this link will be up, but you can see it in their Bulk Data Transfer documentation):
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})$
Problems
Problem 1 - Copy/Paste
See regex in use here.
As many developers likely do, they copy/paste code (especially regular expressions) and paste them expecting them to work. While this is great in theory, it fails in this particular case because copy/pasting from this document actually changes one of the characters (a space) into a newline character as shown below:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))
[0-9][A-Za-z]{2})$
The first thing most developers will do is just erase the newline without thinking twice. Now the regex won't match postcodes with spaces in them (other than the GIR 0AA postcode).
To fix this issue, the newline character should be replaced with the space character:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^
Problem 2 - Boundaries
See regex in use here.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^^ ^ ^ ^^
The postcode regex improperly anchors the regex. Anyone using this regex to validate postcodes might be surprised if a value like fooA11 1AA gets through. That's because they've anchored the start of the first option and the end of the second option (independently of one another), as pointed out in the regex above.
What this means is that ^ (asserts position at start of the line) only works on the first option ([Gg][Ii][Rr] 0[Aa]{2}), so the second option will validate any strings that end in a postcode (regardless of what comes before).
Similarly, the first option isn't anchored to the end of the line $, so GIR 0AAfoo is also accepted.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})$
To fix this issue, both options should be wrapped in another group (or non-capturing group) and the anchors placed around that:
^(([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2}))$
^^ ^^
Problem 3 - Improper Character Set
See regex in use here.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^^
The regex is missing a - here to indicate a range of characters. As it stands, if a postcode is in the format ANA NAA (where A represents a letter and N represents a number), and it begins with anything other than A or Z, it will fail.
That means it will match A1A 1AA and Z1A 1AA, but not B1A 1AA.
To fix this issue, the character - should be placed between the A and Z in the respective character set:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^
Problem 4 - Wrong Optional Character Set
See regex in use here.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^
I swear they didn't even test this thing before publicizing it on the web. They made the wrong character set optional. They made [0-9] option in the fourth sub-option of option 2 (group 9). This allows the regex to match incorrectly formatted postcodes like AAA 1AA.
To fix this issue, make the next character class optional instead (and subsequently make the set [0-9] match exactly once):
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?)))) [0-9][A-Za-z]{2})$
^
Problem 5 - Performance
Performance on this regex is extremely poor. First off, they placed the least likely pattern option to match GIR 0AA at the beginning. How many users will likely have this postcode versus any other postcode; probably never? This means every time the regex is used, it must exhaust this option first before proceeding to the next option. To see how performance is impacted check the number of steps the original regex took (35) against the same regex after having flipped the options (22).
The second issue with performance is due to the way the entire regex is structured. There's no point backtracking over each option if one fails. The way the current regex is structured can greatly be simplified. I provide a fix for this in the Answer section.
Problem 6 - Spaces
See regex in use here
This may not be considered a problem, per se, but it does raise concern for most developers. The spaces in the regex are not optional, which means the users inputting their postcodes must place a space in the postcode. This is an easy fix by simply adding ? after the spaces to render them optional. See the Answer section for a fix.
Answer
1. Fixing the UK Government's Regex
Fixing all the issues outlined in the Problems section and simplifying the pattern yields the following, shorter, more concise pattern. We can also remove most of the groups since we're validating the postcode as a whole (not individual parts):
See regex in use here
^([A-Za-z][A-Ha-hJ-Yj-y]?[0-9][A-Za-z0-9]? ?[0-9][A-Za-z]{2}|[Gg][Ii][Rr] ?0[Aa]{2})$
This can further be shortened by removing all of the ranges from one of the cases (upper or lower case) and using a case-insensitive flag. Note: Some languages don't have one, so use the longer one above. Each language implements the case-insensitivity flag differently.
See regex in use here.
^([A-Z][A-HJ-Y]?[0-9][A-Z0-9]? ?[0-9][A-Z]{2}|GIR ?0A{2})$
Shorter again replacing [0-9] with \d (if your regex engine supports it):
See regex in use here.
^([A-Z][A-HJ-Y]?\d[A-Z\d]? ?\d[A-Z]{2}|GIR ?0A{2})$
2. Simplified Patterns
Without ensuring specific alphabetic characters, the following can be used (keep in mind the simplifications from 1. Fixing the UK Government's Regex have also been applied here):
See regex in use here.
^([A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}|GIR ?0A{2})$
And even further if you don't care about the special case GIR 0AA:
^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$
3. Complicated Patterns
I would not suggest over-verification of a postcode as new Areas, Districts and Sub-districts may appear at any point in time. What I will suggest potentially doing, is added support for edge-cases. Some special cases exist and are outlined in this Wikipedia article.
Here are complex regexes that include the subsections of 3. (3.1, 3.2, 3.3).
In relation to the patterns in 1. Fixing the UK Government's Regex:
See regex in use here
^(([A-Z][A-HJ-Y]?\d[A-Z\d]?|ASCN|STHL|TDCU|BBND|[BFS]IQQ|PCRN|TKCA) ?\d[A-Z]{2}|BFPO ?\d{1,4}|(KY\d|MSR|VG|AI)[ -]?\d{4}|[A-Z]{2} ?\d{2}|GE ?CX|GIR ?0A{2}|SAN ?TA1)$
And in relation to 2. Simplified Patterns:
See regex in use here
^(([A-Z]{1,2}\d[A-Z\d]?|ASCN|STHL|TDCU|BBND|[BFS]IQQ|PCRN|TKCA) ?\d[A-Z]{2}|BFPO ?\d{1,4}|(KY\d|MSR|VG|AI)[ -]?\d{4}|[A-Z]{2} ?\d{2}|GE ?CX|GIR ?0A{2}|SAN ?TA1)$
3.1 British Overseas Territories
The Wikipedia article currently states (some formats slightly simplified):
AI-1111: Anguila
ASCN 1ZZ: Ascension Island
STHL 1ZZ: Saint Helena
TDCU 1ZZ: Tristan da Cunha
BBND 1ZZ: British Indian Ocean Territory
BIQQ 1ZZ: British Antarctic Territory
FIQQ 1ZZ: Falkland Islands
GX11 1ZZ: Gibraltar
PCRN 1ZZ: Pitcairn Islands
SIQQ 1ZZ: South Georgia and the South Sandwich Islands
TKCA 1ZZ: Turks and Caicos Islands
BFPO 11: Akrotiri and Dhekelia
ZZ 11 & GE CX: Bermuda (according to this document)
KY1-1111: Cayman Islands (according to this document)
VG1111: British Virgin Islands (according to this document)
MSR 1111: Montserrat (according to this document)
An all-encompassing regex to match only the British Overseas Territories might look like this:
See regex in use here.
^((ASCN|STHL|TDCU|BBND|[BFS]IQQ|GX\d{2}|PCRN|TKCA) ?\d[A-Z]{2}|(KY\d|MSR|VG|AI)[ -]?\d{4}|(BFPO|[A-Z]{2}) ?\d{2}|GE ?CX)$
3.2 British Forces Post Office
Although they've been recently changed it to better align with the British postcode system to BF# (where # represents a number), they're considered optional alternative postcodes. These postcodes follow(ed) the format of BFPO, followed by 1-4 digits:
See regex in use here
^BFPO ?\d{1,4}$
3.3 Santa?
There's another special case with Santa (as mentioned in other answers): SAN TA1 is a valid postcode. A regex for this is very simply:
^SAN ?TA1$
It looks like we're going to be using ^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$, which is a slightly modified version of that sugested by Minglis above.
However, we're going to have to investigate exactly what the rules are, as the various solutions listed above appear to apply different rules as to which letters are allowed.
After some research, we've found some more information. Apparently a page on 'govtalk.gov.uk' points you to a postcode specification govtalk-postcodes. This points to an XML schema at XML Schema which provides a 'pseudo regex' statement of the postcode rules.
We've taken that and worked on it a little to give us the following expression:
^((GIR &0AA)|((([A-PR-UWYZ][A-HK-Y]?[0-9][0-9]?)|(([A-PR-UWYZ][0-9][A-HJKSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]))) &[0-9][ABD-HJLNP-UW-Z]{2}))$
This makes spaces optional, but does limit you to one space (replace the '&' with '{0,} for unlimited spaces). It assumes all text must be upper-case.
If you want to allow lower case, with any number of spaces, use:
^(([gG][iI][rR] {0,}0[aA]{2})|((([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y]?[0-9][0-9]?)|(([a-pr-uwyzA-PR-UWYZ][0-9][a-hjkstuwA-HJKSTUW])|([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y][0-9][abehmnprv-yABEHMNPRV-Y]))) {0,}[0-9][abd-hjlnp-uw-zABD-HJLNP-UW-Z]{2}))$
This doesn't cover overseas territories and only enforces the format, NOT the existence of different areas. It is based on the following rules:
Can accept the following formats:
“GIR 0AA”
A9 9ZZ
A99 9ZZ
AB9 9ZZ
AB99 9ZZ
A9C 9ZZ
AD9E 9ZZ
Where:
9 can be any single digit number.
A can be any letter except for Q, V or X.
B can be any letter except for I, J or Z.
C can be any letter except for I, L, M, N, O, P, Q, R, V, X, Y or Z.
D can be any letter except for I, J or Z.
E can be any of A, B, E, H, M, N, P, R, V, W, X or Y.
Z can be any letter except for C, I, K, M, O or V.
Best wishes
Colin
There is no such thing as a comprehensive UK postcode regular expression that is capable of validating a postcode. You can check that a postcode is in the correct format using a regular expression; not that it actually exists.
Postcodes are arbitrarily complex and constantly changing. For instance, the outcode W1 does not, and may never, have every number between 1 and 99, for every postcode area.
You can't expect what is there currently to be true forever. As an example, in 1990, the Post Office decided that Aberdeen was getting a bit crowded. They added a 0 to the end of AB1-5 making it AB10-50 and then created a number of postcodes in between these.
Whenever a new street is build a new postcode is created. It's part of the process for obtaining permission to build; local authorities are obliged to keep this updated with the Post Office (not that they all do).
Furthermore, as noted by a number of other users, there's the special postcodes such as Girobank, GIR 0AA, and the one for letters to Santa, SAN TA1 - you probably don't want to post anything there but it doesn't appear to be covered by any other answer.
Then, there's the BFPO postcodes, which are now changing to a more standard format. Both formats are going to be valid. Lastly, there's the overseas territories source Wikipedia.
+----------+----------------------------------------------+
| Postcode | Location |
+----------+----------------------------------------------+
| AI-2640 | Anguilla |
| ASCN 1ZZ | Ascension Island |
| STHL 1ZZ | Saint Helena |
| TDCU 1ZZ | Tristan da Cunha |
| BBND 1ZZ | British Indian Ocean Territory |
| BIQQ 1ZZ | British Antarctic Territory |
| FIQQ 1ZZ | Falkland Islands |
| GX11 1AA | Gibraltar |
| PCRN 1ZZ | Pitcairn Islands |
| SIQQ 1ZZ | South Georgia and the South Sandwich Islands |
| TKCA 1ZZ | Turks and Caicos Islands |
+----------+----------------------------------------------+
Next, you have to take into account that the UK "exported" its postcode system to many places in the world. Anything that validates a "UK" postcode will also validate the postcodes of a number of other countries.
If you want to validate a UK postcode the safest way to do it is to use a look-up of current postcodes. There are a number of options:
Ordnance Survey releases Code-Point Open under an open data licence. It'll be very slightly behind the times but it's free. This will (probably - I can't remember) not include Northern Irish data as the Ordnance Survey has no remit there. Mapping in Northern Ireland is conducted by the Ordnance Survey of Northern Ireland and they have their, separate, paid-for, Pointer product. You could use this and append the few that aren't covered fairly easily.
Royal Mail releases the Postcode Address File (PAF), this includes BFPO which I'm not sure Code-Point Open does. It's updated regularly but costs money (and they can be downright mean about it sometimes). PAF includes the full address rather than just postcodes and comes with its own Programmers Guide. The Open Data User Group (ODUG) is currently lobbying to have PAF released for free, here's a description of their position.
Lastly, there's AddressBase. This is a collaboration between Ordnance Survey, Local Authorities, Royal Mail and a matching company to create a definitive directory of all information about all UK addresses (they've been fairly successful as well). It's paid-for but if you're working with a Local Authority, government department, or government service it's free for them to use. There's a lot more information than just postcodes included.
^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$
Regular expression to match valid UK
postcodes. In the UK postal system not
all letters are used in all positions
(the same with vehicle registration
plates) and there are various rules to
govern this. This regex takes into
account those rules. Details of the
rules: First half of postcode Valid
formats [A-Z][A-Z][0-9][A-Z]
[A-Z][A-Z][0-9][0-9] [A-Z][0-9][0-9]
[A-Z][A-Z][0-9] [A-Z][A-Z][A-Z]
[A-Z][0-9][A-Z] [A-Z][0-9] Exceptions
Position - First. Contraint - QVX not
used Position - Second. Contraint -
IJZ not used except in GIR 0AA
Position - Third. Constraint -
AEHMNPRTVXY only used Position -
Forth. Contraint - ABEHMNPRVWXY Second
half of postcode Valid formats
[0-9][A-Z][A-Z] Exceptions Position -
Second and Third. Contraint - CIKMOV
not used
http://regexlib.com/REDetails.aspx?regexp_id=260
I had a look into some of the answers above and I'd recommend against using the pattern from #Dan's answer (c. Dec 15 '10), since it incorrectly flags almost 0.4% of valid postcodes as invalid, while the others do not.
Ordnance Survey provide service called Code Point Open which:
contains a list of all the current postcode units in Great Britain
I ran each of the regexs above against the full list of postcodes (Jul 6 '13) from this data using grep:
cat CSV/*.csv |
# Strip leading quotes
sed -e 's/^"//g' |
# Strip trailing quote and everything after it
sed -e 's/".*//g' |
# Strip any spaces
sed -E -e 's/ +//g' |
# Find any lines that do not match the expression
grep --invert-match --perl-regexp "$pattern"
There are 1,686,202 postcodes total.
The following are the numbers of valid postcodes that do not match each $pattern:
'^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]?[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$'
# => 6016 (0.36%)
'^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$'
# => 0
'^GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|BX|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}$'
# => 0
Of course, these results only deal with valid postcodes that are incorrectly flagged as invalid. So:
'^.*$'
# => 0
I'm saying nothing about which pattern is the best regarding filtering out invalid postcodes.
According to this Wikipedia table
This pattern cover all the cases
(?:[A-Za-z]\d ?\d[A-Za-z]{2})|(?:[A-Za-z][A-Za-z\d]\d ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d{2} ?\d[A-Za-z]{2})|(?:[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d[A-Za-z] ?\d[A-Za-z]{2})
When using it on Android\Java use \\d
Most of the answers here didn't work for all the postcodes I have in my database. I finally found one that validates with all, using the new regex provided by the government:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/413338/Bulk_Data_Transfer_-_additional_validation_valid_from_March_2015.pdf
It isn't in any of the previous answers so I post it here in case they take the link down:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
UPDATE: Updated regex as pointed by Jamie Bull. Not sure if it was my error copying or it was an error in the government's regex, the link is down now...
UPDATE: As ctwheels found, this regex works with the javascript regex flavor. See his comment for one that works with the pcre (php) flavor.
An old post but still pretty high in google results so thought I'd update. This Oct 14 doc defines the UK postcode regular expression as:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([**AZ**a-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
from:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/359448/4__Bulk_Data_Transfer_-_additional_validation_valid.pdf
The document also explains the logic behind it. However, it has an error (bolded) and also allows lower case, which although legal is not usual, so amended version:
^(GIR 0AA)|((([A-Z][0-9]{1,2})|(([A-Z][A-HJ-Y][0-9]{1,2})|(([A-Z][0-9][A-Z])|([A-Z][A-HJ-Y][0-9]?[A-Z])))) [0-9][A-Z]{2})$
This works with new London postcodes (e.g. W1D 5LH) that previous versions did not.
This is the regex Google serves on their i18napis.appspot.com domain:
GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|BX|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}
Postcodes are subject to change, and the only true way of validating a postcode is to have the complete list of postcodes and see if it's there.
But regular expressions are useful because they:
are easy to use and implement
are short
are quick to run
are quite easy to maintain (compared to a full list of postcodes)
still catch most input errors
But regular expressions tend to be difficult to maintain, especially for someone who didn't come up with it in the first place. So it must be:
as easy to understand as possible
relatively future proof
That means that most of the regular expressions in this answer aren't good enough. E.g. I can see that [A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y] is going to match a postcode area of the form AA1A — but it's going to be a pain in the neck if and when a new postcode area gets added, because it's difficult to understand which postcode areas it matches.
I also want my regular expression to match the first and second half of the postcode as parenthesised matches.
So I've come up with this:
(GIR(?=\s*0AA)|(?:[BEGLMNSW]|[A-Z]{2})[0-9](?:[0-9]|(?<=N1|E1|SE1|SW1|W1|NW1|EC[0-9]|WC[0-9])[A-HJ-NP-Z])?)\s*([0-9][ABD-HJLNP-UW-Z]{2})
In PCRE format it can be written as follows:
/^
( GIR(?=\s*0AA) # Match the special postcode "GIR 0AA"
|
(?:
[BEGLMNSW] | # There are 8 single-letter postcode areas
[A-Z]{2} # All other postcode areas have two letters
)
[0-9] # There is always at least one number after the postcode area
(?:
[0-9] # And an optional extra number
|
# Only certain postcode areas can have an extra letter after the number
(?<=N1|E1|SE1|SW1|W1|NW1|EC[0-9]|WC[0-9])
[A-HJ-NP-Z] # Possible letters here may change, but [IO] will never be used
)?
)
\s*
([0-9][ABD-HJLNP-UW-Z]{2}) # The last two letters cannot be [CIKMOV]
$/x
For me this is the right balance between validating as much as possible, while at the same time future-proofing and allowing for easy maintenance.
I've been looking for a UK postcode regex for the last day or so and stumbled on this thread. I worked my way through most of the suggestions above and none of them worked for me so I came up with my own regex which, as far as I know, captures all valid UK postcodes as of Jan '13 (according to the latest literature from the Royal Mail).
The regex and some simple postcode checking PHP code is posted below. NOTE:- It allows for lower or uppercase postcodes and the GIR 0AA anomaly but to deal with the, more than likely, presence of a space in the middle of an entered postcode it also makes use of a simple str_replace to remove the space before testing against the regex. Any discrepancies beyond that and the Royal Mail themselves don't even mention them in their literature (see http://www.royalmail.com/sites/default/files/docs/pdf/programmers_guide_edition_7_v5.pdf and start reading from page 17)!
Note: In the Royal Mail's own literature (link above) there is a slight ambiguity surrounding the 3rd and 4th positions and the exceptions in place if these characters are letters. I contacted Royal Mail directly to clear it up and in their own words "A letter in the 4th position of the Outward Code with the format AANA NAA has no exceptions and the 3rd position exceptions apply only to the last letter of the Outward Code with the format ANA NAA." Straight from the horse's mouth!
<?php
$postcoderegex = '/^([g][i][r][0][a][a])$|^((([a-pr-uwyz]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[a-hk-y]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[1-9][a-hjkps-uw]{1})|([a-pr-uwyz]{1}[a-hk-y]{1}[1-9][a-z]{1}))(\d[abd-hjlnp-uw-z]{2})?)$/i';
$postcode2check = str_replace(' ','',$postcode2check);
if (preg_match($postcoderegex, $postcode2check)) {
echo "$postcode2check is a valid postcode<br>";
} else {
echo "$postcode2check is not a valid postcode<br>";
}
?>
I hope it helps anyone else who comes across this thread looking for a solution.
Here's a regex based on the format specified in the documents which are linked to marcj's answer:
/^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}$/
The only difference between that and the specs is that the last 2 characters cannot be in [CIKMOV] according to the specs.
Edit:
Here's another version which does test for the trailing character limitations.
/^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-BD-HJLNP-UW-Z]{2}$/
Some of the regexs above are a little restrictive. Note the genuine postcode: "W1K 7AA" would fail given the rule "Position 3 - AEHMNPRTVXY only used" above as "K" would be disallowed.
the regex:
^(GIR 0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKPS-UW])[0-9][ABD-HJLNP-UW-Z]{2})$
Seems a little more accurate, see the Wikipedia article entitled 'Postcodes in the United Kingdom'.
Note that this regex requires uppercase only characters.
The bigger question is whether you are restricting user input to allow only postcodes that actually exist or whether you are simply trying to stop users entering complete rubbish into the form fields. Correctly matching every possible postcode, and future proofing it, is a harder puzzle, and probably not worth it unless you are HMRC.
I wanted a simple regex, where it's fine to allow too much, but not to deny a valid postcode. I went with this (the input is a stripped/trimmed string):
/^([a-z0-9]\s*){5,8}$/i
This allows the shortest possible postcodes like "L1 8JQ" as well as the longest ones like "OL14 5ET".
Because it allows up to 8 characters, it will also allow incorrect 8 character postcodes if there is no space: "OL145ETX". But again, this is a simplistic regex, for when that's good enough.
Whilst there are many answers here, I'm not happy with either of them. Most of them are simply broken, are too complex or just broken.
I looked at #ctwheels answer and I found it very explanatory and correct; we must thank him for that. However once again too much "data" for me, for something so simple.
Fortunately, I managed to get a database with over 1 million active postcodes for England only and made a small PowerShell script to test and benchmark the results.
UK Postcode specifications: Valid Postcode Format.
This is "my" Regex:
^([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})\s(\d[a-zA-Z]{2})$
Short, simple and sweet. Even the most unexperienced can understand what is going on.
Explanation:
^ asserts position at start of a line
1st Capturing Group ([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})
Match a single character present in the list below [a-zA-Z]
{1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [a-zA-Z\d]
{1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
2nd Capturing Group (\d[a-zA-Z]{2})
\d matches a digit (equivalent to [0-9])
Match a single character present in the list below [a-zA-Z]
{2} matches the previous token exactly 2 times
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
$ asserts position at the end of a line
Result (postcodes checked):
TOTAL OK: 1469193
TOTAL FAILED: 0
-------------------------------------------------------------------------
Days : 0
Hours : 0
Minutes : 5
Seconds : 22
Milliseconds : 718
Ticks : 3227185939
TotalDays : 0.00373516891087963
TotalHours : 0.0896440538611111
TotalMinutes : 5.37864323166667
TotalSeconds : 322.7185939
TotalMilliseconds : 322718.5939
here's how we have been dealing with the UK postcode issue:
^([A-Za-z]{1,2}[0-9]{1,2}[A-Za-z]?[ ]?)([0-9]{1}[A-Za-z]{2})$
Explanation:
expect 1 or 2 a-z chars, upper or lower fine
expect 1 or 2 numbers
expect 0 or 1 a-z char, upper or lower fine
optional space allowed
expect 1 number
expect 2 a-z, upper or lower fine
This gets most formats, we then use the db to validate whether the postcode is actually real, this data is driven by openpoint https://www.ordnancesurvey.co.uk/opendatadownload/products.html
hope this helps
Basic rules:
^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$
Postal codes in the U.K. (or postcodes, as they’re called) are composed of five to seven alphanumeric characters separated by a space. The rules covering which characters can appear at particular positions are rather complicated and fraught with exceptions. The regular expression just shown therefore sticks to the basic rules.
Complete rules:
If you need a regex that ticks all the boxes for the postcode rules at the expense of readability, here you go:
^(?:(?:[A-PR-UWYZ][0-9]{1,2}|[A-PR-UWYZ][A-HK-Y][0-9]{1,2}|[A-PR-UWYZ][0-9][A-HJKSTUW]|[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]) [0-9][ABD-HJLNP-UW-Z]{2}|GIR 0AA)$
Source: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s16.html
Tested against our customers database and seems perfectly accurate.
I use the following regex that I have tested against all valid UK postcodes. It is based on the recommended rules, but condensed as much as reasonable and does not make use of any special language specific regex rules.
([A-PR-UWYZ]([A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y])?|[0-9]([0-9]|[A-HJKPSTUW])?) ?[0-9][ABD-HJLNP-UW-Z]{2})
It assumes that the postcode has been converted to uppercase and has not leading or trailing characters, but will accept an optional space between the outcode and incode.
The special "GIR0 0AA" postcode is excluded and will not validate as it's not in the official Post Office list of postcodes and as far as I'm aware will not be used as registered address. Adding it should be trivial as a special case if required.
First half of postcode Valid formats
[A-Z][A-Z][0-9][A-Z]
[A-Z][A-Z][0-9][0-9]
[A-Z][0-9][0-9]
[A-Z][A-Z][0-9]
[A-Z][A-Z][A-Z]
[A-Z][0-9][A-Z]
[A-Z][0-9]
Exceptions
Position 1 - QVX not used
Position 2 - IJZ not used except in GIR 0AA
Position 3 - AEHMNPRTVXY only used
Position 4 - ABEHMNPRVWXY
Second half of postcode
[0-9][A-Z][A-Z]
Exceptions
Position 2+3 - CIKMOV not used
Remember not all possible codes are used, so this list is a necessary but not sufficent condition for a valid code. It might be easier to just match against a list of all valid codes?
To check a postcode is in a valid format as per the Royal Mail's programmer's guide:
|----------------------------outward code------------------------------| |------inward code-----|
#special↓ α1 α2 AAN AANA AANN AN ANN ANA (α3) N AA
^(GIR 0AA|[A-PR-UWYZ]([A-HK-Y]([0-9][A-Z]?|[1-9][0-9])|[1-9]([0-9]|[A-HJKPSTUW])?) [0-9][ABD-HJLNP-UW-Z]{2})$
All postcodes on doogal.co.uk match, except for those no longer in use.
Adding a ? after the space and using case-insensitive match to answer this question:
'se50eg'.match(/^(GIR 0AA|[A-PR-UWYZ]([A-HK-Y]([0-9][A-Z]?|[1-9][0-9])|[1-9]([0-9]|[A-HJKPSTUW])?) ?[0-9][ABD-HJLNP-UW-Z]{2})$/ig);
Array [ "se50eg" ]
This one allows empty spaces and tabs from both sides in case you don't want to fail validation and then trim it sever side.
^\s*(([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})\s*$)
Through empirical testing and observation, as well as confirming with https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation, here is my version of a Python regex that correctly parses and validates a UK postcode:
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'
This regex is simple and has capture groups. It does not include all of the validations of legal UK postcodes, but only takes into account the letter vs number positions.
Here is how I would use it in code:
#dataclass
class UKPostcode:
postcode_area: str
district: str
sector: int
postcode: str
# https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
# Original author of this regex: #jontsai
# NOTE TO FUTURE DEVELOPER:
# Verified through empirical testing and observation, as well as confirming with the Wiki article
# If this regex fails to capture all valid UK postcodes, then I apologize, for I am only human.
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'
#classmethod
def from_postcode(cls, postcode):
"""Parses a string into a UKPostcode
Returns a UKPostcode or None
"""
m = re.match(cls.UK_POSTCODE_REGEX, postcode.replace(' ', ''))
if m:
uk_postcode = UKPostcode(
postcode_area=m.group('postcode_area'),
district=m.group('district'),
sector=m.group('sector'),
postcode=m.group('postcode')
)
else:
uk_postcode = None
return uk_postcode
def parse_uk_postcode(postcode):
"""Wrapper for UKPostcode.from_postcode
"""
uk_postcode = UKPostcode.from_postcode(postcode)
return uk_postcode
Here are unit tests:
#pytest.mark.parametrize(
'postcode, expected', [
# https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
(
'EC1A1BB',
UKPostcode(
postcode_area='EC',
district='1A',
sector='1',
postcode='BB'
),
),
(
'W1A0AX',
UKPostcode(
postcode_area='W',
district='1A',
sector='0',
postcode='AX'
),
),
(
'M11AE',
UKPostcode(
postcode_area='M',
district='1',
sector='1',
postcode='AE'
),
),
(
'B338TH',
UKPostcode(
postcode_area='B',
district='33',
sector='8',
postcode='TH'
)
),
(
'CR26XH',
UKPostcode(
postcode_area='CR',
district='2',
sector='6',
postcode='XH'
)
),
(
'DN551PT',
UKPostcode(
postcode_area='DN',
district='55',
sector='1',
postcode='PT'
)
)
]
)
def test_parse_uk_postcode(postcode, expected):
uk_postcode = parse_uk_postcode(postcode)
assert(uk_postcode == expected)
To add to this list a more practical regex that I use that allows the user to enter an empty string is:
^$|^(([gG][iI][rR] {0,}0[aA]{2})|((([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y]?[0-9][0-9]?)|(([a-pr-uwyzA-PR-UWYZ][0-9][a-hjkstuwA-HJKSTUW])|([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y][0-9][abehmnprv-yABEHMNPRV-Y]))) {0,1}[0-9][abd-hjlnp-uw-zABD-HJLNP-UW-Z]{2}))$
This regex allows capital and lower case letters with an optional space in between
From a software developers point of view this regex is useful for software where an address may be optional. For example if a user did not want to supply their address details
Have a look at the python code on this page:
http://www.brunningonline.net/simon/blog/archives/001292.html
I've got some postcode parsing to do. The requirement is pretty simple; I have to parse a postcode into an outcode and (optional) incode. The good new is that I don't have to perform any validation - I just have to chop up what I've been provided with in a vaguely intelligent manner. I can't assume much about my import in terms of formatting, i.e. case and embedded spaces. But this isn't the bad news; the bad news is that I have to do it all in RPG. :-(
Nevertheless, I threw a little Python function together to clarify my thinking.
I've used it to process postcodes for me.
I have the regex for UK Postcode validation.
This is working for all type of Postcode either inner or outer
^((([A-PR-UWYZ][0-9])|([A-PR-UWYZ][0-9][0-9])|([A-PR-UWYZ][A-HK-Y][0-9])|([A-PR-UWYZ][A-HK-Y][0-9][0-9])|([A-PR-UWYZ][0-9][A-HJKSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))) || ^((GIR)[ ]?(0AA))$|^(([A-PR-UWYZ][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][0-9][A-HJKS-UW0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9][ABEHMNPRVWXY0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$
This is working for all type of format.
Example:
AB10-------------------->ONLY OUTER POSTCODE
A1 1AA------------------>COMBINATION OF (OUTER AND INNER) POSTCODE
WC2A-------------------->OUTER
We were given a spec:
UK postcodes must be in one of the following forms (with one exception, see below):
§ A9 9AA
§ A99 9AA
§ AA9 9AA
§ AA99 9AA
§ A9A 9AA
§ AA9A 9AA
where A represents an alphabetic character and 9 represents a numeric character.
Additional rules apply to alphabetic characters, as follows:
§ The character in position 1 may not be Q, V or X
§ The character in position 2 may not be I, J or Z
§ The character in position 3 may not be I, L, M, N, O, P, Q, R, V, X, Y or Z
§ The character in position 4 may not be C, D, F, G, I, J, K, L, O, Q, S, T, U or Z
§ The characters in the rightmost two positions may not be C, I, K, M, O or V
The one exception that does not follow these general rules is the postcode "GIR 0AA", which is a special valid postcode.
We came up with this:
/^([A-PR-UWYZ][A-HK-Y0-9](?:[A-HJKS-UW0-9][ABEHMNPRV-Y0-9]?)?\s*[0-9][ABD-HJLNP-UW-Z]{2}|GIR\s*0AA)$/i
But note - this allows any number of spaces in between groups.
The accepted answer reflects the rules given by Royal Mail, although there is a typo in the regex. This typo seems to have been in there on the gov.uk site as well (as it is in the XML archive page).
In the format A9A 9AA the rules allow a P character in the third position, whilst the regex disallows this. The correct regex would be:
(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKPSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})
Shortening this results in the following regex (which uses Perl/Ruby syntax):
(GIR 0AA)|([A-PR-UWYZ](([0-9]([0-9A-HJKPSTUW])?)|([A-HK-Y][0-9]([0-9ABEHMNPRVWXY])?))\s?[0-9][ABD-HJLNP-UW-Z]{2})
It also includes an optional space between the first and second block.
What i have found in nearly all the variations and the regex from the bulk transfer pdf and what is on wikipedia site is this, specifically for the wikipedia regex is, there needs to be a ^ after the first |(vertical bar). I figured this out by testing for AA9A 9AA, because otherwise the format check for A9A 9AA will validate it. For Example checking for EC1D 1BB which should be invalid comes back valid because C1D 1BB is a valid format.
Here is what I've come up with for a good regex:
^([G][I][R] 0[A]{2})|^((([A-Z-[QVX]][0-9]{1,2})|([A-Z-[QVX]][A-HK-Y][0-9]{1,2})|([A-Z-[QVX]][0-9][ABCDEFGHJKPSTUW])|([A-Z-[QVX]][A-HK-Y][0-9][ABEHMNPRVWXY])) [0-9][A-Z-[CIKMOV]]{2})$
Below method will check the post code and provide complete info
const isValidUKPostcode = postcode => {
try {
postcode = postcode.replace(/\s/g, "");
const fromat = postcode
.toUpperCase()
.match(/^([A-Z]{1,2}\d{1,2}[A-Z]?)\s*(\d[A-Z]{2})$/);
const finalValue = `${fromat[1]} ${fromat[2]}`;
const regex = /^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})$/i;
return {
isValid: regex.test(postcode),
formatedPostCode: finalValue,
error: false,
message: 'It is a valid postcode'
};
} catch (error) {
return { error: true , message: 'Invalid postcode'};
}
};
console.log(isValidUKPostcode('GU348RR'))
{isValid: true, formattedPostcode: "GU34 8RR", error: false, message: "It is a valid postcode"}
console.log(isValidUKPostcode('sdasd4746asd'))
{error: true, message: "Invalid postcode!"}
valid_postcode('787898523')
result => {error: true, message: "Invalid postcode"}

Regular expression for address field validation

I am trying to write a regular expression that facilitates an address, example 21-big walk way or 21 St.Elizabeth's drive I came up with the following regular expression but I am not too keen to how to incorporate all the characters (alphanumeric, space dash, full stop, apostrophe)
"regexp=^[A-Za-z-0-99999999'
See the answer to this question on address validating with regex:
regex street address match
The problem is, street addresses vary so much in formatting that it's hard to code against them. If you are trying to validate addresses, finding if one isn't valid based on its format is mighty hard to do.
This would return the following address (253 N. Cherry St. ), anything with its same format:
\d{1,5}\s\w.\s(\b\w*\b\s){1,2}\w*\.
This allows 1-5 digits for the house number, a space, a character followed by a period (for N. or S.), 1-2 words for the street name, finished with an abbreviation (like st. or rd.).
Because regex is used to see if things meet a standard or protocol (which you define), you probably wouldn't want to allow for the addresses provided above, especially the first one with the dash, since they aren't very standard. you can modify my above code to allow for them if you wish--you could add
(-?)
to allow for a dash but not require one.
In addition, http://rubular.com/ is a quick and interactive way to learn regex. Try it out with the addresses above.
In case if you don't have a fixed format for the address as mentioned above, I would use regex expression just to eliminate the symbols which are not used in the address (like specialized sybmols - &(%#$^). Result would be:
[A-Za-z0-9'\.\-\s\,]
Just to add to Serzas' answer(since don't have enough reps. to comment).
alphabets and numbers can effectively be replaced by \w for words.
Additionally apostrophe,comma,period and hyphen doesn't necessarily need a backslash.
My requirement also involved front and back slashes so \/ and finally whitespaces with \s. The working regex for me ,as such was :
pattern: "[\w',-\\/.\s]"
Regular expression for simple address validation
^[#.0-9a-zA-Z\s,-]+$
E.g. for Address match case
#1, North Street, Chennai - 11
E.g. for Address not match case
$1, North Street, Chennai # 11
I have succesfully used ;
Dim regexString = New stringbuilder
With regexString
.Append("(?<h>^[\d]+[ ])(?<s>.+$)|") 'find the 2013 1st ambonstreet
.Append("(?<s>^.*?)(?<h>[ ][\d]+[ ])(?<e>[\D]+$)|") 'find the 1-7-4 Dual Ampstreet 130 A
.Append("(?<s>^[\D]+[ ])(?<h>[\d]+)(?<e>.*?$)|") 'find the Terheydenlaan 320 B3
.Append("(?<s>^.*?)(?<h>\d*?$)") 'find the 245e oosterkade 9
End With
Dim Address As Match = Regex.Match(DataRow("customerAddressLine1"), regexString.ToString(), RegexOptions.Multiline)
If Not String.IsNullOrEmpty(Address.Groups("s").Value) Then StreetName = Address.Groups("s").Value
If Not String.IsNullOrEmpty(Address.Groups("h").Value) Then HouseNumber = Address.Groups("h").Value
If Not String.IsNullOrEmpty(Address.Groups("e").Value) Then Extension = Address.Groups("e").Value
The regex will attempt to find a result, if there is none, it move to the next alternative. If no result is found, none of the 4 formats where present.
This one worked for me:
\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?
The source: https://www.codeproject.com/Tips/989012/Validate-and-Find-Addresses-with-RegEx
Regex is a very bad choice for this kind of task. Try to find a web service or an address database or a product which can clean address data instead.
Related:
Address validation using Google Maps API
As a simple one line expression recommend this,
^([a-zA-z0-9/\\''(),-\s]{2,255})$
I needed
STREET # | STREET | CITY | STATE | ZIP
So I wrote the following regex
[0-9]{1,5}( [a-zA-Z.]*){1,4},?( [a-zA-Z]*){1,3},? [a-zA-Z]{2},? [0-9]{5}
This allows
1-5 Street #s
1-4 Street description words
1-3 City words
2 Char State
5 Char Zip code
I also added option , for separating street, city, state, zip
Here is the approach I have taken to finding addresses using regular expressions:
A set of patterns is useful to find many forms that we might expect from an address starting with simply a number followed by set of strings (ex. 1 Basic Road) and then getting more specific such as looking for "P.O. Box", "c/o", "attn:", etc.
Below is a simple test in python. The test will find all the addresses but not the last 4 items which are company names. This example is not comprehensive, but can be altered to suit your needs and catch examples you find in your data.
import re
strings = [
'701 FIFTH AVE',
'2157 Henderson Highway',
'Attn: Patent Docketing',
'HOLLYWOOD, FL 33022-2480',
'1940 DUKE STREET',
'111 MONUMENT CIRCLE, SUITE 3700',
'c/o Armstrong Teasdale LLP',
'1 Almaden Boulevard',
'999 Peachtree Street NE',
'P.O. BOX 2903',
'2040 MAIN STREET',
'300 North Meridian Street',
'465 Columbus Avenue',
'1441 SEAMIST DR.',
'2000 PENNSYLVANIA AVENUE, N.W.',
'465 Columbus Avenue',
'28 STATE STREET',
'P.O, Drawer 800889.',
'2200 CLARENDON BLVD.',
'840 NORTH PLANKINTON AVENUE',
'1025 Connecticut Avenue, NW',
'340 Commercial Street',
'799 Ninth Street, NW',
'11318 Lazarro Ln',
'P.O, Box 65745',
'c/o Ballard Spahr LLP',
'8210 SOUTHPARK TERRACE',
'1130 Connecticut Ave., NW, Suite 420',
'465 Columbus Avenue',
"BANNER & WITCOFF , LTD",
"CHIP LAW GROUP",
"HAMMER & ASSOCIATES, P.C.",
"MH2 TECHNOLOGY LAW GROUP, LLP",
]
patterns = [
"c\/o [\w ]{2,}",
"C\/O [\w ]{2,}",
"P.O\. [\w ]{2,}",
"P.O\, [\w ]{2,}",
"[\w\.]{2,5} BOX [\d]{2,8}",
"^[#\d]{1,7} [\w ]{2,}",
"[A-Z]{2,2} [\d]{5,5}",
"Attn: [\w]{2,}",
"ATTN: [\w]{2,}",
"Attention: [\w]{2,}",
"ATTENTION: [\w]{2,}"
]
contact_list = []
total_count = len(strings)
found_count = 0
for string in strings:
pat_no = 1
for pattern in patterns:
match = re.search(pattern, string.strip())
if match:
print("Item found: " + match.group(0) + " | Pattern no: " + str(pat_no))
found_count += 1
pat_no += 1
print("-- Total: " + str(total_count) + " Found: " + str(found_count))
UiPath Academy training video lists this RegEx for US addresses (and it works fine for me):
\b\d{1,8}(-)?[a-z]?\W[a-z|\W|\.]{1,}\W(road|drive|avenue|boulevard|circle|street|lane|waylrd\.|st\.|dr\.|ave\.|blvd\.|cir\.|In\.|rd|dr|ave|blvd|cir|ln)
I had a different use case - find any addresses in logs and scold application developers (favourite part of a devops job). I had the advantage of having the word "address" in the pattern but should work without that if you have specific field to scan
\baddress.[0-9\\\/# ,a-zA-Z]+[ ,]+[0-9\\\/#, a-zA-Z]{1,}
Look for the word "address" - skip this if not applicable
Look for first part numbers, letters, #, space - Unit Number / street number/suite number/door number
Separated by a space or comma
Look for one or more of rest of address numbers, letters, #, space
Tested against :
1 Sleepy Boulevard PO, Box 65745
Suite #100 /98,North St,Snoozepura
Ave., New Jersey,
Suite 420 1130 Connect Ave., NW,
Suite 420 19 / 21 Old Avenue,
Suite 12, Springfield, VIC 3001
Suite#100/98 North St Snoozepura
This worked for me when there were street addresses with unit/suite numbers, zip codes, only street. It also didn't match IP addresses or mac addresses. Worked with extra spaces.
This assumes users are normal people separate elements of a street address with a comma, hash sign, or space and not psychopaths who use characters like "|" or ":"!
For French address and some international address too, I use it.
[\\D+ || \\d]+\\d+[ ||,||[A-Za-z0-9.-]]+(?:[Rue|Avenue|Lane|... etcd|Ln|St]+[ ]?)+(?:[A-Za-z0-9.-](.*)]?)
I was inspired from the responses given here and came with those 2 solutions
support optional uppercase
support french also
regex structure
numbers (required)
letters, chars and spaces
at least one common address keyword (required)
as many chars you want before the line break
definitions:
accuracy
capacity of detecting addresses and not something that looks like an address which is not.
range
capacity to detect uncommon addresses.
Regex 1:
high accuracy
low range
/[0-9]+[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.)|(cir\.)|(blvd\.)|(hway\.)|(st\.)|(aut\.)|(ave\.)|(ln\.)|(rd\.)|(hw\.)|(dr\.)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i
regex 2:
low accuracy
high range
/[0-9]*[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.?)|(cir\.?)|(blvd\.?)|(hway\.?)|(st\.?)|(aut\.?)|(ave\.?)|(ln\.?)|(rd\.?)|(hw\.?)|(dr\.?)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i
This one works well for me
^(\d+) ?([A-Za-z](?= ))? (.*?) ([^ ]+?) ?((?<= )APT)? ?((?<= )\d*)?$
Source : https://community.alteryx.com/t5/Alteryx-Designer-Discussions/RegEx-Addresses-different-formats-and-headaches/td-p/360147
Here is my RegEx for address, city & postal validation rules
validation rules:
address -
1 - 40 characters length.
Letters, numbers, space and . , : ' #
city -
1 - 19 characters length
Only Alpha characters are allowed
Spaces are allowed
postalCode -
The USA zip must meet the following criteria and is required:
Minimum of 5 digits (9 digits if zip + 4 is provided)
Numeric only
A Canadian postal code is a six-character string.
in the format A1A 1A1, where A is a letter and 1 is a digit.
a space separates the third and fourth characters.
do not include the letters D, F, I, O, Q or U.
the first position does not make use of the letters W or Z.
address: ^[a-zA-Z0-9 .,#;:'-]{1,40}$
city: ^[a-zA-Z ]{1,19}$
usaPostal: ^([0-9]{5})(?:[-]?([0-9]{4}))?$
canadaPostal : ^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$
\b(\d{1,8}[a-z]?[0-9\/#- ,a-zA-Z]+[ ,]+[.0-9\/#, a-zA-Z]{1,})\n
A more dynamic approach to #micah would be the following:
(?'Address'(?'Street'[0-9][a-zA-Z\s]),?\s*(?'City'[A-Za-z\s]),?\s(?'Country'[A-Za-z])\s(?'Zipcode'[0-9]-?[0-9]))
It won't care about individual lengths of segments of code.
https://regex101.com/r/nuy7hB/1

RegEx for matching UK Postcodes

I'm after a regex that will validate a full complex UK postcode only within an input string. All of the uncommon postcode forms must be covered as well as the usual. For instance:
Matches
CW3 9SS
SE5 0EG
SE50EG
se5 0eg
WC2H 7LT
No Match
aWC2H 7LT
WC2H 7LTa
WC2H
How do I solve this problem?
I'd recommend taking a look at the UK Government Data Standard for postcodes [link now dead; archive of XML, see Wikipedia for discussion]. There is a brief description about the data and the attached xml schema provides a regular expression. It may not be exactly what you want but would be a good starting point. The RegEx differs from the XML slightly, as a P character in third position in format A9A 9AA is allowed by the definition given.
The RegEx supplied by the UK Government was:
([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})
As pointed out on the Wikipedia discussion, this will allow some non-real postcodes (e.g. those starting AA, ZY) and they do provide a more rigorous test that you could try.
I recently posted an answer to this question on UK postcodes for the R language. I discovered that the UK Government's regex pattern is incorrect and fails to properly validate some postcodes. Unfortunately, many of the answers here are based on this incorrect pattern.
I'll outline some of these issues below and provide a revised regular expression that actually works.
Note
My answer (and regular expressions in general):
Only validates postcode formats.
Does not ensure that a postcode legitimately exists.
For this, use an appropriate API! See Ben's answer for more info.
If you don't care about the bad regex and just want to skip to the answer, scroll down to the Answer section.
The Bad Regex
The regular expressions in this section should not be used.
This is the failing regex that the UK government has provided developers (not sure how long this link will be up, but you can see it in their Bulk Data Transfer documentation):
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})$
Problems
Problem 1 - Copy/Paste
See regex in use here.
As many developers likely do, they copy/paste code (especially regular expressions) and paste them expecting them to work. While this is great in theory, it fails in this particular case because copy/pasting from this document actually changes one of the characters (a space) into a newline character as shown below:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))
[0-9][A-Za-z]{2})$
The first thing most developers will do is just erase the newline without thinking twice. Now the regex won't match postcodes with spaces in them (other than the GIR 0AA postcode).
To fix this issue, the newline character should be replaced with the space character:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^
Problem 2 - Boundaries
See regex in use here.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^^ ^ ^ ^^
The postcode regex improperly anchors the regex. Anyone using this regex to validate postcodes might be surprised if a value like fooA11 1AA gets through. That's because they've anchored the start of the first option and the end of the second option (independently of one another), as pointed out in the regex above.
What this means is that ^ (asserts position at start of the line) only works on the first option ([Gg][Ii][Rr] 0[Aa]{2}), so the second option will validate any strings that end in a postcode (regardless of what comes before).
Similarly, the first option isn't anchored to the end of the line $, so GIR 0AAfoo is also accepted.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})$
To fix this issue, both options should be wrapped in another group (or non-capturing group) and the anchors placed around that:
^(([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2}))$
^^ ^^
Problem 3 - Improper Character Set
See regex in use here.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^^
The regex is missing a - here to indicate a range of characters. As it stands, if a postcode is in the format ANA NAA (where A represents a letter and N represents a number), and it begins with anything other than A or Z, it will fail.
That means it will match A1A 1AA and Z1A 1AA, but not B1A 1AA.
To fix this issue, the character - should be placed between the A and Z in the respective character set:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^
Problem 4 - Wrong Optional Character Set
See regex in use here.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
^
I swear they didn't even test this thing before publicizing it on the web. They made the wrong character set optional. They made [0-9] option in the fourth sub-option of option 2 (group 9). This allows the regex to match incorrectly formatted postcodes like AAA 1AA.
To fix this issue, make the next character class optional instead (and subsequently make the set [0-9] match exactly once):
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?)))) [0-9][A-Za-z]{2})$
^
Problem 5 - Performance
Performance on this regex is extremely poor. First off, they placed the least likely pattern option to match GIR 0AA at the beginning. How many users will likely have this postcode versus any other postcode; probably never? This means every time the regex is used, it must exhaust this option first before proceeding to the next option. To see how performance is impacted check the number of steps the original regex took (35) against the same regex after having flipped the options (22).
The second issue with performance is due to the way the entire regex is structured. There's no point backtracking over each option if one fails. The way the current regex is structured can greatly be simplified. I provide a fix for this in the Answer section.
Problem 6 - Spaces
See regex in use here
This may not be considered a problem, per se, but it does raise concern for most developers. The spaces in the regex are not optional, which means the users inputting their postcodes must place a space in the postcode. This is an easy fix by simply adding ? after the spaces to render them optional. See the Answer section for a fix.
Answer
1. Fixing the UK Government's Regex
Fixing all the issues outlined in the Problems section and simplifying the pattern yields the following, shorter, more concise pattern. We can also remove most of the groups since we're validating the postcode as a whole (not individual parts):
See regex in use here
^([A-Za-z][A-Ha-hJ-Yj-y]?[0-9][A-Za-z0-9]? ?[0-9][A-Za-z]{2}|[Gg][Ii][Rr] ?0[Aa]{2})$
This can further be shortened by removing all of the ranges from one of the cases (upper or lower case) and using a case-insensitive flag. Note: Some languages don't have one, so use the longer one above. Each language implements the case-insensitivity flag differently.
See regex in use here.
^([A-Z][A-HJ-Y]?[0-9][A-Z0-9]? ?[0-9][A-Z]{2}|GIR ?0A{2})$
Shorter again replacing [0-9] with \d (if your regex engine supports it):
See regex in use here.
^([A-Z][A-HJ-Y]?\d[A-Z\d]? ?\d[A-Z]{2}|GIR ?0A{2})$
2. Simplified Patterns
Without ensuring specific alphabetic characters, the following can be used (keep in mind the simplifications from 1. Fixing the UK Government's Regex have also been applied here):
See regex in use here.
^([A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}|GIR ?0A{2})$
And even further if you don't care about the special case GIR 0AA:
^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$
3. Complicated Patterns
I would not suggest over-verification of a postcode as new Areas, Districts and Sub-districts may appear at any point in time. What I will suggest potentially doing, is added support for edge-cases. Some special cases exist and are outlined in this Wikipedia article.
Here are complex regexes that include the subsections of 3. (3.1, 3.2, 3.3).
In relation to the patterns in 1. Fixing the UK Government's Regex:
See regex in use here
^(([A-Z][A-HJ-Y]?\d[A-Z\d]?|ASCN|STHL|TDCU|BBND|[BFS]IQQ|PCRN|TKCA) ?\d[A-Z]{2}|BFPO ?\d{1,4}|(KY\d|MSR|VG|AI)[ -]?\d{4}|[A-Z]{2} ?\d{2}|GE ?CX|GIR ?0A{2}|SAN ?TA1)$
And in relation to 2. Simplified Patterns:
See regex in use here
^(([A-Z]{1,2}\d[A-Z\d]?|ASCN|STHL|TDCU|BBND|[BFS]IQQ|PCRN|TKCA) ?\d[A-Z]{2}|BFPO ?\d{1,4}|(KY\d|MSR|VG|AI)[ -]?\d{4}|[A-Z]{2} ?\d{2}|GE ?CX|GIR ?0A{2}|SAN ?TA1)$
3.1 British Overseas Territories
The Wikipedia article currently states (some formats slightly simplified):
AI-1111: Anguila
ASCN 1ZZ: Ascension Island
STHL 1ZZ: Saint Helena
TDCU 1ZZ: Tristan da Cunha
BBND 1ZZ: British Indian Ocean Territory
BIQQ 1ZZ: British Antarctic Territory
FIQQ 1ZZ: Falkland Islands
GX11 1ZZ: Gibraltar
PCRN 1ZZ: Pitcairn Islands
SIQQ 1ZZ: South Georgia and the South Sandwich Islands
TKCA 1ZZ: Turks and Caicos Islands
BFPO 11: Akrotiri and Dhekelia
ZZ 11 & GE CX: Bermuda (according to this document)
KY1-1111: Cayman Islands (according to this document)
VG1111: British Virgin Islands (according to this document)
MSR 1111: Montserrat (according to this document)
An all-encompassing regex to match only the British Overseas Territories might look like this:
See regex in use here.
^((ASCN|STHL|TDCU|BBND|[BFS]IQQ|GX\d{2}|PCRN|TKCA) ?\d[A-Z]{2}|(KY\d|MSR|VG|AI)[ -]?\d{4}|(BFPO|[A-Z]{2}) ?\d{2}|GE ?CX)$
3.2 British Forces Post Office
Although they've been recently changed it to better align with the British postcode system to BF# (where # represents a number), they're considered optional alternative postcodes. These postcodes follow(ed) the format of BFPO, followed by 1-4 digits:
See regex in use here
^BFPO ?\d{1,4}$
3.3 Santa?
There's another special case with Santa (as mentioned in other answers): SAN TA1 is a valid postcode. A regex for this is very simply:
^SAN ?TA1$
It looks like we're going to be using ^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$, which is a slightly modified version of that sugested by Minglis above.
However, we're going to have to investigate exactly what the rules are, as the various solutions listed above appear to apply different rules as to which letters are allowed.
After some research, we've found some more information. Apparently a page on 'govtalk.gov.uk' points you to a postcode specification govtalk-postcodes. This points to an XML schema at XML Schema which provides a 'pseudo regex' statement of the postcode rules.
We've taken that and worked on it a little to give us the following expression:
^((GIR &0AA)|((([A-PR-UWYZ][A-HK-Y]?[0-9][0-9]?)|(([A-PR-UWYZ][0-9][A-HJKSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]))) &[0-9][ABD-HJLNP-UW-Z]{2}))$
This makes spaces optional, but does limit you to one space (replace the '&' with '{0,} for unlimited spaces). It assumes all text must be upper-case.
If you want to allow lower case, with any number of spaces, use:
^(([gG][iI][rR] {0,}0[aA]{2})|((([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y]?[0-9][0-9]?)|(([a-pr-uwyzA-PR-UWYZ][0-9][a-hjkstuwA-HJKSTUW])|([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y][0-9][abehmnprv-yABEHMNPRV-Y]))) {0,}[0-9][abd-hjlnp-uw-zABD-HJLNP-UW-Z]{2}))$
This doesn't cover overseas territories and only enforces the format, NOT the existence of different areas. It is based on the following rules:
Can accept the following formats:
“GIR 0AA”
A9 9ZZ
A99 9ZZ
AB9 9ZZ
AB99 9ZZ
A9C 9ZZ
AD9E 9ZZ
Where:
9 can be any single digit number.
A can be any letter except for Q, V or X.
B can be any letter except for I, J or Z.
C can be any letter except for I, L, M, N, O, P, Q, R, V, X, Y or Z.
D can be any letter except for I, J or Z.
E can be any of A, B, E, H, M, N, P, R, V, W, X or Y.
Z can be any letter except for C, I, K, M, O or V.
Best wishes
Colin
There is no such thing as a comprehensive UK postcode regular expression that is capable of validating a postcode. You can check that a postcode is in the correct format using a regular expression; not that it actually exists.
Postcodes are arbitrarily complex and constantly changing. For instance, the outcode W1 does not, and may never, have every number between 1 and 99, for every postcode area.
You can't expect what is there currently to be true forever. As an example, in 1990, the Post Office decided that Aberdeen was getting a bit crowded. They added a 0 to the end of AB1-5 making it AB10-50 and then created a number of postcodes in between these.
Whenever a new street is build a new postcode is created. It's part of the process for obtaining permission to build; local authorities are obliged to keep this updated with the Post Office (not that they all do).
Furthermore, as noted by a number of other users, there's the special postcodes such as Girobank, GIR 0AA, and the one for letters to Santa, SAN TA1 - you probably don't want to post anything there but it doesn't appear to be covered by any other answer.
Then, there's the BFPO postcodes, which are now changing to a more standard format. Both formats are going to be valid. Lastly, there's the overseas territories source Wikipedia.
+----------+----------------------------------------------+
| Postcode | Location |
+----------+----------------------------------------------+
| AI-2640 | Anguilla |
| ASCN 1ZZ | Ascension Island |
| STHL 1ZZ | Saint Helena |
| TDCU 1ZZ | Tristan da Cunha |
| BBND 1ZZ | British Indian Ocean Territory |
| BIQQ 1ZZ | British Antarctic Territory |
| FIQQ 1ZZ | Falkland Islands |
| GX11 1AA | Gibraltar |
| PCRN 1ZZ | Pitcairn Islands |
| SIQQ 1ZZ | South Georgia and the South Sandwich Islands |
| TKCA 1ZZ | Turks and Caicos Islands |
+----------+----------------------------------------------+
Next, you have to take into account that the UK "exported" its postcode system to many places in the world. Anything that validates a "UK" postcode will also validate the postcodes of a number of other countries.
If you want to validate a UK postcode the safest way to do it is to use a look-up of current postcodes. There are a number of options:
Ordnance Survey releases Code-Point Open under an open data licence. It'll be very slightly behind the times but it's free. This will (probably - I can't remember) not include Northern Irish data as the Ordnance Survey has no remit there. Mapping in Northern Ireland is conducted by the Ordnance Survey of Northern Ireland and they have their, separate, paid-for, Pointer product. You could use this and append the few that aren't covered fairly easily.
Royal Mail releases the Postcode Address File (PAF), this includes BFPO which I'm not sure Code-Point Open does. It's updated regularly but costs money (and they can be downright mean about it sometimes). PAF includes the full address rather than just postcodes and comes with its own Programmers Guide. The Open Data User Group (ODUG) is currently lobbying to have PAF released for free, here's a description of their position.
Lastly, there's AddressBase. This is a collaboration between Ordnance Survey, Local Authorities, Royal Mail and a matching company to create a definitive directory of all information about all UK addresses (they've been fairly successful as well). It's paid-for but if you're working with a Local Authority, government department, or government service it's free for them to use. There's a lot more information than just postcodes included.
^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$
Regular expression to match valid UK
postcodes. In the UK postal system not
all letters are used in all positions
(the same with vehicle registration
plates) and there are various rules to
govern this. This regex takes into
account those rules. Details of the
rules: First half of postcode Valid
formats [A-Z][A-Z][0-9][A-Z]
[A-Z][A-Z][0-9][0-9] [A-Z][0-9][0-9]
[A-Z][A-Z][0-9] [A-Z][A-Z][A-Z]
[A-Z][0-9][A-Z] [A-Z][0-9] Exceptions
Position - First. Contraint - QVX not
used Position - Second. Contraint -
IJZ not used except in GIR 0AA
Position - Third. Constraint -
AEHMNPRTVXY only used Position -
Forth. Contraint - ABEHMNPRVWXY Second
half of postcode Valid formats
[0-9][A-Z][A-Z] Exceptions Position -
Second and Third. Contraint - CIKMOV
not used
http://regexlib.com/REDetails.aspx?regexp_id=260
I had a look into some of the answers above and I'd recommend against using the pattern from #Dan's answer (c. Dec 15 '10), since it incorrectly flags almost 0.4% of valid postcodes as invalid, while the others do not.
Ordnance Survey provide service called Code Point Open which:
contains a list of all the current postcode units in Great Britain
I ran each of the regexs above against the full list of postcodes (Jul 6 '13) from this data using grep:
cat CSV/*.csv |
# Strip leading quotes
sed -e 's/^"//g' |
# Strip trailing quote and everything after it
sed -e 's/".*//g' |
# Strip any spaces
sed -E -e 's/ +//g' |
# Find any lines that do not match the expression
grep --invert-match --perl-regexp "$pattern"
There are 1,686,202 postcodes total.
The following are the numbers of valid postcodes that do not match each $pattern:
'^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]?[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$'
# => 6016 (0.36%)
'^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$'
# => 0
'^GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|BX|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}$'
# => 0
Of course, these results only deal with valid postcodes that are incorrectly flagged as invalid. So:
'^.*$'
# => 0
I'm saying nothing about which pattern is the best regarding filtering out invalid postcodes.
According to this Wikipedia table
This pattern cover all the cases
(?:[A-Za-z]\d ?\d[A-Za-z]{2})|(?:[A-Za-z][A-Za-z\d]\d ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d{2} ?\d[A-Za-z]{2})|(?:[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d[A-Za-z] ?\d[A-Za-z]{2})
When using it on Android\Java use \\d
Most of the answers here didn't work for all the postcodes I have in my database. I finally found one that validates with all, using the new regex provided by the government:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/413338/Bulk_Data_Transfer_-_additional_validation_valid_from_March_2015.pdf
It isn't in any of the previous answers so I post it here in case they take the link down:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
UPDATE: Updated regex as pointed by Jamie Bull. Not sure if it was my error copying or it was an error in the government's regex, the link is down now...
UPDATE: As ctwheels found, this regex works with the javascript regex flavor. See his comment for one that works with the pcre (php) flavor.
An old post but still pretty high in google results so thought I'd update. This Oct 14 doc defines the UK postcode regular expression as:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([**AZ**a-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
from:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/359448/4__Bulk_Data_Transfer_-_additional_validation_valid.pdf
The document also explains the logic behind it. However, it has an error (bolded) and also allows lower case, which although legal is not usual, so amended version:
^(GIR 0AA)|((([A-Z][0-9]{1,2})|(([A-Z][A-HJ-Y][0-9]{1,2})|(([A-Z][0-9][A-Z])|([A-Z][A-HJ-Y][0-9]?[A-Z])))) [0-9][A-Z]{2})$
This works with new London postcodes (e.g. W1D 5LH) that previous versions did not.
This is the regex Google serves on their i18napis.appspot.com domain:
GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|BX|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}
Postcodes are subject to change, and the only true way of validating a postcode is to have the complete list of postcodes and see if it's there.
But regular expressions are useful because they:
are easy to use and implement
are short
are quick to run
are quite easy to maintain (compared to a full list of postcodes)
still catch most input errors
But regular expressions tend to be difficult to maintain, especially for someone who didn't come up with it in the first place. So it must be:
as easy to understand as possible
relatively future proof
That means that most of the regular expressions in this answer aren't good enough. E.g. I can see that [A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y] is going to match a postcode area of the form AA1A — but it's going to be a pain in the neck if and when a new postcode area gets added, because it's difficult to understand which postcode areas it matches.
I also want my regular expression to match the first and second half of the postcode as parenthesised matches.
So I've come up with this:
(GIR(?=\s*0AA)|(?:[BEGLMNSW]|[A-Z]{2})[0-9](?:[0-9]|(?<=N1|E1|SE1|SW1|W1|NW1|EC[0-9]|WC[0-9])[A-HJ-NP-Z])?)\s*([0-9][ABD-HJLNP-UW-Z]{2})
In PCRE format it can be written as follows:
/^
( GIR(?=\s*0AA) # Match the special postcode "GIR 0AA"
|
(?:
[BEGLMNSW] | # There are 8 single-letter postcode areas
[A-Z]{2} # All other postcode areas have two letters
)
[0-9] # There is always at least one number after the postcode area
(?:
[0-9] # And an optional extra number
|
# Only certain postcode areas can have an extra letter after the number
(?<=N1|E1|SE1|SW1|W1|NW1|EC[0-9]|WC[0-9])
[A-HJ-NP-Z] # Possible letters here may change, but [IO] will never be used
)?
)
\s*
([0-9][ABD-HJLNP-UW-Z]{2}) # The last two letters cannot be [CIKMOV]
$/x
For me this is the right balance between validating as much as possible, while at the same time future-proofing and allowing for easy maintenance.
I've been looking for a UK postcode regex for the last day or so and stumbled on this thread. I worked my way through most of the suggestions above and none of them worked for me so I came up with my own regex which, as far as I know, captures all valid UK postcodes as of Jan '13 (according to the latest literature from the Royal Mail).
The regex and some simple postcode checking PHP code is posted below. NOTE:- It allows for lower or uppercase postcodes and the GIR 0AA anomaly but to deal with the, more than likely, presence of a space in the middle of an entered postcode it also makes use of a simple str_replace to remove the space before testing against the regex. Any discrepancies beyond that and the Royal Mail themselves don't even mention them in their literature (see http://www.royalmail.com/sites/default/files/docs/pdf/programmers_guide_edition_7_v5.pdf and start reading from page 17)!
Note: In the Royal Mail's own literature (link above) there is a slight ambiguity surrounding the 3rd and 4th positions and the exceptions in place if these characters are letters. I contacted Royal Mail directly to clear it up and in their own words "A letter in the 4th position of the Outward Code with the format AANA NAA has no exceptions and the 3rd position exceptions apply only to the last letter of the Outward Code with the format ANA NAA." Straight from the horse's mouth!
<?php
$postcoderegex = '/^([g][i][r][0][a][a])$|^((([a-pr-uwyz]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[a-hk-y]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[1-9][a-hjkps-uw]{1})|([a-pr-uwyz]{1}[a-hk-y]{1}[1-9][a-z]{1}))(\d[abd-hjlnp-uw-z]{2})?)$/i';
$postcode2check = str_replace(' ','',$postcode2check);
if (preg_match($postcoderegex, $postcode2check)) {
echo "$postcode2check is a valid postcode<br>";
} else {
echo "$postcode2check is not a valid postcode<br>";
}
?>
I hope it helps anyone else who comes across this thread looking for a solution.
Here's a regex based on the format specified in the documents which are linked to marcj's answer:
/^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}$/
The only difference between that and the specs is that the last 2 characters cannot be in [CIKMOV] according to the specs.
Edit:
Here's another version which does test for the trailing character limitations.
/^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-BD-HJLNP-UW-Z]{2}$/
Some of the regexs above are a little restrictive. Note the genuine postcode: "W1K 7AA" would fail given the rule "Position 3 - AEHMNPRTVXY only used" above as "K" would be disallowed.
the regex:
^(GIR 0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKPS-UW])[0-9][ABD-HJLNP-UW-Z]{2})$
Seems a little more accurate, see the Wikipedia article entitled 'Postcodes in the United Kingdom'.
Note that this regex requires uppercase only characters.
The bigger question is whether you are restricting user input to allow only postcodes that actually exist or whether you are simply trying to stop users entering complete rubbish into the form fields. Correctly matching every possible postcode, and future proofing it, is a harder puzzle, and probably not worth it unless you are HMRC.
I wanted a simple regex, where it's fine to allow too much, but not to deny a valid postcode. I went with this (the input is a stripped/trimmed string):
/^([a-z0-9]\s*){5,8}$/i
This allows the shortest possible postcodes like "L1 8JQ" as well as the longest ones like "OL14 5ET".
Because it allows up to 8 characters, it will also allow incorrect 8 character postcodes if there is no space: "OL145ETX". But again, this is a simplistic regex, for when that's good enough.
Whilst there are many answers here, I'm not happy with either of them. Most of them are simply broken, are too complex or just broken.
I looked at #ctwheels answer and I found it very explanatory and correct; we must thank him for that. However once again too much "data" for me, for something so simple.
Fortunately, I managed to get a database with over 1 million active postcodes for England only and made a small PowerShell script to test and benchmark the results.
UK Postcode specifications: Valid Postcode Format.
This is "my" Regex:
^([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})\s(\d[a-zA-Z]{2})$
Short, simple and sweet. Even the most unexperienced can understand what is going on.
Explanation:
^ asserts position at start of a line
1st Capturing Group ([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})
Match a single character present in the list below [a-zA-Z]
{1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [a-zA-Z\d]
{1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
2nd Capturing Group (\d[a-zA-Z]{2})
\d matches a digit (equivalent to [0-9])
Match a single character present in the list below [a-zA-Z]
{2} matches the previous token exactly 2 times
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
$ asserts position at the end of a line
Result (postcodes checked):
TOTAL OK: 1469193
TOTAL FAILED: 0
-------------------------------------------------------------------------
Days : 0
Hours : 0
Minutes : 5
Seconds : 22
Milliseconds : 718
Ticks : 3227185939
TotalDays : 0.00373516891087963
TotalHours : 0.0896440538611111
TotalMinutes : 5.37864323166667
TotalSeconds : 322.7185939
TotalMilliseconds : 322718.5939
here's how we have been dealing with the UK postcode issue:
^([A-Za-z]{1,2}[0-9]{1,2}[A-Za-z]?[ ]?)([0-9]{1}[A-Za-z]{2})$
Explanation:
expect 1 or 2 a-z chars, upper or lower fine
expect 1 or 2 numbers
expect 0 or 1 a-z char, upper or lower fine
optional space allowed
expect 1 number
expect 2 a-z, upper or lower fine
This gets most formats, we then use the db to validate whether the postcode is actually real, this data is driven by openpoint https://www.ordnancesurvey.co.uk/opendatadownload/products.html
hope this helps
Basic rules:
^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$
Postal codes in the U.K. (or postcodes, as they’re called) are composed of five to seven alphanumeric characters separated by a space. The rules covering which characters can appear at particular positions are rather complicated and fraught with exceptions. The regular expression just shown therefore sticks to the basic rules.
Complete rules:
If you need a regex that ticks all the boxes for the postcode rules at the expense of readability, here you go:
^(?:(?:[A-PR-UWYZ][0-9]{1,2}|[A-PR-UWYZ][A-HK-Y][0-9]{1,2}|[A-PR-UWYZ][0-9][A-HJKSTUW]|[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]) [0-9][ABD-HJLNP-UW-Z]{2}|GIR 0AA)$
Source: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s16.html
Tested against our customers database and seems perfectly accurate.
I use the following regex that I have tested against all valid UK postcodes. It is based on the recommended rules, but condensed as much as reasonable and does not make use of any special language specific regex rules.
([A-PR-UWYZ]([A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y])?|[0-9]([0-9]|[A-HJKPSTUW])?) ?[0-9][ABD-HJLNP-UW-Z]{2})
It assumes that the postcode has been converted to uppercase and has not leading or trailing characters, but will accept an optional space between the outcode and incode.
The special "GIR0 0AA" postcode is excluded and will not validate as it's not in the official Post Office list of postcodes and as far as I'm aware will not be used as registered address. Adding it should be trivial as a special case if required.
First half of postcode Valid formats
[A-Z][A-Z][0-9][A-Z]
[A-Z][A-Z][0-9][0-9]
[A-Z][0-9][0-9]
[A-Z][A-Z][0-9]
[A-Z][A-Z][A-Z]
[A-Z][0-9][A-Z]
[A-Z][0-9]
Exceptions
Position 1 - QVX not used
Position 2 - IJZ not used except in GIR 0AA
Position 3 - AEHMNPRTVXY only used
Position 4 - ABEHMNPRVWXY
Second half of postcode
[0-9][A-Z][A-Z]
Exceptions
Position 2+3 - CIKMOV not used
Remember not all possible codes are used, so this list is a necessary but not sufficent condition for a valid code. It might be easier to just match against a list of all valid codes?
To check a postcode is in a valid format as per the Royal Mail's programmer's guide:
|----------------------------outward code------------------------------| |------inward code-----|
#special↓ α1 α2 AAN AANA AANN AN ANN ANA (α3) N AA
^(GIR 0AA|[A-PR-UWYZ]([A-HK-Y]([0-9][A-Z]?|[1-9][0-9])|[1-9]([0-9]|[A-HJKPSTUW])?) [0-9][ABD-HJLNP-UW-Z]{2})$
All postcodes on doogal.co.uk match, except for those no longer in use.
Adding a ? after the space and using case-insensitive match to answer this question:
'se50eg'.match(/^(GIR 0AA|[A-PR-UWYZ]([A-HK-Y]([0-9][A-Z]?|[1-9][0-9])|[1-9]([0-9]|[A-HJKPSTUW])?) ?[0-9][ABD-HJLNP-UW-Z]{2})$/ig);
Array [ "se50eg" ]
This one allows empty spaces and tabs from both sides in case you don't want to fail validation and then trim it sever side.
^\s*(([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})\s*$)
Through empirical testing and observation, as well as confirming with https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation, here is my version of a Python regex that correctly parses and validates a UK postcode:
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'
This regex is simple and has capture groups. It does not include all of the validations of legal UK postcodes, but only takes into account the letter vs number positions.
Here is how I would use it in code:
#dataclass
class UKPostcode:
postcode_area: str
district: str
sector: int
postcode: str
# https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
# Original author of this regex: #jontsai
# NOTE TO FUTURE DEVELOPER:
# Verified through empirical testing and observation, as well as confirming with the Wiki article
# If this regex fails to capture all valid UK postcodes, then I apologize, for I am only human.
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'
#classmethod
def from_postcode(cls, postcode):
"""Parses a string into a UKPostcode
Returns a UKPostcode or None
"""
m = re.match(cls.UK_POSTCODE_REGEX, postcode.replace(' ', ''))
if m:
uk_postcode = UKPostcode(
postcode_area=m.group('postcode_area'),
district=m.group('district'),
sector=m.group('sector'),
postcode=m.group('postcode')
)
else:
uk_postcode = None
return uk_postcode
def parse_uk_postcode(postcode):
"""Wrapper for UKPostcode.from_postcode
"""
uk_postcode = UKPostcode.from_postcode(postcode)
return uk_postcode
Here are unit tests:
#pytest.mark.parametrize(
'postcode, expected', [
# https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
(
'EC1A1BB',
UKPostcode(
postcode_area='EC',
district='1A',
sector='1',
postcode='BB'
),
),
(
'W1A0AX',
UKPostcode(
postcode_area='W',
district='1A',
sector='0',
postcode='AX'
),
),
(
'M11AE',
UKPostcode(
postcode_area='M',
district='1',
sector='1',
postcode='AE'
),
),
(
'B338TH',
UKPostcode(
postcode_area='B',
district='33',
sector='8',
postcode='TH'
)
),
(
'CR26XH',
UKPostcode(
postcode_area='CR',
district='2',
sector='6',
postcode='XH'
)
),
(
'DN551PT',
UKPostcode(
postcode_area='DN',
district='55',
sector='1',
postcode='PT'
)
)
]
)
def test_parse_uk_postcode(postcode, expected):
uk_postcode = parse_uk_postcode(postcode)
assert(uk_postcode == expected)
To add to this list a more practical regex that I use that allows the user to enter an empty string is:
^$|^(([gG][iI][rR] {0,}0[aA]{2})|((([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y]?[0-9][0-9]?)|(([a-pr-uwyzA-PR-UWYZ][0-9][a-hjkstuwA-HJKSTUW])|([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y][0-9][abehmnprv-yABEHMNPRV-Y]))) {0,1}[0-9][abd-hjlnp-uw-zABD-HJLNP-UW-Z]{2}))$
This regex allows capital and lower case letters with an optional space in between
From a software developers point of view this regex is useful for software where an address may be optional. For example if a user did not want to supply their address details
Have a look at the python code on this page:
http://www.brunningonline.net/simon/blog/archives/001292.html
I've got some postcode parsing to do. The requirement is pretty simple; I have to parse a postcode into an outcode and (optional) incode. The good new is that I don't have to perform any validation - I just have to chop up what I've been provided with in a vaguely intelligent manner. I can't assume much about my import in terms of formatting, i.e. case and embedded spaces. But this isn't the bad news; the bad news is that I have to do it all in RPG. :-(
Nevertheless, I threw a little Python function together to clarify my thinking.
I've used it to process postcodes for me.
I have the regex for UK Postcode validation.
This is working for all type of Postcode either inner or outer
^((([A-PR-UWYZ][0-9])|([A-PR-UWYZ][0-9][0-9])|([A-PR-UWYZ][A-HK-Y][0-9])|([A-PR-UWYZ][A-HK-Y][0-9][0-9])|([A-PR-UWYZ][0-9][A-HJKSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))) || ^((GIR)[ ]?(0AA))$|^(([A-PR-UWYZ][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][0-9][A-HJKS-UW0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9][ABEHMNPRVWXY0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$
This is working for all type of format.
Example:
AB10-------------------->ONLY OUTER POSTCODE
A1 1AA------------------>COMBINATION OF (OUTER AND INNER) POSTCODE
WC2A-------------------->OUTER
We were given a spec:
UK postcodes must be in one of the following forms (with one exception, see below):
§ A9 9AA
§ A99 9AA
§ AA9 9AA
§ AA99 9AA
§ A9A 9AA
§ AA9A 9AA
where A represents an alphabetic character and 9 represents a numeric character.
Additional rules apply to alphabetic characters, as follows:
§ The character in position 1 may not be Q, V or X
§ The character in position 2 may not be I, J or Z
§ The character in position 3 may not be I, L, M, N, O, P, Q, R, V, X, Y or Z
§ The character in position 4 may not be C, D, F, G, I, J, K, L, O, Q, S, T, U or Z
§ The characters in the rightmost two positions may not be C, I, K, M, O or V
The one exception that does not follow these general rules is the postcode "GIR 0AA", which is a special valid postcode.
We came up with this:
/^([A-PR-UWYZ][A-HK-Y0-9](?:[A-HJKS-UW0-9][ABEHMNPRV-Y0-9]?)?\s*[0-9][ABD-HJLNP-UW-Z]{2}|GIR\s*0AA)$/i
But note - this allows any number of spaces in between groups.
The accepted answer reflects the rules given by Royal Mail, although there is a typo in the regex. This typo seems to have been in there on the gov.uk site as well (as it is in the XML archive page).
In the format A9A 9AA the rules allow a P character in the third position, whilst the regex disallows this. The correct regex would be:
(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKPSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})
Shortening this results in the following regex (which uses Perl/Ruby syntax):
(GIR 0AA)|([A-PR-UWYZ](([0-9]([0-9A-HJKPSTUW])?)|([A-HK-Y][0-9]([0-9ABEHMNPRVWXY])?))\s?[0-9][ABD-HJLNP-UW-Z]{2})
It also includes an optional space between the first and second block.
What i have found in nearly all the variations and the regex from the bulk transfer pdf and what is on wikipedia site is this, specifically for the wikipedia regex is, there needs to be a ^ after the first |(vertical bar). I figured this out by testing for AA9A 9AA, because otherwise the format check for A9A 9AA will validate it. For Example checking for EC1D 1BB which should be invalid comes back valid because C1D 1BB is a valid format.
Here is what I've come up with for a good regex:
^([G][I][R] 0[A]{2})|^((([A-Z-[QVX]][0-9]{1,2})|([A-Z-[QVX]][A-HK-Y][0-9]{1,2})|([A-Z-[QVX]][0-9][ABCDEFGHJKPSTUW])|([A-Z-[QVX]][A-HK-Y][0-9][ABEHMNPRVWXY])) [0-9][A-Z-[CIKMOV]]{2})$
Below method will check the post code and provide complete info
const isValidUKPostcode = postcode => {
try {
postcode = postcode.replace(/\s/g, "");
const fromat = postcode
.toUpperCase()
.match(/^([A-Z]{1,2}\d{1,2}[A-Z]?)\s*(\d[A-Z]{2})$/);
const finalValue = `${fromat[1]} ${fromat[2]}`;
const regex = /^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2})$/i;
return {
isValid: regex.test(postcode),
formatedPostCode: finalValue,
error: false,
message: 'It is a valid postcode'
};
} catch (error) {
return { error: true , message: 'Invalid postcode'};
}
};
console.log(isValidUKPostcode('GU348RR'))
{isValid: true, formattedPostcode: "GU34 8RR", error: false, message: "It is a valid postcode"}
console.log(isValidUKPostcode('sdasd4746asd'))
{error: true, message: "Invalid postcode!"}
valid_postcode('787898523')
result => {error: true, message: "Invalid postcode"}