Regular Expressions for City name - regex

I need a regular expression for validating a city text box; the field should accept only letters, spaces and dashes (-).

This answer assumes that the letters @Manaysah refers to also include letters with diacritical marks. I've added the single quote ' since many names in Canada and France have it. I've also added the period (dot) since it's required for contracted names.
Building upon @UIDs answer, I came up with:
^([a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*[a-zA-Z\u0080-\u024F]*$
The list of cities it accepts:
Toronto
St. Catharines
San Francisco
Val-d'Or
Presqu'ile
Niagara on the Lake
Niagara-on-the-Lake
München
toronto
toRonTo
villes du Québec
Provence-Alpes-Côte d'Azur
Île-de-France
Kópavogur
Garðabær
Sauðárkrókur
Þorlákshöfn
And what it rejects:
A----B
------
*******
&&
()
//
\\
I didn't add in the use of brackets and other marks since it didn't fall within the scope of this question.
I've stayed away from \s for whitespace. Tabs and line feeds aren't part of a city name and shouldn't be used in my opinion.
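The two lists above can be checked mechanically. The following is a minimal Python sketch, not part of the original answer: it assumes the period is meant literally and therefore writes it as \. (an unescaped . would match any character before the space). The names CITY_RE, ACCEPTED, REJECTED and is_city are made up for the example.

```python
import re

# Pattern from the answer, with the dot escaped so it only matches a literal period.
CITY_RE = re.compile(
    r"^([a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*[a-zA-Z\u0080-\u024F]*$"
)

ACCEPTED = [
    "Toronto", "St. Catharines", "Val-d'Or", "Niagara-on-the-Lake",
    "München", "Île-de-France", "Þorlákshöfn",
]
REJECTED = ["A----B", "------", "*******", "&&", "()", "//", "\\\\"]


def is_city(name: str) -> bool:
    """Return True when the whole string matches the city pattern."""
    return CITY_RE.fullmatch(name) is not None


for name in ACCEPTED + REJECTED:
    print(f"{name!r}: {is_city(name)}")
```

Since every part of the pattern is optional, it also accepts an empty string; a required field needs a separate check for that.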

This can be arbitrarily complex, depending on how precise you need the match to be, and the variation you're willing to allow.
Something fairly simple like ^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$ should work.
Warning: this does not match cities like München; for those you need to extend the [a-zA-Z] part of the expression and define which characters are allowed for your particular case.
Keep in mind that it allows exactly one whitespace character or dash between words, so something like San----Francisco, or a name with several consecutive spaces, is rejected. Use [\s-]+ instead of [\s-] if you want to allow runs of separators.
It translates to something like:
1 or more letters, followed by a block of: one whitespace character or dash and 1 or more letters; this last block can occur 0 or more times.
Weird stuff in there: the ?: bit. If you're not familiar with regexes, it might be confusing, but it simply states that the piece of regex between the parentheses is not a capturing group (I don't want to capture the part it matches to reuse later), so the parentheses are only used to group the expression (and not to capture the match).
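To see what the ?: changes in practice, here is a small Python sketch (Python's re module is an assumption; the question does not name a language). It compares the pattern from this answer with a hypothetical capturing variant; the names NON_CAPTURING and CAPTURING are invented for the illustration.

```python
import re

NON_CAPTURING = re.compile(r"^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$")
CAPTURING = re.compile(r"^[a-zA-Z]+([\s-][a-zA-Z]+)*$")

name = "San Fran Cisco"

# Both patterns accept exactly the same strings...
print(bool(NON_CAPTURING.match(name)), bool(CAPTURING.match(name)))

# ...but only the capturing variant records the repeated block
# (a repeated group keeps its last repetition).
print(NON_CAPTURING.match(name).groups())
print(CAPTURING.match(name).groups())
```

Both variants validate identically; the non-capturing form just avoids storing a group that is never used.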
"New York" // passes
"San-Francisco" // passes
"San Fran Cisco" // passes (sorry, needed an example with three tokens)
"Chicago" // passes
" Chicago" // doesn't pass, starts with spaces
"San-" // doesn't pass, ends with a dash

Adding my answer in case anybody needs it while searching for a regex for city names, like I did.
Please use this:
^[a-zA-Z\u0080-\u024F\s\/\-\)\(\`\.\"\']+$
Many city names contain dashes, such as Soddy-Daisy, Tennessee, or special characters, such as the ñ in La Cañada Flintridge, California.
Hope this helps!

Here is the one I've found works best.
For regex flavours that support \p{L} (.NET, PHP, Go):
/^\p{L}+(?:([\ \-\']|(\.\ ))\p{L}+)*$/u
For flavours that do not support \p{L}, replace it with [a-zA-Z\u0080-\u024F].
So for JavaScript or Python's re module, use:
/^[a-zA-Z\u0080-\u024F]+(?:([\ \-\']|(\.\ ))[a-zA-Z\u0080-\u024F]+)*$/
Whitelisting a bunch of characters is easy, but there are things to watch for in your regex:
Consecutive non-alphabetical characters should not be allowed, e.g. Los  Angeles (with two spaces) should fail.
Periods should be followed by a space, e.g. St.Albert should fail because the space is missing.
Names cannot start or end with non-alphabetical characters, e.g. -Chicago- should fail.
A whitespace character \s is not the same as a literal space (\ ): a tab or line feed would also pass \s, so the space character should be specified instead.
Note: When building regex rules, I find https://regex101.com/tests is very helpful, as you can easily create unit tests
js: https://regex101.com/r/cgJwc0/1/tests
php: https://regex101.com/r/Yo3GV2/1/tests
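The rules listed in this answer can also be exercised locally. Below is a minimal Python sketch under a few assumptions: the pattern is a simplified but equivalent spelling of the JavaScript/Python variant above (the redundant escapes and capturing groups are dropped), and LETTER, CITY_RE, is_city and CASES are names invented for the example.

```python
import re

LETTER = r"[a-zA-Z\u0080-\u024F]"
# Exactly one separator between words: space, apostrophe, hyphen, or ". ".
CITY_RE = re.compile(rf"^{LETTER}+(?:(?:[ '-]|\. ){LETTER}+)*$")


def is_city(name: str) -> bool:
    """Return True when the whole string matches the city pattern."""
    return CITY_RE.fullmatch(name) is not None


CASES = {
    "Los Angeles": True,
    "Los  Angeles": False,  # two consecutive spaces
    "St. Albert": True,
    "St.Albert": False,     # period without a following space
    "-Chicago-": False,     # starts and ends with a non-letter
    "New\tYork": False,     # a tab is whitespace but not a space
    "Val-d'Or": True,
}

for name, expected in CASES.items():
    print(f"{name!r}: got {is_city(name)}, expected {expected}")
```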

Here's one that will work with most cities, and has been tested:
^[a-zA-Z\u0080-\u024F]+(?:\. |-| |')*([1-9a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*[a-zA-Z\u0080-\u024F]*$
Python code below, including its test.
import re

import pytest

# The period is escaped so that it matches only a literal dot.
CITY_RE = re.compile(
    r"^[a-zA-Z\u0080-\u024F]+(?:\. |-| |')*"  # a word
    r"([1-9a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*"
    r"[a-zA-Z\u0080-\u024F]*$"
)


def is_city(value: str) -> bool:
    """Return True if value looks like a city name."""
    return CITY_RE.match(value) is not None


# Tests
@pytest.mark.parametrize(
    "value,expected",
    (
        ("1", False),
        ("Toronto", True),
        ("Saint-Père-en-Retz", True),
        ("Saint Père en Retz", True),
        ("Saint-Père en Retz", True),
        ("Paris 13e Arrondissement", True),
        ("Paris 13e Arrondissement ", True),
        ("Bouc-Étourdi", True),
        ("Arnac-la-Poste", True),
        ("Bourré", True),
        ("Å", True),
        ("San Francisco", True),
    ),
)
def test_is_city(value, expected):
    # is_city returns a plain bool, so compare it directly.
    assert is_city(value) is expected

^[a-zA-Z\- ]+$
Also this might be useful http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

use this regex:
^[a-zA-Z-\s]+$

After many hours of looking for a city regex matcher, I built this one, and it meets my needs 100%:
(?ix)^[A-Z.-]+(?:\s+[A-Z.-]+)*$
It is an expression for testing city names.
Matches:
City
St. City
Some Silly-City
City St.
Too Many Words City
There seem to be many flavors of regex; I built this one for my Java needs and it works great.

^[a-zA-Z.-]+(?:[\s-][\/a-zA-Z.]+)*$
This will help identify some city names like St. Johns, Baie-Sainte-Anne, Grand-Salut/Grand Falls

I like shepley's suggestion, but it has a couple of flaws.
If you change shepley's regex to this, it will not accept other special characters:
^([a-zA-Z\u0080-\u024F]{1}[a-zA-Z\u0080-\u024F\. |\-| |']*[a-zA-Z\u0080-\u024F\.']{1})$

I use that one:
^[a-zA-Z\\u0080-\\u024F.]+((?:[ -.|'])[a-zA-Z\\u0080-\\u024F]+)*$

You can try this:
^\p{L}+(?:[\s\-]\p{L}+)*$
The above regex will:
Reject leading and trailing spaces and hyphens (the closing $ is needed for this unless your API already matches the whole string)
Match cities with names like Néewiller-près-lauterbourg

Here are some fun edge-cases:
's Graveland
's Gravendeel
's Gravenpolder
's Gravenzande
's Heer Arendskerke
's Heerenberg
's Heerenhoek
's Hertogenbosch
't Harde
't Veld
't Zand
100 Mile House
6 October City
So, don't forget to add ' and 0-9 as a possible first character of the city name.
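As a rough illustration of that advice, here is a Python sketch of one possible extension. It is hypothetical rather than taken from any answer above: it allows a single leading apostrophe and lets digits appear in any word. WORD, CITY_RE and is_city are names made up for the example.

```python
import re

# Hypothetical word class: letters (including Latin extended) and digits.
WORD = r"[0-9a-zA-Z\u0080-\u024F]+"
# Optional leading apostrophe, then words joined by one separator each.
CITY_RE = re.compile(rf"^'?{WORD}(?:(?:[ '-]|\. ){WORD})*$")


def is_city(name: str) -> bool:
    """Return True when the whole string matches the extended pattern."""
    return CITY_RE.fullmatch(name) is not None


for name in ["'s Graveland", "'t Zand", "100 Mile House", "6 October City", "-Chicago"]:
    print(f"{name!r}: {is_city(name)}")
```

The trade-off is that purely numeric input such as 123 is now accepted too, so this only makes sense if the field is not expected to reject numbers.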

Related

Regular expression for non-consecutive characters

I'm trying to create a regular expression that validates the following requirements:
Simultaneous use of Cyrillic and numbers is possible (without spaces and special characters)
Simultaneous use of Latin and numbers is possible (without spaces and special characters)
Simultaneous use of Cyrillic and Latin characters is not possible
The first letter must be capitalized, cannot be a number
Sequence length: from 2 to 16 characters inclusive
It is impossible to use 3 or more identical symbols in a row
I am using the following solution:
(?:([A-Z][A-Za-z0-9]{1,15}|[А-Я][А-ЯЁа-яё0-9]{1,15}))$
How do I change the regex to match the last requirement?
I use Google Sheets, in which it is impossible to use negative lookahead.
Sorry for my English.
I don't think you can do this with a single regex without lookbehinds.
But there are workarounds for the "don't repeat same character 3 times" functionality.
The workarounds could be simpler if RE2 supported backreferences, but it does not. So the resulting rule will be longer.
You may define a column ValidNoThreeRepeats like this:
=
NOT(
OR(
AND(MID(A1;1 ;1)=MID(A1;2 ;1);MID(A1;2 ;1)=MID(A1;3 ;1));
AND(MID(A1;2 ;1)=MID(A1;3 ;1);MID(A1;3 ;1)=MID(A1;4 ;1));
AND(MID(A1;3 ;1)=MID(A1;4 ;1);MID(A1;4 ;1)=MID(A1;5 ;1));
AND(MID(A1;4 ;1)=MID(A1;5 ;1);MID(A1;5 ;1)=MID(A1;6 ;1));
AND(MID(A1;5 ;1)=MID(A1;6 ;1);MID(A1;6 ;1)=MID(A1;7 ;1));
AND(MID(A1;6 ;1)=MID(A1;7 ;1);MID(A1;7 ;1)=MID(A1;8 ;1));
AND(MID(A1;7 ;1)=MID(A1;8 ;1);MID(A1;8 ;1)=MID(A1;9 ;1));
AND(MID(A1;8 ;1)=MID(A1;9 ;1);MID(A1;9 ;1)=MID(A1;10;1));
AND(MID(A1;9 ;1)=MID(A1;10;1);MID(A1;10;1)=MID(A1;11;1));
AND(MID(A1;10;1)=MID(A1;11;1);MID(A1;11;1)=MID(A1;12;1));
AND(MID(A1;11;1)=MID(A1;12;1);MID(A1;12;1)=MID(A1;13;1));
AND(MID(A1;12;1)=MID(A1;13;1);MID(A1;13;1)=MID(A1;14;1));
AND(MID(A1;13;1)=MID(A1;14;1);MID(A1;14;1)=MID(A1;15;1))
)
)
Or in a compacted way like this:
=NOT(OR(AND(MID(A1;1 ;1)=MID(A1;2 ;1);MID(A1;2 ;1)=MID(A1;3 ;1));AND(MID(A1;2 ;1)=MID(A1;3 ;1);MID(A1;3 ;1)=MID(A1;4 ;1));AND(MID(A1;3 ;1)=MID(A1;4 ;1);MID(A1;4 ;1)=MID(A1;5 ;1));AND(MID(A1;4 ;1)=MID(A1;5 ;1);MID(A1;5 ;1)=MID(A1;6 ;1));AND(MID(A1;5 ;1)=MID(A1;6 ;1);MID(A1;6 ;1)=MID(A1;7 ;1));AND(MID(A1;6 ;1)=MID(A1;7 ;1);MID(A1;7 ;1)=MID(A1;8 ;1));AND(MID(A1;7 ;1)=MID(A1;8 ;1);MID(A1;8 ;1)=MID(A1;9 ;1));AND(MID(A1;8 ;1)=MID(A1;9 ;1);MID(A1;9 ;1)=MID(A1;10;1));AND(MID(A1;9 ;1)=MID(A1;10;1);MID(A1;10;1)=MID(A1;11;1));AND(MID(A1;10;1)=MID(A1;11;1);MID(A1;11;1)=MID(A1;12;1));AND(MID(A1;11;1)=MID(A1;12;1);MID(A1;12;1)=MID(A1;13;1));AND(MID(A1;12;1)=MID(A1;13;1);MID(A1;13;1)=MID(A1;14;1));AND(MID(A1;13;1)=MID(A1;14;1);MID(A1;14;1)=MID(A1;15;1))))
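The idea is to have one rule that compares the 1st, 2nd and 3rd characters, another that compares the 2nd, 3rd and 4th, another for the 3rd, 4th and 5th, and so on. You join these rules with an OR, since if any of them matches, a repetition exists somewhere. Finally, you negate the whole expression with a NOT. Two caveats: MID returns an empty string past the end of the text, and two empty strings compare as equal, so for short inputs each rule should also check that LEN(A1) reaches its third position; and since the maximum length is 16, a final rule for positions 14, 15 and 16 is needed as well.
Then you can check that both your regex and that column are valid.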
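The same sliding-window idea is easier to read outside a spreadsheet. Here is a minimal Python sketch of it, for illustration only (the question itself is about Google Sheets); the function names are invented, and the comparison is case-sensitive, which may differ from how a spreadsheet compares text.

```python
def has_triple_repeat(text: str) -> bool:
    """Return True if any character occurs three or more times in a row."""
    # Compare each character with the two that follow it; the range stops
    # two positions before the end, so short strings are handled naturally.
    return any(
        text[i] == text[i + 1] == text[i + 2]
        for i in range(len(text) - 2)
    )


def no_three_repeats(text: str) -> bool:
    """Counterpart of the ValidNoThreeRepeats column."""
    return not has_triple_repeat(text)


for sample in ["Ab", "Abb1bb", "Baaad", "Name111"]:
    print(sample, no_three_repeats(sample))
```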
I don't know which scripting language you're using.
If it's PHP, I'd use `filter_var($param1, FILTER_VALIDATE..., FILTER_FLAG..)` if I were in your shoes.
It lets you both **validate** and **sanitize** your input.
**PEACE**.

Regex to insert space with certain characters but avoid date and time

I made a regex which inserts a space wherever any of the characters
-:\*_/;, is present. For example, JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c should become JET* AIRWAYS\ INDIA/ 858701/ IDBI 05/05/05; 05:05:05 a/c
The regex I used is (?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)
I have added exceptions for some abbreviations such as a/c and w/d. The \D conditions are there to keep date/time values from being split, but this created an issue: numbers followed by the above-mentioned characters never get split.
My requirements are:
1. Insert a space after the characters -:\*_/;,
2. Date and time values, which may contain / or :, should not be split
3. Abbreviations such as a/c and w/d need to be exceptions
The following is the full code
Private Function formatColon(oldString As String) As String
    Dim reg As New RegExp: reg.Global = True: reg.Pattern = "(?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)" '"(\D:|\D/|\D-|^w/d)"
    Dim newString As String: newString = reg.Replace(oldString, "$1 ")
    formatColon = XtraspaceKill(newString)
End Function
I would use 3 replacements.
Replace all date and time special characters with a special macro that should never be found in your text, e.g. for 05/15/2018 4:06 PM, something based on your name:
05MANUMOHANSLASH15MANUMOHANSLASH2018 4MANUMOHANCOLON06 PM
You can encode exceptions too, like this:
aMANUMOHANSLASHc
Now run your original regex to replace all special characters.
Finally, unreplace the macros MANUMOHANSLASH and MANUMOHANCOLON.
Meanwhile, let me tell you why this is complicated in a single regex.
If trying to do this in a single regex, you have to ask, for each / or :, "Am I a part of a date or time?"
To answer that, you need to use lookahead and lookbehind assertions, the latter of which Microsoft has finally added support for.
But given a /, you don't know if you're between the first and second, or second and third parts of the date. Similar for time.
The number of cases you need to consider will render your regex unmaintainably complex.
So please just use a few separate replacements :-)
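The three replacements described above can be sketched as follows. This is an illustrative Python version, not the asker's VBA: the placeholder tokens, the assumed date and time shapes (dd/dd/dd or dd/dd/dddd, and d:dd with optional seconds) and the function names are all assumptions made for the example.

```python
import re

# Hypothetical placeholders; any token that cannot occur in the real text works.
SLASH = "\x00SLASH\x00"
COLON = "\x00COLON\x00"

# Assumed shapes for dates, times, and the abbreviations to leave alone.
PROTECTED = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),   # 05/05/05, 05/15/2018
    re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b"),  # 4:06, 05:05:05
    re.compile(r"\b(?:a/c|w/d|m/s|s/w|m/o)\b"),   # a/c, w/d, ...
]
# One of - : \ * _ / ; , that is directly followed by a non-space character.
SEPARATOR = re.compile(r"([-:\\*_/;,])(?=\S)")


def _protect(match: re.Match) -> str:
    """Swap / and : inside a protected match for placeholders."""
    return match.group(0).replace("/", SLASH).replace(":", COLON)


def format_separators(text: str) -> str:
    # 1. Hide the separators that belong to dates, times, and exceptions.
    for pattern in PROTECTED:
        text = pattern.sub(_protect, text)
    # 2. Insert a space after every remaining separator.
    text = SEPARATOR.sub(r"\1 ", text)
    # 3. Restore the hidden separators.
    return text.replace(SLASH, "/").replace(COLON, ":")


print(format_separators(r"JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c"))
```

Dates written with dashes, or other abbreviations, would need their own entries in PROTECTED.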

Regex for IBAN allowing for white spaces AND checking for exact length

I need to check an input field for a German IBAN. The user should be allowed to leave in white spaces, and the input should be validated to start with DE followed by exactly 20 characters (digits and letters).
Without the white space allowance, I tried
^[DE]{2}([0-9a-zA-Z]{20})$
but I cannot find where and how to add "white spaces allowed anywhere".
This should be simple, but I simply cannot find a solution.
Thanks for help!
Because you should use the right tool for the right task: you should not rely on regexps to validate IBAN numbers, but instead use the IBAN checksum algorithm to check that the whole code is actually correct, which makes any regexp redundant. That is: remove all spaces, move the first four characters to the end, replace the letters with numbers, and check that the remainder modulo 97 is 1.
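The checksum steps just listed can be sketched in a few lines. This is a minimal Python illustration, not the library recommended further down; the function name iban_checksum_ok is made up, and the sketch checks only the mod-97 remainder, not the country-specific length.

```python
import re


def iban_checksum_ok(iban: str) -> bool:
    """Check the mod-97 checksum of an IBAN (whitespace is ignored)."""
    compact = re.sub(r"\s+", "", iban).upper()
    # Basic shape: two letters, two check digits, then the account part.
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]+", compact):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


for candidate in ["DE89 3704 0044 0532 0130 00", "BE68539007547034", "hello world"]:
    print(candidate, iban_checksum_ok(candidate))
```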
Still, here is my attempt at answering your question, for the fun of it.
What about:
^DE([0-9a-zA-Z]\s?){20}$
The only difference is that it allows an optional whitespace character after each alphanumeric character.
Edit: for the OP's information, the only difference with the regexp from @ulugbex-umirov, (?:\s*[0-9a-zA-Z]\s*), is that theirs also allows whitespace before each character, and therefore between the ISO country code and the check digits (which are always numeric); I deliberately do not allow that.
And to actually support the usual IBAN layout, which is written in groups of 4 characters, as the Wikipedia page says:
^DE\d{2}\s?([0-9a-zA-Z]{4}\s?){4}[0-9a-zA-Z]{2}$
If your UI is in JavaScript, you can use the iban.js library to do IBAN validation:
<script src="iban.js"></script>
<script>
// the API is now accessible from the window.IBAN global object
IBAN.isValid('hello world'); // false
IBAN.isValid('BE68539007547034'); // true
</script>
That way you know the IBAN is valid before the data is ever sent to the backend. Simpler, lighter and more elegant. Why do anything else?
Here is a list of IBAN patterns for 70 countries. I generated it with a Python script I wrote based on https://en.wikipedia.org/wiki/International_Bank_Account_Number
AL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([a-zA-Z0-9]{4}\s?){4}\s?
AD[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([a-zA-Z0-9]{4}\s?){3}\s?
AT[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
AZ[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){5}\s?
BH[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{2})\s?
BY[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){5}\s?
BE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}\s?
BA[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
BR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{3})([a-zA-Z]{1}\s?)([a-zA-Z0-9]{1})\s?
BG[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){1}([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){1}([a-zA-Z0-9]{2})\s?
CR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
HR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{1})\s?
CY[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([a-zA-Z0-9]{4}\s?){4}\s?
CZ[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
DK[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
DO[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){5}\s?
TL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{3})\s?
EE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
FO[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
FI[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
FR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})([0-9]{2})\s?
GE[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{2})([0-9]{2}\s?)([0-9]{4}\s?){3}([0-9]{2})\s?
DE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
GI[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{3})\s?
GR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{3})\s?
GL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
GT[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}\s?
HU[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){6}\s?
IS[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{2})\s?
IE[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){3}([0-9]{2})\s?
IL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{3})\s?
IT[a-zA-Z0-9]{2}\s?([a-zA-Z]{1})([0-9]{3}\s?)([0-9]{4}\s?){1}([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{3})\s?
JO[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){5}([0-9]{2})\s?
KZ[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{2})\s?
XK[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{4}\s?){2}([0-9]{2})([0-9]{2}\s?)\s?
KW[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}([a-zA-Z0-9]{2})\s?
LV[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{1})\s?
LB[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}\s?
LI[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})\s?
LT[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
LU[a-zA-Z0-9]{2}\s?([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){3}\s?
MK[a-zA-Z0-9]{2}\s?([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})([0-9]{2})\s?
MT[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{3})\s?
MR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{3})\s?
MU[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){4}([0-9]{3})([a-zA-Z]{1}\s?)([a-zA-Z]{2})\s?
MC[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})([0-9]{2})\s?
MD[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){4}\s?
ME[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
NL[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){2}([0-9]{2})\s?
NO[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([0-9]{3})\s?
PK[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){4}\s?
PS[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){5}([0-9]{1})\s?
PL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){6}\s?
PT[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{1})\s?
QA[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}([a-zA-Z0-9]{1})\s?
RO[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){4}\s?
SM[a-zA-Z0-9]{2}\s?([a-zA-Z]{1})([0-9]{3}\s?)([0-9]{4}\s?){1}([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{3})\s?
SA[a-zA-Z0-9]{2}\s?([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){4}\s?
RS[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
SK[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
SI[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{3})\s?
ES[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
SE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
CH[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})\s?
TN[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
TR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{2})\s?
AE[a-zA-Z0-9]{2}\s?([0-9]{3})([0-9]{1}\s?)([0-9]{4}\s?){3}([0-9]{3})\s?
GB[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){3}([0-9]{2})\s?
VA[a-zA-Z0-9]{2}\s?([0-9]{3})([0-9]{1}\s?)([0-9]{4}\s?){3}([0-9]{2})\s?
VG[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){4}\s?
Original:
^[DE]{2}([0-9a-zA-Z]{20})$
Modified:
^DE(?:\s*[0-9a-zA-Z]\s*){20}$
This is the correct regex to match DE IBAN account numbers:
DE\d{2}[ ]\d{4}[ ]\d{4}[ ]\d{4}[ ]\d{4}[ ]\d{2}|DE\d{20}
Pass: DE89 3704 0044 0532 0130 00|||DE89370400440532013000
Fail: DE89-3704-0044-0532-0130-00
Most simple solution I can think of:
^DE(\s*[[:alnum:]]){20}\s*$
In particular, your initial [DE]{2} is wrong, as it allows 'DD', 'EE', 'ED' as well as the intended 'DE'.
To allow any amount of spaces anywhere:
^ *D *E( *[A-Za-z0-9]){20} *$
Since you want to allow lowercase letters, perhaps DE may be lowercase too?
^ *[Dd] *[Ee]( *[A-Za-z0-9]){20} *$
^ matches the start of the string
$ matches the end of the string
Between the characters there are optional spaces ( *)
[character class] defines a set or range of characters
To allow at most one space between characters, replace the quantifier * (any number of) with ? (0 or 1). If supported, the \s shorthand can be used to match [ \t\r\n\f] instead of the space only.
Test on regex101.com, also see the SO regex FAQ
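For a quick local check of the space-tolerant pattern, here is a small Python sketch. The language and the helper names looks_like_de_iban and compact are assumptions for the example; the pattern itself is the case-insensitive one from this answer.

```python
import re

DE_IBAN_RE = re.compile(r"^ *[Dd] *[Ee]( *[A-Za-z0-9]){20} *$")


def looks_like_de_iban(text: str) -> bool:
    """Return True if text is DE plus 20 alphanumerics, spaces allowed anywhere."""
    return DE_IBAN_RE.fullmatch(text) is not None


def compact(text: str) -> str:
    """Strip the spaces once the shape has been accepted."""
    return text.replace(" ", "").upper()


for candidate in [
    "DE89 3704 0044 0532 0130 00",
    " d e89370400440532013000 ",
    "DE89-3704-0044-0532-0130-00",
]:
    print(repr(candidate), looks_like_de_iban(candidate))
```

This only checks the shape of the input; whether the check digits are correct is a separate question.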
Using Google Apps Script, I pasted Laurent's code from github into a script and added the following code to test.
// Use the Apps Script IDE's "Run" menu to execute this code.
// Then look at the View > Logs menu to see execution results.
function myFunction() {
  //https://github.com/arhs/iban.js/blob/master/README.md
  // var IBAN = require('iban');
  var t1 = IBAN.isValid('hello world'); // false
  var t2 = IBAN.isValid('BE68539007547034'); // true
  var t3 = IBAN.isValid('BE68 5390 0754 7034'); // true
  Logger.log("Test 1 = %s", t1);
  Logger.log("Test 2 = %s", t2);
  Logger.log("Test 3 = %s", t3);
}
The only thing needed to run the example code was commenting out the require('iban') line:
// var IBAN = require('iban');
Finally, instead of using client handlers to attempt a RegEx validation of IBAN input, I use a server handler to do the validation.

Matlab Extracting sub string from cell array

I have a '3 x 1' cell array the contents of which appear like the following:
'ASDF_LE_NEWYORK Fixedafdfgd_ML'
'Majo_LE_WASHINGTON FixedMonuts_ML'
'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'
I want to be able to elegantly extract and create another '3x1' cell array with contents as:
'NEWYORK'
'WASHINGTON'
'BERLIN'
As you can see above, the names come after the last underscore before the first space and end at that first space or '_ML'. How do I write such code in a concise manner?
Thanks
Edit:
Sorry guys I should have used a better example. I have it corrected now.
You can use lookbehind for _ and lookahead for space:
names = regexp(A, '(?<=_)[^\s_]*(?=\s)', 'match', 'once');
Where A is the cell array containing the strings:
A = {...
'ASDF_LE_NEWYORK Fixedafdfgd_ML'
'Majo_LE_WASHINGTON FixedMonuts_ML'
'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'};
>> names = regexp(A, '(?<=_)[^\s_]*(?=\s)', 'match', 'once')
names =
'NEWYORK'
'WASHINGTON'
'BERLIN'
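For readers who want to try the pattern outside MATLAB, the same lookbehind and lookahead work unchanged in Python's re module. This is only a cross-check sketch; the list A mirrors the cell array above and NAME_RE is a name invented for the example.

```python
import re

A = [
    "ASDF_LE_NEWYORK Fixedafdfgd_ML",
    "Majo_LE_WASHINGTON FixedMonuts_ML",
    "Array_LE_dfgrt_fdhyuj_BERLIN Potato Price",
]

# Text preceded by an underscore, containing no whitespace or underscore,
# and followed by whitespace; search() returns the first such match.
NAME_RE = re.compile(r"(?<=_)[^\s_]*(?=\s)")

names = [NAME_RE.search(s).group(0) for s in A]
print(names)
```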
NOTE: The question was changed, so the answer is no longer complete, but hopefully the regexp example is still useful.
Try regexp like this:
names = regexp(fullNamesCell,'_(NAME\d?)\s','tokens');
names = cellfun(@(x)(x{1}),names)
In the pattern _(NAME\d?)\s, the parentheses define a subexpression, which will be returned as a token (a portion of the matched text). The \d? specifies zero or one digits, but you could use \d{1} for exactly one digit or \d{1,3} if you expect between 1 and 3 digits. The \s specifies whitespace.
The reorganization of names is a little convoluted, but when you use regexp with a cell input and tokens you get a cell of cells that needs some reformatting for your purposes.

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that do not consist only of digits? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the four first, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This says lookahead for lines that do not contain all digits and match the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
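A quick way to convince yourself is to run the five strings from the question through a search for a single non-digit. The sketch below uses Python purely as an illustration (the question does not name a language), and the function name not_only_digits is made up.

```python
import re


def not_only_digits(text: str) -> bool:
    """True when the string contains at least one non-digit character."""
    return re.search(r"\D", text) is not None


for s in ["abc", "a4c", "4bc", "ab4", "123"]:
    print(s, not_only_digits(s))
```

Note that an empty string contains no non-digit either, so it is reported as False; and in Python \D is Unicode-aware for str patterns, so use [^0-9] if only ASCII digits should count as digits.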
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
@Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non letters (eg, punctuation) even though Svish's examples didn't include one. Svish's primary requirement was: not all be digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
That is: optional digits at the beginning, then at least one letter, then any letters or digits.
Try this:
/^.*\D+.*$/
It returns true if the string contains any symbol that is not a digit. It should work in any regex flavour that supports \D.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
If you are trying to match words that contain at least one letter but are made up of digits and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
/\w+/:
Matches any letter, number or underscore. any word character
.*[^0-9]{1,}.*
Works fine for us.
We wanted to use the accepted answer, but it does not work within a YANG model.
The one I provided here is clear and easy to understand:
the start and end can be any characters, but there must be at least one non-numeric character.
I am using /^[0-9]*$/gm in my JavaScript code to check whether a string contains only digits. If it does, the check fails; otherwise the string is returned.
Below is a working code snippet with test cases:
function isValidURL(string) {
    var res = string.match(/^[0-9]*$/gm);
    if (res == null)
        return string;
    else
        return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL, and the following, whilst oversimplified, seems to have worked for me:
where fieldname REGEXP '^[a-zA-Z0-9]+$'
and fieldname NOT REGEXP '^[0-9]+$'
This shows all values that are alphabetic or alphanumeric, while values that are purely numeric are hidden.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed