Delphi multiline regex - regex

I have some non-regression test code in Delphi that calls an external diff tool. My code then loads the diff results and should remove acceptable differences, such as dates in the compared results. I'm trying to do this with a multiline TRegEx.Replace, but no match is found ...
https://regex101.com/r/QBZuws/2 shows the pattern I came up with and a sample test diff file. I need to delete the matching "paragraphs" of 3 lines.
Here is my code:
function FilterDiff(AText: string): string;
var
  LStr: string;
  Regex: TRegEx;
begin
  // AText:=StringReplace(AText,#13+#10,'\n',[rfReplaceAll]); // doesn't help ...
  LStr := '\d\d.\d\d.20\d\d \d\d:\d\d:\d\d'; // regex for date and time
  LStr := '##.*##\n-'+LStr+'\n\+'+LStr; // regex for paragraphs to remove
  Regex := TRegEx.Create(LStr, [roMultiLine]);
  Result := Regex.Replace(AText,'');
end;

procedure TReportTest.NonRegression;
var
  LDiff: TStringList;
  // others removed for clarity
begin
  // removed section of code that calls an external tool and produces the diff.txt file
  LDiff := TStringList.Create;
  LDiff.LoadFromFile('diff.txt');
  Status(FilterDiff(LDiff.Text)); // show the diffs in DUnit GUI for now
  LDiff.Free;
end;
Besides, while tracing TRegEx.Replace down to
System.RegularExpressionsAPI.pcre_exec($4D72A50,nil,'--- '#$D#$A'+++ '#$D#$A'## -86 +86 ##'#$D#$A'-16.11.2017 15:00:36'#$D#$A'+15.11.2017 10:47:58'#$D#$A'## -400 +400 ##'#$D#$A'-16.11.2017 15:00:36'#$D#$A'+15.11.2017 10:47:58'#$D#$A,132,0,1024,$7D56800,300)
System.RegularExpressionsCore.TPerlRegEx.Match
System.RegularExpressionsCore.TPerlRegEx.ReplaceAll
System.RegularExpressions.TRegEx.Replace(???,???)
TestReportAuto.FilterDiff('--- '#$D#$A'+++ '#$D#$A'## -86 +86 ##'#$D#$A'-16.11.2017 15:00:36'#$D#$A'+15.11.2017 10:47:58'#$D#$A'## -400 +400 ##'#$D#$A'-16.11.2017 15:00:36'#$D#$A'+15.11.2017 10:47:58'#$D#$A)
I was surprised to see quotes before and after each newline #$D#$A in the debugger, but they don't look "real" ... or are they ?

As you seem to have issues with different kinds of line breaks, I would recommend adjusting your regex to use \R instead of \n. \R matches Windows-style line breaks (CR + LF) as well as Unix-style line breaks (LF).

Well, I just noticed the \n in regex matches only LF, not CR+LF, so I added
AText:=StringReplace(AText,#13+#10,#10,[rfReplaceAll]); // \n matches only LF !
at the beginning of my function and it's much better now...
Sometimes writing down a problem helps ...
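The effect is easy to reproduce outside Delphi. Below is a minimal Python sketch (not the Delphi code itself) that runs the same idea against the CR+LF sample from the debugger trace. The function name filter_diff, the escaped dots in the date pattern, and the optional trailing line break are assumptions added for illustration. Python's re module has no \R, so \r?\n stands in for it.

```python
import re

# Date and time such as "16.11.2017 15:00:36" (dots escaped here; an assumption).
DATE = r"\d\d\.\d\d\.20\d\d \d\d:\d\d:\d\d"

# \r?\n accepts both CR+LF and bare LF line breaks.
NL = r"\r?\n"
PARAGRAPH = re.compile(
    r"##.*##" + NL + "-" + DATE + NL + r"\+" + DATE + "(?:" + NL + ")?"
)

def filter_diff(text):
    """Remove the three-line paragraphs whose only difference is a timestamp."""
    return PARAGRAPH.sub("", text)

sample = (
    "--- \r\n+++ \r\n"
    "## -86 +86 ##\r\n-16.11.2017 15:00:36\r\n+15.11.2017 10:47:58\r\n"
    "## -400 +400 ##\r\n-16.11.2017 15:00:36\r\n+15.11.2017 10:47:58\r\n"
)

# The LF-only pattern finds nothing in CR+LF text ...
lf_only = re.compile(r"##.*##\n-" + DATE + r"\n\+" + DATE)
print(lf_only.search(sample))      # None

# ... while the tolerant pattern strips both paragraphs.
print(repr(filter_diff(sample)))   # '--- \r\n+++ \r\n'
```

Normalising the input to LF first, as in the StringReplace line above, and making the pattern tolerant of both line endings are two routes to the same result.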

Related

regex pattern works in online tool, parses in NSRegularExpression, but fails to match anything

I am trying to match roman numerals from test strings like:
Series Name.disk_V.Episode_XI.Episode_name.avi
Series Name.Season V.Episode XI.Part XXV.Episode_name.avi
and a real-world example in which the XIII should not match:
XIII: The Series season II episode V.mp4
Following the logic in this fantastic thread and many experiments in an online regex debugger I came up with this:
(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-]\KM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=[\s._-])
The last example returns two matches, "II" and "V", ignoring the XIII in the name part. Yay!
So then I tried it in a Swift playground:
let file = "Series Name.disk_V.Episode_XI.Episode_name.avi"
let p = #"(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-]\KM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=[\s._-])"#
let r = try NSRegularExpression(pattern: p, options: [.caseInsensitive])
let nsString = file as NSString
let results = r.matches(in: file, options: [], range: NSMakeRange(0, nsString.length))
The pattern parses without error but returns no matches. I found that it works if I remove the \K, although that leaves the leading separator in the match. According to this thread, Obj-C (which I assume means NSRegex) supports \K, so I'm not sure why this fails.
There are a number of similar-sounding threads here on SO, but they invariably have to do with patterns that fail to parse, mostly due to escaping. This is not the case here, it parses fine and I can see the pattern is correct (ie, no double-slashes) if you print(r). It just doesn't match.
Can anyone offer some insight or an alternative regex that does not use \K?
TheFourthBird's idea is the solution. I modified the pattern by removing the \K and making the entire roman section a named group:
(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-](?<roman>M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))(?=[\s._-])
To parse it, everything as above to start but then look for the matching items like this:
for result in results {
let nameRange = result.range(withName: "roman")
print(nsString.substring(with: nameRange))
}
Output:
V
XI
Bingo!
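For comparison, here is a rough Python sketch of the same "capture the numeral in a named group instead of using \K" idea. It is not a translation of the Swift code: Python's re module only accepts fixed-width lookbehinds, so the keyword prefix is consumed by a non-capturing group here, and the helper name roman_numerals is made up for illustration. Because every part of the numeral is optional, empty captures are filtered out.

```python
import re

# The prefix is consumed by a non-capturing group instead of a lookbehind,
# because Python's re only accepts fixed-width lookbehinds.
PATTERN = re.compile(
    r"(?:d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-]"
    r"(?P<roman>M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))"
    r"(?=[\s._-])",
    re.IGNORECASE,
)

def roman_numerals(name):
    """Return the numerals captured by the named group, skipping empty captures."""
    return [m.group("roman") for m in PATTERN.finditer(name) if m.group("roman")]

print(roman_numerals("Series Name.disk_V.Episode_XI.Episode_name.avi"))  # ['V', 'XI']
print(roman_numerals("XIII: The Series season II episode V.mp4"))        # ['II', 'V']
```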

REGEX - Automatic text selection and restructering

I am fairly new to AHK and have written a few scripts, but with my latest one I'm stuck on regex in AHK.
I want to generate a report from the structure of texts I write.
To do this I've set up a system:
1. Sentences ending in '.' are the important sentences, to be prefixed with "-" (variable 'Vimportant'), BUT WITHOUT the sentences containing the words listed for 'Vanecdotes2' or 'Vdelete2' (cf. point 4).
2. Sentences ending in '.*' are the anecdotes (variable 'Vanecdotes1'), where I've manually put a star after the period.
3. Sentences ending in '.!' are irrelevant sentences that need to be deleted (variable 'Vdelete1'), where I've manually put an exclamation mark after the period.
4. As an extra option, I want to detect certain words in a sentence so that the sentence is automatically added to the variable 'Vanecdotes2' or 'Vdelete2'.
A random example would be this (I have already put ! and * after the sentences, why is not important, and "acquisition" is an example of a Vanecdotes2 word from point 4 above):
Last procedure on 19/8/2019.
Normal structure x1.!
Normal structure x2.!
Abberant structure x3, needs follow-up within 2 months.
Structure x4 is lower in activity, but still above p25.*
Abberant structure x4, needs follow-up within 6 weeks.
Normal structure x5.!
Good aqcuisition of x6.
So the output of the Regex in the variables should be
Last procedure on 19/8/2019.
Normal structure x1.! --> regex '.!' --> Vdelete1
Normal structure x2.! --> regex '.!' --> Vdelete1
Abberant structure x3, needs follow-up within 2 months. --> Regex '.' = Vimportant
Structure x4 is lower in activity, but still above p25.* --> regex '.*' = Vanecdote1
Abberant structure x4, needs follow-up within 6 weeks. --> Regex '.' = Vimportant
Normal structure x5.! --> regex '.!' --> Vdelete1
Good aqcuisition of x6. --> Regex 'sentence with the word acquisition' = Vanecdote2
And the output should be:
- Last procedure on 19/8/2019.
- Abberant structure x3, needs follow-up within 2 months.
- Abberant structure x4, needs follow-up within 6 weeks.
. Structure x4 is lower in activity, but still above p25.
. Good aqcuisition of x6.
But I have been having a lot of trouble with the regex, especially with selecting the sentences ending in * or !, but also with the exclusion criteria; they just don't work.
Because AHK doesn't have a really good regex tester, I first tested the patterns in another regex tester, planning to 'translate' them to AHK code later, but it just doesn't work. (So I know that in the script below I'm mixing AHK code with non-AHK regex; I've just put the two together for illustration.)
This is what I have now:
Send ^c
clipwait, 1000
Temp := Clipboard
Regexmatch(Temp, "^.*[.]\n(?!^.*\(Anecdoteword1|Anecdoteword2|deletewordX|deletewordY)\b.*$)", Vimportant)
Regexmatch(Temp, "^.*[.][*]\n")", Vanecdotes1)
Regexmatch(Temp, "^.*[.][!]\n")", Vdelete1)
Regexmatch(Temp, "^.*\b(Anecdoteword1|Anecdoteword2)\b.*$")", Vanecdotes2)
Regexmatch(Temp, "^.*\b(deletewordX|deletewordY)\b.*$")", Vdelete2)
Vanecdotes_tot := Vanecdotes1 . Vanecdotes2
Vdelete_tot := Vdelete1 . Vdelete2
Vanecdotes_ster := "* " . StrReplace(Vanecdotes_tot, "`r`n", "`r`n* ")
Vimportant_stripe := "- " . StrReplace(Vimportant, "`r`n", "`r`n- ")
Vresult := Vimportant_stripe . "`n`n" . Vanecdotes_ster
For "translation to AHK" I tried to make ^.*\*'n from the working (non ahk) regex ^.*[.][*]\n.
There isn't really such a thing as AHK regex. AHK pretty much uses PCRE, apart from the options.
So don't try to turn a linefeed \n into an AHK linefeed `n.
And there seem to be some syntax errors in your regexes; I'm not sure what those extra ") in there are supposed to be. Also, instead of [.][*], the more usual way to write it is \.\*. Outside a character class, the \ is required to escape the normal meaning of those characters (. matches any character and * means zero or more repetitions).
[] matches any one character from the set, so if you wanted to match either . or * you'd write [.*].
And seems like you got the idea of using capture groups, but just in case, here's a minimal example about them:
RegexMatch("TestTest1233334Test", "(\d+)", capture)
MsgBox, % capture
And lastly, about your approach to the problem: I'd recommend looping through the input line by line. It'll be much easier. Use e.g. Loop, Parse.
Minimal example for it as well:
inp := "
(
this is
a multiline
textblock
we're going
to loop
through it
line by line
)"
Loop, Parse, inp, `n, `r
MsgBox, % "Line " A_Index ":`n" A_LoopField
Hope this was of help.
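To make the line-by-line idea concrete outside AHK, here is a small Python sketch of the classification rules from the question. The function names, the default word list, and the precedence between the rules (delete first, then starred anecdotes, then word-based anecdotes, then important) are assumptions for illustration, not something the question specifies.

```python
import re

def contains_word(line, words):
    """True if any of the given whole words occurs in the line (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(w) + r"\b", line, re.IGNORECASE) for w in words
    )

def build_report(text, anecdote_words=("acquisition",), delete_words=()):
    important, anecdotes = [], []
    for line in text.splitlines():       # handles both \n and \r\n
        line = line.strip()
        if not line or line.endswith(".!") or contains_word(line, delete_words):
            continue                     # empty or irrelevant: drop it
        if line.endswith(".*"):
            anecdotes.append(line[:-1])  # strip the trailing star
        elif contains_word(line, anecdote_words):
            anecdotes.append(line)
        elif line.endswith("."):
            important.append(line)
    dashed = "\n".join("- " + s for s in important)
    starred = "\n".join("* " + s for s in anecdotes)
    return dashed + "\n\n" + starred

sample = (
    "Last procedure on 19/8/2019.\n"
    "Normal structure x1.!\n"
    "Structure x4 is lower in activity, but still above p25.*\n"
    "Good acquisition of x6.\n"
)
print(build_report(sample))
```

Each line is tested with plain string checks, and a regex is only needed for the whole-word lookups, which avoids the multi-line anchoring problems entirely.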
This is where I am up till now; nothing works (I will try the suggested loop once the regex is working):
^m::
BlockInput, On
MouseGetPos, , ,TempID, control
WinActivate, ahk_id %TempID%
if WinActive("Pt.")
Send ^c
clipwait, 1000
Temp := Clipboard
Regexmatch(Temp, "(^(?:..\n)((?! PAX|PAC|Normaal|Geen).)$)", Vimportant)
Vimportant := Vimportant.1
Regexmatch(Temp, "(^..*\n)", Vanecdotes1_ster)
Regexmatch(Temp, "(^..!\n)" , Vdelete1_uitroep)
Regexmatch(Temp, "(^.\b(PAX|PAC)\b.$)", Vanecdotes2)
Regexmatch(Temp, "(^.\b(Normaal|Geen)\b.$)", Vdelete2)
Vanecdotes1 := StrReplace(Vanecdotes1_ster, ".*", ".")
Vdelete1 := StrReplace(Vdelete1_uitroep, ".!", ".")
Vanecdotes_tot := Vanecdotes1 . Vanecdotes2
Vdelete_tot := Vdelete1 . Vdelete2
Vanecdotes_ster := "* " . StrReplace(Vanecdotes_tot, "`r`n", "`r`n* ")
Vimportant_stripe := "- " . StrReplace(Vimportant, "`r`n", "`r`n- ")
Vresult := Vimportant_stripe . "`n`n" . Vanecdotes_ster
Clipboard := Vresult
Send ^v
return

Regex to insert space with certain characters but avoid date and time

I made a regex which inserts a space wherever any of the characters
-:\*_/;, is present. For example, JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c should become JET* AIRWAYS\ INDIA/ 858701/ IDBI 05/05/05; 05:05:05 a/c
The regex I used is (?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)
I have added some word exceptions like a/c, w/d etc. The \D conditions are there to keep date/time values from being split, but this created an issue: numbers followed by the above-mentioned characters never get split.
My requirement is
1. Insert a space after characters -:\*_/;,
2. but date and time should not get split which may have / :
3. need exception on words like a/c w/d
The following is the full code
Private Function formatColon(oldString As String) As String
Dim reg As New RegExp: reg.Global = True: reg.Pattern = "(?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)" '"(\D:|\D/|\D-|^w/d)"
Dim newString As String: newString = reg.Replace(oldString, "$1 ")
formatColon = XtraspaceKill(newString)
End Function
I would use 3 replacements.
1. Replace all date and time special characters with a special macro that should never be found in your text, e.g. for 05/15/2018 4:06 PM, something based on your name:
05MANUMOHANSLASH15MANUMOHANSLASH2018 4MANUMOHANCOLON06 PM
You can encode exceptions too, like this:
aMANUMOHANSLASHc
2. Now run your original regex to replace all special characters.
3. Finally, unreplace the macros MANUMOHANSLASH and MANUMOHANCOLON.
Meanwhile, let me tell you why this is complicated in a single regex.
If trying to do this in a single regex, you have to ask, for each / or :, "Am I a part of a date or time?"
To answer that, you need to use lookahead and lookbehind assertions, the latter of which Microsoft has finally added support for.
But given a /, you don't know if you're between the first and second, or second and third parts of the date. Similar for time.
The number of cases you need to consider will render your regex unmaintainably complex.
So please just use a few separate replacements :-)
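The three replacements can be sketched as follows. This is an illustration in Python rather than VBA, and several details are assumptions: the placeholder characters, the function name format_separators, and the patterns used to recognise a date, a time, and the exception words.

```python
import re

# Private-use characters as placeholders; assumed never to occur in the input.
P_SLASH, P_COLON = "\ue000", "\ue001"

# What counts as a date, a time, or an exception word is an assumption here.
PROTECTED = re.compile(
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"       # dates such as 05/05/05
    r"|\b\d{1,2}:\d{2}(?::\d{2})?\b"      # times such as 05:05:05
    r"|\b(?:a/c|w/d|m/s|s/w|m/o)\b",      # exception words
    re.IGNORECASE,
)
# Any separator that is not already followed by whitespace.
SEPARATOR = re.compile(r"[-:\\*_/;,](?!\s)")

def format_separators(text):
    # 1. Hide the separators that must survive.
    text = PROTECTED.sub(
        lambda m: m.group(0).replace("/", P_SLASH).replace(":", P_COLON), text
    )
    # 2. Insert a space after every remaining separator.
    text = SEPARATOR.sub(lambda m: m.group(0) + " ", text)
    # 3. Restore the hidden separators.
    return text.replace(P_SLASH, "/").replace(P_COLON, ":")

print(format_separators("JET*AIRWAYS\\INDIA/858701/IDBI 05/05/05;05:05:05 a/c"))
# JET* AIRWAYS\ INDIA/ 858701/ IDBI 05/05/05; 05:05:05 a/c
```

Each step stays simple enough to read at a glance, which is the point of splitting the work up.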

Select two ranges, one immediately after another using regular expressions [duplicate]

I have a large log file, and I want to extract a multi-line string between two strings: start and end.
The following is sample from the inputfile:
start spam
start rubbish
start wait for it...
profit!
here end
start garbage
start second match
win. end
The desired solution should print:
start wait for it...
profit!
here end
start second match
win. end
I tried a simple regex but it returned everything from start spam. How should this be done?
Edit: Additional info on real-life computational complexity:
actual file size: 2GB
occurrences of 'start': ~ 12 M, evenly distributed
occurrences of 'end': ~800, near the end of the file.
This regex should match what you want:
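(start(?:(?!start).)*?end)
Use the re.findall method with the single-line (dot-all) modifier re.S to get all the occurrences in a multi-line string. The inner group is non-capturing so that findall returns the matched strings rather than tuples:
re.findall(r'(start(?:(?!start).)*?end)', text, re.S)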
See a test here.
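As a quick check, here is the tempered pattern run against the sample from the question (a small sketch; the variable names are arbitrary):

```python
import re

text = (
    "start spam\n"
    "start rubbish\n"
    "start wait for it...\n"
    "profit!\n"
    "here end\n"
    "start garbage\n"
    "start second match\n"
    "win. end\n"
)

# (?:(?!start).)*? consumes one character at a time, but only while the
# next characters do not spell "start", so a match can never span two starts.
blocks = re.findall(r"start(?:(?!start).)*?end", text, re.S)
for block in blocks:
    print(block)
    print("--")
```

Only the two wanted blocks are printed; the attempts beginning at "start spam", "start rubbish" and "start garbage" fail as soon as the next "start" is reached.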
Do it with code - basic state machine:
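is_open = False
tmp = []
for ln in fi:  # fi: the input file, opened for reading
    if 'start' in ln:
        if is_open:
            tmp = []  # a new start before any end: discard what was collected
        else:
            is_open = True
    if is_open:
        tmp.append(ln)
    if 'end' in ln:
        is_open = False
        for x in tmp:
            print(x, end='')  # lines keep their own newline
        tmp = []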
This is tricky to do because the re module does not return overlapping matches. The third-party regex module, available from PyPI, does allow overlapping matches.
https://pypi.python.org/pypi/regex
You'd want to use something like
regex.findall(pattern, string, overlapped=True)
If you can't install regex, it's still possible with some trickery. One brilliant person solved it here:
Python regex find all overlapping matches?
Once you have all possible overlapping (non-greedy, I imagine) matches, just determine which one is shortest, which should be easy.
You could do (?s)start.*?(?=end|start)(?:end)?, then filter out everything not ending in "end".
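A minimal sketch of that last suggestion, using the sample from the question. The generator name blocks is arbitrary, and the whole text is assumed to fit in memory, which may not hold for the 2GB file mentioned above.

```python
import re

# Each candidate runs from a "start" up to the nearest "end" or next "start";
# the trailing (?:end)? swallows the "end" when that is what stopped it.
CANDIDATE = re.compile(r"(?s)start.*?(?=end|start)(?:end)?")

def blocks(text):
    """Yield only the candidates that really close with 'end'."""
    for m in CANDIDATE.finditer(text):
        if m.group(0).endswith("end"):
            yield m.group(0)

text = (
    "start spam\n"
    "start rubbish\n"
    "start wait for it...\n"
    "profit!\n"
    "here end\n"
    "start garbage\n"
    "start second match\n"
    "win. end\n"
)
for block in blocks(text):
    print(block)
```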

Regex for IBAN allowing for white spaces AND checking for exact length

I need to check an input field for a German IBAN. The user should be allowed to leave in white spaces, and the input should be validated to start with DE followed by exactly 20 characters (digits and letters).
Without the white space allowance, I tried
^[DE]{2}([0-9a-zA-Z]{20})$
but I cannot find where and how to add "white spaces allowed anywhere".
This should be simple, but I simply cannot find a solution.
Thanks for help!
Use the right tool for the task: you should not rely on regexps alone to validate IBAN numbers, but instead use the IBAN checksum algorithm to check that the whole code is actually correct, which makes any regexp superfluous. That is: remove all spaces, move the first four characters to the end, convert letters to numbers, and compute the remainder modulo 97, which must be 1.
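A minimal sketch of that checksum in Python, restricted to the German format from the question. The function name is made up, and the structure check (DE, two check digits, 18 alphanumeric characters) is an assumption that mirrors the question's pattern rather than the official German layout.

```python
import re

def is_valid_german_iban(raw):
    """Structure check followed by the ISO 7064 mod-97 test."""
    iban = re.sub(r"\s+", "", raw).upper()           # drop all white space
    if not re.fullmatch(r"DE\d{2}[0-9A-Z]{18}", iban):
        return False
    rearranged = iban[4:] + iban[:4]                 # move "DEkk" to the end
    # Letters become two-digit numbers: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

print(is_valid_german_iban("DE89 3704 0044 0532 0130 00"))  # True
print(is_valid_german_iban("DE88 3704 0044 0532 0130 00"))  # False
```

Python integers are arbitrary precision, so the long digit string can be reduced modulo 97 directly; in languages with fixed-width integers the remainder is usually computed piecewise.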
Still, to answer your question for the fun of it, what about:
^DE([0-9a-zA-Z]\s?){20}$
The only difference from your pattern (apart from the literal DE) is that it allows an optional whitespace character after each alphanumeric character.
edit: for the OP's information, the only difference from @ulugbex-umirov's regexp, (?:\s*[0-9a-zA-Z]\s*), is that his also allows whitespace between the ISO country code and the check digits (which are made only of numerical digits), which I do not support on purpose.
And actually to support a correct IBAN syntax, which is formed of groups of 4 characters, as the wikipedia page says:
^DE\d{2}\s?([0-9a-zA-Z]{4}\s?){4}[0-9a-zA-Z]{2}$
If your UI is in Javascript, you can use that library for doing IBAN validation:
<script src="iban.js"></script>
<script>
// the API is now accessible from the window.IBAN global object
IBAN.isValid('hello world'); // false
IBAN.isValid('BE68539007547034'); // true
</script>
so you know this is a valid IBAN, and can validate it before the data is ever even sent to the backend. Simpler, lighter and more elegant… Why do something else?
Here is a list of IBAN regexes for 70 countries. I generated it with a Python script I wrote based on https://en.wikipedia.org/wiki/International_Bank_Account_Number
AL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([a-zA-Z0-9]{4}\s?){4}\s?
AD[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([a-zA-Z0-9]{4}\s?){3}\s?
AT[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
AZ[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){5}\s?
BH[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{2})\s?
BY[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){5}\s?
BE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}\s?
BA[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
BR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{3})([a-zA-Z]{1}\s?)([a-zA-Z0-9]{1})\s?
BG[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){1}([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){1}([a-zA-Z0-9]{2})\s?
CR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
HR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{1})\s?
CY[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([a-zA-Z0-9]{4}\s?){4}\s?
CZ[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
DK[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
DO[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){5}\s?
TL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{3})\s?
EE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
FO[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
FI[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
FR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})([0-9]{2})\s?
GE[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{2})([0-9]{2}\s?)([0-9]{4}\s?){3}([0-9]{2})\s?
DE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
GI[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{3})\s?
GR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{3})\s?
GL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{2})\s?
GT[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}\s?
HU[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){6}\s?
IS[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{2})\s?
IE[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){3}([0-9]{2})\s?
IL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{3})\s?
IT[a-zA-Z0-9]{2}\s?([a-zA-Z]{1})([0-9]{3}\s?)([0-9]{4}\s?){1}([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{3})\s?
JO[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){5}([0-9]{2})\s?
KZ[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{2})\s?
XK[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{4}\s?){2}([0-9]{2})([0-9]{2}\s?)\s?
KW[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}([a-zA-Z0-9]{2})\s?
LV[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{1})\s?
LB[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}\s?
LI[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})\s?
LT[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}\s?
LU[a-zA-Z0-9]{2}\s?([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){3}\s?
MK[a-zA-Z0-9]{2}\s?([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})([0-9]{2})\s?
MT[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{3})\s?
MR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{3})\s?
MU[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){4}([0-9]{3})([a-zA-Z]{1}\s?)([a-zA-Z]{2})\s?
MC[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})([0-9]{2})\s?
MD[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){4}\s?
ME[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
NL[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){2}([0-9]{2})\s?
NO[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){2}([0-9]{3})\s?
PK[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){4}\s?
PS[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){5}([0-9]{1})\s?
PL[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){6}\s?
PT[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}([0-9]{1})\s?
QA[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){5}([a-zA-Z0-9]{1})\s?
RO[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([a-zA-Z0-9]{4}\s?){4}\s?
SM[a-zA-Z0-9]{2}\s?([a-zA-Z]{1})([0-9]{3}\s?)([0-9]{4}\s?){1}([0-9]{3})([a-zA-Z0-9]{1}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{3})\s?
SA[a-zA-Z0-9]{2}\s?([0-9]{2})([a-zA-Z0-9]{2}\s?)([a-zA-Z0-9]{4}\s?){4}\s?
RS[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){4}([0-9]{2})\s?
SK[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
SI[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){3}([0-9]{3})\s?
ES[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
SE[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
CH[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){2}([a-zA-Z0-9]{1})\s?
TN[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){5}\s?
TR[a-zA-Z0-9]{2}\s?([0-9]{4}\s?){1}([0-9]{1})([a-zA-Z0-9]{3}\s?)([a-zA-Z0-9]{4}\s?){3}([a-zA-Z0-9]{2})\s?
AE[a-zA-Z0-9]{2}\s?([0-9]{3})([0-9]{1}\s?)([0-9]{4}\s?){3}([0-9]{3})\s?
GB[a-zA-Z0-9]{2}\s?([a-zA-Z]{4}\s?){1}([0-9]{4}\s?){3}([0-9]{2})\s?
VA[a-zA-Z0-9]{2}\s?([0-9]{3})([0-9]{1}\s?)([0-9]{4}\s?){3}([0-9]{2})\s?
VG[a-zA-Z0-9]{2}\s?([a-zA-Z0-9]{4}\s?){1}([0-9]{4}\s?){4}\s?
Original:
^[DE]{2}([0-9a-zA-Z]{20})$
Modified:
^DE(?:\s*[0-9a-zA-Z]\s*){20}$
This is the correct regex to match DE IBAN account numbers:
DE\d{2}[ ]\d{4}[ ]\d{4}[ ]\d{4}[ ]\d{4}[ ]\d{2}|DE\d{20}
Pass: DE89 3704 0044 0532 0130 00|||DE89370400440532013000
Fail: DE89-3704-0044-0532-0130-00
The simplest solution I can think of:
^DE(\s*[[:alnum:]]){20}\s*$
In particular, your initial [DE]{2} is wrong, as it allows 'DD', 'EE', 'ED' as well as the intended 'DE'.
To allow any amount of spaces anywhere:
^ *D *E( *[A-Za-z0-9]){20} *$
Since you want to allow lowercase letters, should DE also be allowed in lowercase?
^ *[Dd] *[Ee]( *[A-Za-z0-9]){20} *$
^ matches the start of the string
$ end anchor
in between each characters there are optional spaces *
[character class] defines a set/range of characters
To allow at most one space in between each characters, replace the quantifier * (any amount of) with ? (0 or 1). If supported, \s shorthand can be used to match [ \t\r\n\f] instead of space only.
Test on regex101.com, also see the SO regex FAQ
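The difference between the original pattern and the space-tolerant one can be checked with a few inputs. This is a small Python sketch; the names LOOSE, STRICT and looks_like_german_iban are arbitrary, and the capture group is made non-capturing since only a yes/no answer is needed.

```python
import re

LOOSE = re.compile(r"^[DE]{2}([0-9a-zA-Z]{20})$")               # pattern from the question
STRICT = re.compile(r"^ *[Dd] *[Ee](?: *[A-Za-z0-9]){20} *$")   # spaces allowed anywhere

def looks_like_german_iban(value):
    """Format check only; it says nothing about the checksum."""
    return STRICT.match(value) is not None

print(bool(LOOSE.match("DD89370400440532013000")))            # True: [DE]{2} accepts "DD"
print(looks_like_german_iban("DD89370400440532013000"))       # False
print(looks_like_german_iban("DE89 3704 0044 0532 0130 00"))  # True
print(looks_like_german_iban("DE89-3704-0044-0532-0130-00"))  # False
```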
Using Google Apps Script, I pasted Laurent's code from github into a script and added the following code to test.
// Use the Apps Script IDE's "Run" menu to execute this code.
// Then look at the View > Logs menu to see execution results.
function myFunction() {
//https://github.com/arhs/iban.js/blob/master/README.md
// var IBAN = require('iban');
var t1 = IBAN.isValid('hello world'); // false
var t2 = IBAN.isValid('BE68539007547034'); // true
var t3 = IBAN.isValid('BE68 5390 0754 7034'); // true
Logger.log("Test 1 = %s", t1);
Logger.log("Test 2 = %s", t2);
Logger.log("Test 3 = %s", t3);
}
The only thing needed to run the example code was commenting out the require('iban') line:
// var IBAN = require('iban');
Finally, instead of using client handlers to attempt a RegEx validation of IBAN input, I use a server handler to do the validation.