RegExp to test Full Name not working as expected - regex

I'm writing a VBA function to evaluate whether a String is a Valid Full Name or not. For example:
Valid Full Name:
David Gilmour
Juan Munoz
Claudio Alberto da Silva
Invalid Full Name:
David Gilm01ur Jr.
Juan Muñoz
Cláudio Alberto da Silva
So the code of my function is this:
Function isVálidoNome(ByVal Texto As String) As Boolean
isVálidoNome = False
'
Dim strPattern As String: strPattern = "(^[a-zA-Z]+(\s?[a-zA-Z])*)*"
'Dim strPattern As String: strPattern = "\d"
Dim regularExpressions As New RegExp
'
regularExpressions.Pattern = strPattern
regularExpressions.Global = True
'
If (regularExpressions.Test(Texto)) Then
isVálidoNome = True
End If
End Function
The pattern I used, (^[a-zA-Z]+(\s?[a-zA-Z])*)*, works fine in the app I used to test it (RegexPal), but when I run the code in VBA, strings with digits or accents return True.
Why does this happen? Did I make a mistake?

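Your pattern has no end-of-string anchor, and the outer (...)* group lets it match an empty string, so Test returns True for any input. You need to use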
^[a-zA-Z]+(?:\s[a-zA-Z]+)*$
See the regex demo
Set regularExpressions.Global = False.
Details:
^ - start of string
[a-zA-Z]+ - 1 or more ASCII letters
(?: - start of a non-capturing group matching zero or more (*) sequences of:
\s - a single whitespace (add + after it to match 1 or more whitespaces)
[a-zA-Z]+
)* - end of the non-capturing group
$ - end of string.
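A minimal sketch of the function with the anchored pattern. It assumes late binding through CreateObject("VBScript.RegExp"), so no project reference is needed; the names IsValidFullName and DemoIsValidFullName are hypothetical, not from the question.

```vba
' Hypothetical helper name; late binding avoids the need for a project reference.
Function IsValidFullName(ByVal Texto As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    ' Anchored at both ends, so the whole string must consist of
    ' ASCII-letter words separated by single whitespace characters.
    re.Pattern = "^[a-zA-Z]+(?:\s[a-zA-Z]+)*$"
    re.Global = False
    IsValidFullName = re.Test(Texto)
End Function

Sub DemoIsValidFullName()
    Debug.Print IsValidFullName("David Gilmour")       ' True
    Debug.Print IsValidFullName("David Gilm01ur Jr.")  ' False (digits and a dot)
    Debug.Print IsValidFullName("David  Gilmour")      ' False (two spaces, \s matches only one)
End Sub
```

Because Test is called once on the whole string, Global has no effect on the result here; the anchors are what make the check strict.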

Related

Regex: match string between two strings within an Excel Visual Basic for Applications (VBA) function (macro, module). (regular expression)

I do have this wonderful regular expression: (?<=, )(.*)(?= \(), which matches any characters between "," and "(".
E.g., from the string "Hey man, my regex is Super (Mega) Cool (SC)" it matches "my regex is Super (Mega) Cool". I tested it in various regex testers (e.g. https://extendsclass.com/regex-tester.html#ruby).
However, when using it in an Excel VBA Module to create my own function, it does not work (see below).
Function extrCountryN(cellRef) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As Variant
Const sPat As String = "((?<=, )(.*)(?= \())"
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = sPat
If .Test(cellRef) Then
Set MC = .Execute(cellRef)
For Each M In MC
sTemp = sTemp & ", " & M.SubMatches(0)
Next M
End If
End With
extrCountryN = Mid(sTemp, 3)
End Function
'https://extendsclass.com/regex-tester.html
A similar regex in the same module works perfectly fine for me, e.g. ^(.*?)(?= \() successfully matches everything before the first "(".
How can I fix it?
As the VBA regex engine does not support lookbehind assertions, you can remove it and use a consuming pattern instead. It is simple in this case because you are actually only using the captured value (with M.SubMatches(0)) in your code.
So, the quick fix is
Const sPat As String = ", (.*)(?= \()"
If you need to deal with tabs or spaces, or any whitespace, you need \s rather than a literal space:
Const sPat As String = ",\s+(.*)(?=\s\()"
See this regex demo.
Details:
, - a comma
\s+ - one or more whitespaces
(.*) - Group 1: any zero or more chars other than line break chars as many as possible
(?=\s\() - a positive lookahead that matches a location that is immediately followed with a whitespace and ( char.
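As a sketch of how the lookbehind-free pattern can be used, the function below returns Group 1 of the first match. The names ExtractBetween and DemoExtractBetween are hypothetical, and it assumes only the first match in the string is wanted.

```vba
' Hypothetical helper: returns the text between ", " and the last " (".
Function ExtractBetween(ByVal s As String) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    ' The comma and whitespace are consumed instead of asserted with a
    ' lookbehind; the wanted text is read from Group 1.
    re.Pattern = ",\s+(.*)(?=\s\()"
    Set mc = re.Execute(s)
    If mc.Count > 0 Then ExtractBetween = mc(0).SubMatches(0)
End Function

Sub DemoExtractBetween()
    Debug.Print ExtractBetween("Hey man, my regex is Super (Mega) Cool (SC)")
    ' prints: my regex is Super (Mega) Cool
End Sub
```

The greedy .* runs to the end of the string and backtracks to the last position followed by a whitespace and (, which is why the result includes "(Mega)".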

How to exclude an amount?

I have two strings with the same amount:
Price $22.00
Price Max=$22.00
Can someone please advise how I can modify this regex pattern to make sure that the price with a "Max" in front of it will be ignored?
(?:MAX=|MAX=\s)[$]?[0-9]{0,2}?[,]?[0-9]{1,3}[.][0-9]{0,2}
You may capture the MAX= into an optional capturing group and check if it matched when all matches are found. Only grab the value if the Group 1 did not match:
Dim strPattern As String: strPattern = "(MAX=\s*)?\$\d[\d.,]*"
Dim regEx As Object
Dim ms As Object, m As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
regEx.IgnoreCase = True ' the pattern says MAX=, the sample text says Max=
regEx.Pattern = strPattern
Dim t As String
t = "Price $24.00 Price Max=$22.00 "
Set ms = regEx.Execute(t)
For Each m In ms
If Len(m.SubMatches(0)) = 0 Then
Debug.Print m.value
End If
Next
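The (MAX=\s*)?\$\d[\d.,]* pattern captures MAX= and 0+ whitespaces into an optional group (it matches 1 or 0 times). \$\d[\d.,]* then matches a $ char, a digit, and any 0+ digits, commas and dots. If Len(m.SubMatches(0)) = 0 Then checks whether Group 1 is empty, and if it is, the match is valid. Since the sample text contains Max= rather than MAX=, the regex has to be case insensitive (regEx.IgnoreCase = True) for Group 1 to match.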
One way to do it could be to match what you don't want and to capture what you do want in a capturing group using an alternation:
Max=\s*\$[0-9]+\.[0-9]+|(\$[0-9]+\.[0-9]+)
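A short sketch of how the alternation could be consumed in VBA: the unwanted Max= amounts are matched but leave Group 1 empty, so only matches with a filled Group 1 are printed. The Sub name is hypothetical, and IgnoreCase is set on the assumption that MAX= and Max= should be treated alike.

```vba
Sub DemoAlternation()
    Dim re As Object, m As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True   ' assumption: Max= and MAX= are equivalent
    re.Pattern = "Max=\s*\$[0-9]+\.[0-9]+|(\$[0-9]+\.[0-9]+)"
    For Each m In re.Execute("Price $24.00 Price Max=$22.00")
        ' Group 1 is filled only when the second alternative matched.
        If Len(m.SubMatches(0)) > 0 Then Debug.Print m.SubMatches(0)
    Next
End Sub
```

With this input only $24.00 is printed, because Max=$22.00 is consumed by the first alternative.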

UDF Regex - yyyy only

I am just learning some regex, and I need help spitting out matches generated by my regex code. I found some very useful resources here to output anything not matched, but I want to output only the parts of a cell that do match. I am looking for dates in cells, that may be a single yyyy date or yyyy-yy, or the like (as shown from the sample data below).
Sample data:
1951/52
1909-13
2005-2014
7 . (1989)-
1 (1933/34)-2 (1935/36)
1979-2012/2013
Current Function Code: (A snippet found from an existing post here, but returns the replacement value instead of what was matched)
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([12][0-9]{3}[/][0-9]{2,4})|([12][0-9]{3}[-][0-9]{2,4})|([12][0-9]{3})"
You may use
\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b
See the regex demo
Note that \b might be removed if you are not interested in a whole word search.
Pattern details:
\b - leading word boundary (the preceding char must be either a non-word char or the start of string)
[12][0-9]{3} - 1 or 2 followed with any 3 digits
(?:[,/-][0-9]{2,4})* - zero or more sequences ((?:...)*) of:
[,/-] - a comma, / or - character
[0-9]{2,4} - any 2 to 4 digits
\b - trailing word boundary (there must be a non-word char or the end of string after).
Sample VBA code to grab all those values using RegExp#Execute:
Sub FetchDateLikeStrs()
    ' Early binding: requires a reference to Microsoft VBScript Regular Expressions 5.5
    Dim cellContents As String
    Dim reg As RegExp
    Dim mc As MatchCollection
    Dim m As Match
    Set reg = New RegExp
    reg.Pattern = "\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b"
    reg.Global = True
    cellContents = "1951/52 1909-13 2005-2014 7 . (1989)- 1 (1933/34)-2 (1935/36) 1979-2012/2013 1951,52"
    Set mc = reg.Execute(cellContents)
    For Each m In mc
        Debug.Print m.Value
    Next
End Sub
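Since the question asks for a UDF, here is a sketch of a worksheet-function variant that joins all matches with ", ". The name ExtractYears and the separator are assumptions; late binding is used so it works without a project reference.

```vba
' Hypothetical UDF: returns every date-like match in the text, comma-separated.
Function ExtractYears(ByVal cellText As String) As String
    Dim re As Object, m As Object, result As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b"
    For Each m In re.Execute(cellText)
        result = result & ", " & m.Value
    Next
    ' Drop the leading ", "; yields an empty string when nothing matched.
    ExtractYears = Mid$(result, 3)
End Function
```

In a cell it could be called as =ExtractYears(A1); for the sample value 1 (1933/34)-2 (1935/36) it returns 1933/34, 1935/36.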

VBA Regex - Grab Hour HH:MM from String

Given an arbitrary string, I want to grab an hour (HH:MM) from the string.
Here is my regex:
^ # Start of string
(?: # Try to match...
(?: # Try to match...
([01]?\d|2[0-3]): # HH:
)? # (optionally).
([0-5]?\d): # MM: (required)
)? # (entire group optional, so either HH:MM:, MM: or nothing)
$ # End of string
And my code:
Public Sub RegexTest()
Dim oRegex As Object
Dim time_match As Object
Set oRegex = CreateObject("vbscript.regexp")
With oRegex
.Global = True
.Pattern = "^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)$" 'HH:MM
End With
Dim s As String: s = "START TIME: Mar. 3rd 2016 12:00am"
Set time_match = oRegex.Execute(s)
If time_match.Count = 1 Then
Debug.Print time_match.Matches(0)
Else
End If
End Sub
However I am unable to match here and get no output.
Your ^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)$ pattern only matches a full string that consists of an optional HH: part and an obligatory MM part followed by an obligatory :.
I suggest
(?:[01]?\d|2[0-3]):[0-5]\d
Since you are matching a part of the string.
See regex demo
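A sketch of the suggested pattern in use, with the question's sample string. The Sub name is hypothetical. Note that a MatchCollection is indexed directly (mc(0)); it has no Matches member.

```vba
Sub DemoGrabTime()
    Dim re As Object, mc As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    ' Unanchored, so the time may appear anywhere in the string.
    re.Pattern = "(?:[01]?\d|2[0-3]):[0-5]\d"
    Set mc = re.Execute("START TIME: Mar. 3rd 2016 12:00am")
    If mc.Count > 0 Then Debug.Print mc(0).Value   ' prints: 12:00
End Sub
```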

Regex lookahead to match everything prior to 1st OR 2nd group of digits

Regex in VBA.
I am using the following regex to match the second occurrence of a 4-digit group, or the first group if there is only one group:
\b\d{4}\b(?!.+\b\d{4}\b)
Now I need to do kind of the opposite: I need to match everything up until the second occurrence of a 4-digit group, or up until the first group if there is only one. If there are no 4-digit groups, capture the entire string.
This would be sufficient.
But there is also a preferable "bonus" route: If there exists a way to match everything up until a 4-digit group that is optionally followed by some random text, but only if there is no other 4-digit group following it. If there exists a second group of 4 digits, capture everything up until that group (including the first group and periods, but not commas). If there are no groups, capture everything. If the line starts with a 4-digit group, capture nothing.
I understand that also this could (should?) be done with a lookahead, but I am not having any luck in figuring out how they work for this purpose.
Examples:
Input: String.String String 4444
Capture: String.String String 4444
Input: String4444 8888 String
Capture: String4444
Input: String String 444 . B, 8888
Capture: String String 444 . B
Bonus case:
Input: 8888 String
Capture:
To match up until the second occurrence of a 4-digit group, or up until the first group if there is only one, use this pattern:
^((?:.*?\d{4})?.*?)(?=\s*\b\d{4}\b)
Demo
Per the comment below, use this pattern:
^((?:.*?\d{4})?.*?(?=\s*\b\d{4}\b)|.*)
Demo
You can use this regex in VBA to capture lines with 4-digit numbers, or those that do not have 4-digit numbers in them:
^((?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4})|(?!.*[0-9]{4}).*)
See demo, it should work the same in VBA.
The regex consists of 2 alternatives: (?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) and (?!.*[0-9]{4}).*.
(?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) matches 0 or more (as few as possible) characters that are preceded by 0 or 1 sequence of characters followed by a 4-digit number, and are followed by optional space(s) and 4 digit number.
(?!.*[0-9]{4}).* matches any number of any characters that do not have a 4-digit number inside.
Note that to only match whole numbers (not part of other words) you need to add \b around the [0-9]{4} patterns (i.e. \b[0-9]{4}\b).
The following matches everything except spaces up to the last occurrence of a 4-digit word.
You can use the following:
(?:(?! ).)+(?=.*\b\d{4}\b)
See DEMO
For your basic case (marked by you as sufficient), this will work:
((?:(?!\d{4}).)*(?:\d{4})?(?:(?!\d{4}).)*)(?=\d{4})
You can pad every \d{4} internally with \b if you need to.
See a demo here.
If anyone is interested, I cheated to fully solve my problem.
Building on this answer, which solves the vast majority of my data set, I used program logic to catch some rarely seen use-cases. It seemed difficult to get a single regex to cover all the situations, so this seems like a viable alternative.
Problem is illustrated here.
The code isn't bulletproof yet, but this is the gist:
Function cRegEx(ByVal str As String) As String
Dim rExp As Object, rMatch As Object, regP As String, strL() As String
regP = "^((?:.*?[0-9]{4})?.*?(?:(?=\s*[0-9]{4})|(?:(?!\d{4}).)*)|(?!.*[0-9]{4}).*)"
' Encountered two use-cases that weren't easily solvable with regex, due to the already complex pattern(s).
' Split str if we encounter a comma and only keep the first part - this way we don't have to solve this case in the regex.
If InStr(str, ",") <> 0 Then
strL = Split(str, ",")
str = strL(0)
End If
' If str starts with a 4-digit group, return an empty string.
If cRegExNum(str) = False Then
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegEx = rMatch(0)
Else
cRegEx = ""
End If
Else
cRegEx = ""
End If
End Function
Function cRegExNum (str As String) As Boolean
' Does the string start with four consecutive digits?
' Return true if it does
Dim rExp As Object, rMatch As Object, regP As String
regP = "^\d{4}"
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegExNum = True
Else
cRegExNum = False
End If
End Function