VBA: Submatching regex

I have the following code:
Dim results(1) As String
Dim RE As Object, REMatches As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "(.*?)(\[(.*)\])?"
End With
Set REMatches = RE.Execute(str)
results(0) = REMatches(0).submatches(0)
results(1) = REMatches(0).submatches(2)
Basically if I pass in a string "Test" I want it to return an array where the first element is Test and the second element is blank.
If I pass in a string "Test [bar]", the first element should be "Test " and the second element should be "bar".
I can't seem to find any issues with my regex. What am I doing wrong?

You need to add beginning and end of string anchors to your regex:
...
.Pattern = "^(.*?)(\[(.*)\])?$"
...
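Without these anchors, the lazy .*? matches zero characters at the start of the string, and because the bracketed group is optional, the pattern as a whole succeeds right there as an empty match. The engine therefore never has a reason to backtrack and let .*? consume more.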
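As a rough sketch of the anchored pattern in use: the function name SplitBracket and its parameter are made up for illustration, and appending an empty string to the submatch is only a defensive assumption for the case where the optional group did not take part in the match.

```vba
' Hypothetical wrapper around the anchored pattern from the answer above.
Function SplitBracket(ByVal s As String) As Variant
    Dim re As Object, m As Object
    Dim results(1) As String

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Pattern = "^(.*?)(\[(.*)\])?$"

    Set m = re.Execute(s)
    If m.Count > 0 Then
        ' SubMatches is zero-based: 0 = text before the bracket, 2 = text inside it
        results(0) = m(0).SubMatches(0) & vbNullString
        results(1) = m(0).SubMatches(2) & vbNullString
    End If
    SplitBracket = results
End Function
```

Calling SplitBracket("Test [bar]") is expected to give "Test " and "bar", while SplitBracket("Test") gives "Test" and an empty string.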

Related

VBA regex - Value used in formula is of the wrong data type

I can't figure out why this function, which uses a regex, keeps returning a "wrong data type" error. I'm trying to return the text that matches a pattern from a file path string in an Excel document. An example of the pattern I'm looking for is "02 Package_2018-1011" from a sample string "H:\H1801100 MLK Middle School Hartford\2-Archive! Issued Bid Packages\01 Package_2018-0905 Demolition and Abatement Bid Set_Drawings - PDF\00 HazMat\HM-1.pdf". The VBA code is listed below.
Function textpart(Myrange As Range) As Variant
Dim strInput As String
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
strInput = Myrange.Value
With regex
.Pattern = "\D{2}\sPackage_\d{4}-\d{4}"
.Global = True
End With
Set textpart = regex.Execute(strInput)
End Function
You need to use \d{2} to match 2-digit chunk, not \D{2}. Besides, you are trying to assign the whole match collection to the function result, while you should extract the first match value and assign that value to the function result:
Function textpart(Myrange As Range) As Variant
Dim strInput As String
Dim regex As Object
Dim matches As Object
Set regex = CreateObject("VBScript.RegExp")
strInput = Myrange.Value
With regex
.Pattern = "\d{2}\sPackage_\d{4}-\d{4}"
End With
Set matches = regex.Execute(strInput)
If matches.Count > 0 Then
textpart = matches(0).Value
End If
End Function
Note that to match it as a whole word you may add word boundaries (\b) at both ends:
.Pattern = "\b\d{2}\sPackage_\d{4}-\d{4}\b"
To only match it after \, you may use a capturing group:
.Pattern = "\\(\d{2}\sPackage_\d{4}-\d{4})\b"
' ...
' and then
' ...
textpart = matches(0).Submatches(0)
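Putting the capturing-group variant together, a minimal sketch might look like the following. The name PackagePart is hypothetical, and it takes a String rather than a Range purely so it can be tried from the Immediate window; returning an empty string on no match is also an assumption.

```vba
' Hypothetical variant of textpart that only accepts the chunk right after a backslash.
Function PackagePart(ByVal path As String) As Variant
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\\(\d{2}\sPackage_\d{4}-\d{4})\b"

    Set matches = re.Execute(path)
    If matches.Count > 0 Then
        ' Group 1 holds the text without the leading backslash
        PackagePart = matches(0).SubMatches(0)
    Else
        PackagePart = vbNullString   ' assumption: empty string when nothing matches
    End If
End Function
```

With the sample path from the question, this should return "01 Package_2018-0905".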

Remove vowels,white spaces and duplicate characters

I'm trying to trim a string and remove any vowel and white space and duplicate characters.
Here's the code I'm using
Function TrimString(strString As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.IgnoreCase = True
.Pattern = "[aeiou\s]"
TrimString = .Replace(strString, vbNullString)
.Pattern = "(.)\1+"
TrimString = .Replace(TrimString, "$1")
End With
End Function
Is there a way to combine both patterns instead of doing this in 2 steps?
Thank you in advance.
This would work:
With objRegex
.Global = True
.IgnoreCase = True
.Pattern = ".*?([^aeiou\s]).*?"
TrimString = .Replace(TrimString, "$1$1")
End With
I'm not familiar with VBA but if there is a way to just return matches instead of replacing them in the original string then you could use the following pattern
[^aeiou\s]
And return this:
$&$&
You have two replacements:
Removing [aeiou\s] matches, e.g. niarararrrrreghtatt turns into nrrrrrrrghttt
Replacing each chunk of identical chars with the first occurrence turns nrrrrrrrghttt into nrght.
That means, you need to match the first pattern both as a separate alternative and as a "filler" between the identical chars.
The pattern you may use is
.pattern = "[aeiou\s]+|([^aeiou\s])(?:[aeiou\s]*\1)+"
TrimString = .Replace(strString, "$1")
Details
[aeiou\s]+ - 1+ vowels or whitespace chars
| - or
([^aeiou\s]) - Capturing group 1: any char other than a vowel or a whitespace char
(?:[aeiou\s]*\1)+ - 1 or more occurrences of:
[aeiou\s]* - 0+ vowel or whitespace chars
\1 - backreference to Group 1, its value
Note that . is changed into [^aeiou\s] since the opposite has already been handled with the first alternation branch.
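For reference, the single-pass pattern above could be dropped into the original function roughly like this. TrimStringOnePass is a hypothetical name, and the sketch relies on $1 expanding to an empty string when the first alternative (vowels and whitespace) is the one that matched.

```vba
' Hypothetical one-pass version of TrimString using the combined pattern.
Function TrimStringOnePass(ByVal strString As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("VBScript.RegExp")
    With objRegex
        .Global = True
        .IgnoreCase = True
        ' Branch 1 removes vowels/whitespace; branch 2 collapses repeated
        ' consonants, even when vowels or whitespace sit between them.
        .Pattern = "[aeiou\s]+|([^aeiou\s])(?:[aeiou\s]*\1)+"
        TrimStringOnePass = .Replace(strString, "$1")
    End With
End Function
```

Using the example from the explanation, TrimStringOnePass("niarararrrrreghtatt") should come out as "nrght".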

Regex - Return Exact Number of Consecutive Digits

I want to return 5 consecutive digits from a string (working in VBA).
Based on another post, I'm using the pattern [^\d]\d{5}[^\d], but this also picks up the single characters immediately before and after the targeted 5 digits and returns h92345W (from "....South92345West").
How can I modify to return only the 5 consecutive digits: 92345
Sub RegexTest()
Dim strInput As String
Dim strPattern As String
strInput = "9129 Nor22 999123456 South92345West"
'strPattern = "^\d{5}$" 'No match
strPattern = "[^\d]\d{5}[^\d]" 'Returns an additional character before and after the digits
'In this case returns: "h92345W"
MsgBox RegxFunc(strInput, strPattern)
End Sub
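Function RegxFunc(strInput As String, regexPattern As String) As String
' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim regEx As New RegExp
Dim matches As MatchCollection
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = regexPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
' Value is the whole match, including the characters around the digits
RegxFunc = matches(0).Value
Else
RegxFunc = "not matched"
End If
End Function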
([^\d])(\d{5})([^\d])
You can use this regex; the five digits will be in the 2nd group (matches(0).SubMatches(1) in VBA, since SubMatches is zero-based).
You need to use a group:
"[^\d](\d{5})[^\d]"
And then the number will be in the first group, which in VBA is matches(0).SubMatches(0).
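A sketch of reading the group from VBA follows. Note that it swaps in a slightly different pattern from the answers: (?:^|\D) and (?!\d) replace the surrounding [^\d] so that a 5-digit run at the very start or end of the string is also found. The name FiveDigits and the "not matched" fallback are assumptions carried over for illustration.

```vba
' Hypothetical helper: returns the first run of exactly five digits.
Function FiveDigits(ByVal strInput As String) As String
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    ' (?:^|\D)  start of string or a non-digit (not captured)
    ' (\d{5})   group 1: the five digits we want
    ' (?!\d)    must not be followed by another digit
    re.Pattern = "(?:^|\D)(\d{5})(?!\d)"

    Set matches = re.Execute(strInput)
    If matches.Count > 0 Then
        FiveDigits = matches(0).SubMatches(0)
    Else
        FiveDigits = "not matched"
    End If
End Function
```

For the test string in the question, FiveDigits("9129 Nor22 999123456 South92345West") should return "92345", skipping the 9-digit run.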

Using Regular expression in VBA

This is my sample record, in comma-delimited text format:
901,BLL,,,BQ,ARCTICA,,,,
I need to replace ,,, with ,,
The regular expression that I tried:
With regex
.MultiLine = False
.Global = True
.IgnoreCase = False
.Pattern="^(?=[A-Z]{3})\\,{3,}",",,"))$ -- error
Now I want to pass each line from the file to the regex to correct the record. Can somebody guide me on how to fix this? I am very new to VBA.
I want to read the file line by line and pass each line to the regex.
Looking at your original pattern, I tried .Pattern = "^\d{3},\D{3},,," which works on the sample record: 3 digits, a comma, 3 non-digit characters (here, letters), then three commas.
In the answer I have used a more general pattern, .Pattern = "^\w*,\w*,\w*,," This also works on the sample and matches three commas, each preceded by 0 or more word characters (letters, digits or underscore), followed directly by a fourth comma. Both patterns require the match to start at the beginning of the string.
The pattern .Pattern = "^\d+,[a-zA-Z]+,\w*,," also works on the sample record. It specifies that before the first comma there must be 1 or more digits (and only digits), and before the second comma there must be 1 or more letters (and only letters). Before the third comma there can be 0 or more word characters.
The Left function drops the rightmost character of the match, i.e. the last comma, to build the replacement string passed to Regex.Replace.
Sub Test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
Debug.Print
Debug.Print str
str = strConv(str)
Debug.Print str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object
Dim oMatches As Object
Dim oMatch As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = "^\w*,\w*,\w*,,"
End With
Set oMatches = objRegEx.Execute(str)
If oMatches.Count > 0 Then
For Each oMatch In oMatches
str = objRegEx.Replace(str, Left(oMatch.Value, oMatch.Length - 1))
Next oMatch
End If
strConv = str
End Function
Try this
Sub test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
str = strConv(str)
MsgBox str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object, allMatches As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = ",,,"
End With
strConv = objRegEx.Replace(str, ",,")
End Function
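For the "read the file line by line" part of the question, one possible sketch is shown below. FixRecord, FixFile, inPath and outPath are all hypothetical names, and FixRecord uses a capture group ("$1") in place of the Left() call from the first answer, which is a swapped-in variation rather than either answerer's code.

```vba
' Hypothetical helper: drops one comma after the third field when it is empty,
' e.g. "901,BLL,,,BQ" becomes "901,BLL,,BQ".
Function FixRecord(ByVal rec As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^(\w*,\w*,\w*,),"
    FixRecord = re.Replace(rec, "$1")
End Function

' Hypothetical driver: reads inPath line by line and writes corrected lines to outPath.
Sub FixFile(ByVal inPath As String, ByVal outPath As String)
    Dim fIn As Integer, fOut As Integer
    Dim rec As String

    fIn = FreeFile
    Open inPath For Input As #fIn
    fOut = FreeFile              ' taken after the first Open, so the numbers differ
    Open outPath For Output As #fOut

    Do While Not EOF(fIn)
        Line Input #fIn, rec
        Print #fOut, FixRecord(rec)
    Loop

    Close #fOut
    Close #fIn
End Sub
```

Writing to a separate output file rather than editing in place keeps the original data intact if a pattern turns out to be too broad.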

get ASCII value of a Regex backreference in VBA

I have the following snippet in VBA
Dim RegEx As Object
Dim myResult As String
Set RegEx = CreateObject("vbscript.regexp")
With RegEx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "([^a-z|A-Z|0-9|\s])"
End With
myResult = "Hello, World!"
I want to replace each regex match with its ascii value -- in this case, replace anything that's not a letter or number with its ascii value, so the resulting string should be
"Hello44 World33"
I basically want something like this to use the Asc() function on a backreference:
myResult = RegEx.Replace(myResult, Asc("$1"))
except that's not valid. I've tried escaping in various ways but I think I am barking up the wrong tree.
Thanks!
Don't know if you can do it in one go with Replace(), but you can use Execute() and loop through the matches. Note your original pattern also matched |, which I don't think you wanted.
Sub Tester()
Dim RegEx As Object, matches As Object, match As Object
Dim myResult As String
Set RegEx = CreateObject("vbscript.regexp")
With RegEx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "([^a-z0-9\s])"
End With
myResult = "Hello, World!"
Set matches = RegEx.Execute(myResult)
For Each match In matches
Debug.Print "<" & match.Value & "> = " & Asc(match.Value)
myResult = Replace(myResult, match.Value, Asc(match.Value))
Next match
Debug.Print myResult
End Sub
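A variation on the loop above, offered only as a sketch: instead of calling Replace once per match, it walks the matches in order and rebuilds the string from the pieces between them, using each match's FirstIndex and Length. The function name ReplaceWithAsc is hypothetical.

```vba
' Hypothetical helper: replaces every non-alphanumeric, non-whitespace
' character with its Asc() code in a single left-to-right pass.
Function ReplaceWithAsc(ByVal s As String) As String
    Dim re As Object, m As Object
    Dim pos As Long          ' one-based position of the next unread character
    Dim out As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "[^a-z0-9\s]"

    pos = 1
    For Each m In re.Execute(s)
        ' FirstIndex is zero-based, Mid$ is one-based
        out = out & Mid$(s, pos, m.FirstIndex + 1 - pos) & CStr(Asc(m.Value))
        pos = m.FirstIndex + m.Length + 1
    Next m

    ' Append whatever follows the last match
    ReplaceWithAsc = out & Mid$(s, pos)
End Function
```

ReplaceWithAsc("Hello, World!") should produce "Hello44 World33", the result asked for in the question.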
In .NET (VB.NET rather than VBA), one of the overloads of Regex.Replace takes an evaluator instead of a string for the replacement value. Take a look at this:
Replace Method (String, MatchEvaluator)
Let me know if you need further help.
Edit: Added the actual code.
Imports System
Imports System.Text.RegularExpressions
Module RegExSample
' Called once per match; returns the character code of the matched text
Function AscText(ByVal m As Match) As String
Return Asc(m.Value).ToString()
End Function
Sub Main()
Dim myResult As String = "Hello, World!"
' .NET Regex class, used here in place of the VBScript.RegExp COM object
Dim re As New Regex("[^a-z0-9\s]", RegexOptions.IgnoreCase)
myResult = re.Replace(myResult, AddressOf AscText)
Console.WriteLine(myResult)
End Sub
End Module