How to use a non-capturing group in VBA regex?

With VBA, I'm trying to use regex to capture the filename from a UNC path without the extension--looking at .TIF files only.
So far this is what I have:
Function findTIFname(filestr As String) As String
Dim re As RegExp
Dim output As String
Dim matches As MatchCollection
Set re = New RegExp
re.pattern = "[^\\]+(?:[.]tif)$"
Set matches = re.Execute(filestr)
If matches.Count > 0 Then
output = matches(0).Value
Else
output = ""
End If
findTIFname = output
End Function
But when I run the function as follows:
msgbox findTIFname("\\abc\def\ghi\jkl\41e07.tif")
I get the following output:
41e07.tif
I thought that "(?:xxx)" was the regex syntax for a non-capturing group; what am I doing wrong?

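The syntax (?:...) is indeed a non-capturing group, but a non-capturing group still consumes the text it matches, so .tif remains part of the overall match; the group only avoids creating a separate submatch. What you need here is a positive lookahead assertion, which has the syntax (?=...):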
re.pattern = "[^\\]+(?=[.]tif$)"
Note that lookaround assertions have zero width and consume no characters.
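As a minimal sketch of the difference, the two functions below return the base name once with the lookahead and once with an ordinary capturing group read through SubMatches. The function names, the late binding through CreateObject, and the IgnoreCase setting are illustrative assumptions, not part of the original answer.

```vba
Option Explicit

' Hypothetical helper: returns the file name without ".tif", or "" if no match.
Public Function BaseNameOfTif(ByVal filePath As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True                 ' assumption: also accept ".TIF"
    re.Pattern = "[^\\]+(?=[.]tif$)"     ' lookahead: ".tif" is checked but not consumed

    Set matches = re.Execute(filePath)
    If matches.Count > 0 Then BaseNameOfTif = matches(0).Value
End Function

' Same result with a capturing group: the whole match still includes ".tif",
' but SubMatches(0) holds only the part inside the parentheses.
Public Function BaseNameOfTifViaGroup(ByVal filePath As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Pattern = "([^\\]+)[.]tif$"

    Set matches = re.Execute(filePath)
    If matches.Count > 0 Then BaseNameOfTifViaGroup = matches(0).SubMatches(0)
End Function
```

Either version should give 41e07 for the example path; which one to prefer is mostly a matter of taste.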

Do you really need to do this with RegEx?
Access (or better, MS Office) has built-in ways to do this quite easily without RegEx.
You just need to reference the Microsoft Scripting Runtime (which should be included in every MS Office installation, as far as I know).
Then you can use the FileSystemObject:
Public Function findTIFname(filestr As String) As String
Dim fso As FileSystemObject
Set fso = New FileSystemObject
If LCase$(fso.GetExtensionName(filestr)) = "tif" Then ' compare case-insensitively so ".TIF" also matches
findTIFname = fso.GetBaseName(filestr)
End If
Set fso = Nothing
End Function
Given your example UNC path \\abc\def\ghi\jkl\41e07.tif, this will return 41e07.

Related

Extract number and a character from string using regex

I am trying to extract the numbers along with 'x' from strings such as:
1. "KAWAN (FRZ) LACHA FLACKEY PARATHA 8X25X80 GM" or
2. "G.G. HOT SEV 20X285GM"
using the function below, but it returns only the last number with "x". The expected output is 8X25X or 20X. Also, is it possible to store the string without the extracted value using the same function?
Public Function getNumber(strInput As String) As Variant
Dim regex As New RegExp
Dim matches As Object
regex.Pattern = "(\d??[x|X])"
regex.Global = False
Set matches = regex.Execute(strInput)
If matches.Count = 0 Then
getNumber = CVErr(xlErrNA)
Else
getNumber = matches(0).Value
End If
End Function
Try the following pattern for your regular expression...
regex.Pattern = "((\d{1,2}[xX])+)"
By the way, since you're using early binding, you can declare matches as MatchCollection instead of Object.
Dim matches As MatchCollection
Try changing your regular expression to (\d*)[xX]. This matches zero or more digits followed by an X and puts the digits in a group. With Global set to True, applying it to your first example, KAWAN (FRZ) LACHA FLACKEY PARATHA 8X25X80 GM, yields two matches, 8X and 25X, whose first groups are 8 and 25 respectively. You can check this in any online regex tester.
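Neither answer covers the second part of the question, returning the string without the extracted value. A rough sketch of one way to do both in a single call, assuming the repeated-unit pattern from the first answer and using RegExp.Replace for the remainder; the name SplitQuantity and its ByRef output parameter are hypothetical.

```vba
Option Explicit

' Hypothetical helper: returns the "8X25X"-style prefix and passes back the
' input with that prefix removed through the ByRef parameter remainder.
Public Function SplitQuantity(ByVal strInput As String, ByRef remainder As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False                    ' only the first run of "<digits>X" is wanted
    re.Pattern = "(?:\d{1,2}[xX])+"      ' one or more "<1-2 digits>X" units in a row

    remainder = strInput                 ' default when nothing matches
    Set matches = re.Execute(strInput)
    If matches.Count > 0 Then
        SplitQuantity = matches(0).Value
        remainder = re.Replace(strInput, "")   ' Global = False: removes the first match only
    End If
End Function
```

Called as SplitQuantity("G.G. HOT SEV 20X285GM", rest), this should return 20X and leave G.G. HOT SEV 285GM in rest. When nothing matches, the function returns an empty string and rest is the unchanged input.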

Access vba Replace/Regex?

Afternoon,
I'm having trouble with some data imports from PowerPoint into Access.
Initially when I import the data the notes section comes in as the below for each row:
<div class="ExternalClass63DBAC931E7D4E4680E207BF938770AA"><p>xxxxxxxxxxx.</p> <p>xxxxxxxxxxxx</p></div>
The xxxxxxx is where the data I want to pull out is.
I have tried Regex in the form of replacing everything between the <> as seen below
Public Function AddPipesBeforeDates(ByVal strText As String) As String
Dim regex As Object
Dim matches As Object
Dim m As Object
Set regex = CreateObject("VBScript.RegExp")
regex.Global = True
regex.pattern = "<.*>"
Set matches = regex.Execute(strText)
For Each m In matches
strText = Replace(strText, m, "")
Next
AddPipesBeforeDates = strText
Set matches = Nothing
Set regex = Nothing
End Function
The problem is that it wipes out everything.
I just found out about Regex and I'm not familiar with it.
Is there a way to delete the unwanted data?
Note the xxxxxx data can be any value spaces or special characters
Any thoughts or ideas on how to do this would be appreciated. I may be going at this the wrong way.
Thanks
Note that . matches any character except a newline (thus, including < and >), and that * is greedy, so <.*> matches everything from the first < to the last > on the line.
To remove all substrings between < and >, you may use
regex.pattern = "<[^<]+>"
This way, you will avoid "overfiring" and matching more than you need.
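With a pattern that matches one tag at a time, the loop over the matches is no longer needed, because RegExp.Replace can remove all of them in one call. A minimal sketch, assuming the closely related pattern <[^>]+> (which stops at the first closing bracket) rather than the exact pattern above; the function name StripTags is hypothetical.

```vba
Option Explicit

' Hypothetical helper: removes anything that looks like <...> from the text.
Public Function StripTags(ByVal strText As String) As String
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True            ' replace every tag, not just the first
    re.Pattern = "<[^>]+>"      ' "<", then one or more non-">" characters, then ">"

    StripTags = re.Replace(strText, "")
End Function
```

This is only a pattern-based cleanup, not an HTML parser: entities such as &nbsp; are left in place, and a literal < or > inside the data could confuse it.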

Find specific instance of a match in string using RegEx

I am very new to RegEx and I can't seem to find what I'm looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way that I can do this with one pattern so that I can just loop through the matches, that would be great. If not, that's fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure the name of the one that I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
Here is what I ended up using as it made it easier to get the individual values returned. I set a reference to Microsoft VBScript Regular Expressions 5.5 so that I could get IntelliSense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
Set colMatches = regex.Execute(strInput)
With colMatches
strProcedure = .Item(0).submatches.Item(0)
strModule = .Item(1).submatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub

How to test for specific characters with regex in VBA

I need to test for a string variable to ensure it matches a specific format:
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
...where x can be any alphanumerical character (a - z, 0 - 9).
I've tried the following, but it doesn't seem to work (test values constantly fail)
If val Like "^([A-Za-z0-9_]{8})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{12})" Then
MsgBox "OK"
Else
MsgBox "FAIL"
End If
fnCheckSubscriptionID "fdda752d-32de-474e-959e-4b5bf7574436"
Any pointers? I don't mind if this can be achieved in vba or with a formula.
The Like operator does not interpret regular-expression syntax, so the pattern needs to be run through a RegExp object. You are already using the ^ beginning-of-string anchor, which is terrific. You also need the $ end-of-string anchor; otherwise, in the last group, the regex engine can match the first 12 characters of a longer run (e.g. 15 characters).
I rewrote your regex in a more compact way:
^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$
Note these few tweaks:
[-]{1} can just be expressed with -
I removed the underscores as you say you only want letters and digits. If you do want underscores, instead of [A-Z0-9]{8} (for instance), you can just write \w{8} as \w matches letters, digits and underscores.
Removed the lowercase letters. If you do want to allow lowercase letters, we'll turn on case-insensitive mode in the code (see line 3 of the sample code below).
No need for (capturing groups), so removed the parentheses
We have three groups of four letters and a dash, so wrote (?:[A-Z0-9]{4}-) with a {3}
Sample code
Dim myRegExp, FoundMatch
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$"
FoundMatch = myRegExp.Test(SubjectString)
You can do this either with a regular expression, or with just native VBA. I am assuming from your code that the underscore character is also valid in the string.
To do this with native VBA, you need to build up the LIKE string since quantifiers are not included. Also using Option Compare Text makes the "like" action case insensitive.
Option Explicit
Option Compare Text
Function TestFormat(S As String) As Boolean
'Sections
Dim S1 As String, S2_4 As String, S5 As String
Dim sLike As String
With WorksheetFunction
S1 = .Rept("[A-Z0-9_]", 8)
S2_4 = .Rept("[A-Z0-9_]", 4)
S5 = .Rept("[A-Z0-9_]", 12)
sLike = S1 & .Rept("-" & S2_4, 3) & "-" & S5
End With
TestFormat = S Like sLike
End Function
With regular expressions, the pattern is simpler to build, but the execution time may be longer, and that may make a difference if you are processing very large amounts of data.
Function TestFormatRegex(S As String) As Boolean
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False ' ^ and $ must anchor to the whole string when validating
.Pattern = "^\w{8}(?:-\w{4}){3}-\w{12}$"
TestFormatRegex = .test(S)
End With
End Function
Sub Test()
MsgBox fnCheckSubscriptionID("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")
End Sub
Function fnCheckSubscriptionID(strCont)
' Tools - References - add "Microsoft VBScript Regular Expressions 5.5"
With New RegExp
.Pattern = "^\w{8}-\w{4}-\w{4}-\w{4}-\w{12}$"
.MultiLine = False ' ^ and $ must anchor to the whole string when validating
fnCheckSubscriptionID = .Test(strCont)
End With
End Function
In case of any problems with early binding you can use late binding With CreateObject("VBScript.RegExp") instead of With New RegExp.
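One detail worth checking in validators like these: when MultiLine is True, ^ and $ also match at line breaks inside the string, so a value that merely contains a valid ID on one of its lines would pass. A minimal late-bound sketch that keeps MultiLine off, assuming letters and digits only (no underscores); the function name IsSubscriptionId is hypothetical.

```vba
Option Explicit

' Hypothetical helper: True only if the whole string has the 8-4-4-4-12 shape.
Public Function IsSubscriptionId(ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True      ' accept both upper- and lowercase letters
        .MultiLine = False      ' ^ and $ anchor to the whole string only
        .Pattern = "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$"
        IsSubscriptionId = .Test(s)
    End With
End Function
```

Global is left at its default here because Test only needs to know whether at least one match exists.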

VBA Regular Expressions - Run-Time Error 91 when trying to replace characters in string

I am doing this task as part of a larger sub in order to massively reduce the workload for a different team.
I am trying to read in a string and use Regular Expressions to replace one-to-many spaces with a single space (or another character). At the moment I am using a local string, however in the main sub this data will come from an external .txt file. The number of spaces between elements in this .txt can vary depending on the row.
I am using the below code, and replacing the spaces with a dash. I have tried different variations and different logic on the below code, but always get "Run-time error '91': Object variable or With block variable not set" on line "c = re.Replace(s, replacement)"
After using breakpoints, I have found out that my RegularExpression (re) is empty, but I can't quite figure out how to progress from here. How do I replace my spaces with dashes? I have been at this problem for hours and spent most of that time on Google to see if someone has had a similar issue.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
Extra information: Using Excel 2010. Have successfully linked all my references ("Microsoft VBScript Regular Expressions 5.5"). I was successfully able to replace the spaces using the vanilla "Replace" function, however as the number of spaces between elements varies I am unable to use that to solve my issue.
Edit: My .txt file is not fixed either, there are a number of rows that are different lengths so I am unable to use the MID function in Excel to dissect the string either
Please help
Thanks,
J.H.
You're not setting up the RegExp object correctly.
Dim pattern As String
pattern = "\s+" ' pattern is just a local string, not bound to the RegExp object!
You need to do this:
Dim re As RegExp
Set re = New RegExp
re.Pattern = "\s+" ' Now the pattern is bound to the RegExp object
re.Global = True ' Assuming you want to replace *all* matches
s = "hello World"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Try setting the pattern on your RegExp object. Right now, re is declared but never instantiated, so it is still Nothing, and no pattern has been assigned to it either. Add Set re = New RegExp before using it, then add re.Pattern = pattern after you initialize your pattern string.
You initialized the pattern string but didn't actually hook it into the RegExp object. Calling Replace on an object variable that is still Nothing is what raises run-time error 91.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
Set re = New RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
re.Pattern = pattern
re.Global = True ' replace every run of whitespace, not just the first
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
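Since the rows will eventually come from a text file with uneven spacing, the same fix could be wrapped in a small reusable function and combined with Split to get the individual elements. This is only a sketch under assumptions: the names CollapseWhitespace and DemoCollapse, the optional delimiter parameter, and the sample row are all hypothetical.

```vba
Option Explicit

' Hypothetical helper: replaces each run of whitespace with one delimiter.
Public Function CollapseWhitespace(ByVal s As String, _
                                   Optional ByVal delimiter As String = " ") As String
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")   ' the object must exist before use
    re.Global = True                           ' handle every run, not only the first
    re.Pattern = "\s+"

    CollapseWhitespace = re.Replace(s, delimiter)
End Function

Public Sub DemoCollapse()
    Dim fields() As String

    ' Assumed sample row with uneven spacing between elements.
    fields = Split(CollapseWhitespace("alpha   beta  gamma", "-"), "-")
    Debug.Print UBound(fields) + 1   ' number of fields found in the row
End Sub
```

The delimiter is passed to Replace as a replacement string, so it should not contain a $ character, which has a special meaning there.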