I am very new to RegEx and I can't seem to find what I looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way that I can do this with one pattern so that I can just loop through the matches, that would be great. If not, thats fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure the name of the one that I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
Here is what I ended up using as it made it easier to get the individual values returned. I set a reference to the Microsoft VBScript Regular Expression 5.5 so that I could get Intellisense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
Set colMatches = regex.Execute(strInput)
With colMatches
strProcedure = .Item(0).submatches.Item(0)
strModule = .Item(1).submatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub
Related
I'm new to VBA and would like to seek some help with regards to using RegEx and I hope somehow can enlighten me on what I'm doing wrong. I'm currently trying to split a date into its individual date, month and year, and possible delimiters include "," , "-" and "/".
Function formattedDate(inputDate As String) As String
Dim dateString As String
Dim dateStringArray() As String
Dim day As Integer
Dim month As String
Dim year As Integer
Dim assembledDate As String
Dim monthNum As Integer
Dim tempArray() As String
Dim pattern As String()
Dim RegEx As Object
dateString = inputDate
Set RegEx = CreateObject("VBScript.RegExp")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
' .... code continues
This is what I am currently doing. However, there seems to be something wrong during the RegEx.Split function, as it seems to cause my codes to hang and not process further.
To just confirm, I did something simple:
MsgBox("Hi")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
MsgBox("Bye")
"Hi" msgbox pops out, but the "Bye" msgbox never gets popped out, and the codes further down don't seem to get excuted at all, which led to my suspicion that the RegEx.Split is causing it to be stuck.
Can I check if I'm actually using RegEx.Split the right way? According to MSDN here, Split(String, String) returns an array of strings as well.
Thank you!
Edit: I'm trying not to explore the CDate() function as I am trying not to depend on the locale settings of the user's computer.
To split a string with a regular expression in VBA:
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = True
End If
re.IgnoreCase = IgnoreCase
re.Pattern = Pattern
SplitRe = Strings.Split(re.Replace(text, ChrW(-1)), ChrW(-1))
End Function
Usage example:
Dim v
v = SplitRe("a,b/c;d", "[,;/]")
Splitting by a regex is definitely nontrivial to implement compared to other regex operations, so I don't blame you for being stumped!
If you wanted to implement it yourself, it helps to know that RegExp objects from Microsoft VBScript Regular Expressions 5.5 have a FirstIndex property and a Length property, such that you can loop through the matches and pick out all the substrings between the end of one match (or the start of the string) and the start of the next match (or the end of the string).
If you don't want to implement it yourself, I've also implemented a RegexSplit UDF using those same RegExp objects on my GitHub.
Quoting an example from the documentation of VbScript Regexp:
https://msdn.microsoft.com/en-us/library/y27d2s18%28v=vs.84%29.aspx
Function SubMatchTest(inpStr)
Dim retStr
Dim oRe, oMatch, oMatches
Set oRe = New RegExp
' Look for an e-mail address (not a perfect RegExp)
oRe.Pattern = "(\w+)#(\w+)\.(\w+)"
' Get the Matches collection
Set oMatches = oRe.Execute(inpStr)
' Get the first item in the Matches collection
Set oMatch = oMatches(0)
' Create the results string.
' The Match object is the entire match - dragon#xyzzy.com
retStr = "Email address is: " & oMatch & vbNewLine
' Get the sub-matched parts of the address.
retStr = retStr & "Email alias is: " & oMatch.SubMatches(0) ' dragon
retStr = retStr & vbNewLine
retStr = retStr & "Organization is: " & oMatch.SubMatches(1) ' xyzzy
SubMatchTest = retStr
End Function
To test, call:
MsgBox(SubMatchTest("Please send mail to dragon#xyzzy.com. Thanks!"))
In short, you need your Pattern to match the various parts you want to extract, with the spearators in between, maybe something like:
"(\d+)[/-,](\d+)[/-,](\d+)"
The whole thing will be in oMatch, while the numbers (\d) will end up in oMatch.SubMatches(0) to oMatch.SubMatches(2).
When using this line in Excel VBA:
Cells(a, 20).Value = regexProjet.Execute(Cells(a, 1).Value)(0)
I get a warning saying that either the parameter or the command is invalid.
I have use this line in many places in my code and it worked fine, the format of the cells are all standard (they're strings...).
Anyone could give me some hints of what to look for?
FYI this is how I declared the regex:
Dim regexProjet As Object
Set regexProjet = CreateObject("VBScript.RegExp")
regexProjet.IgnoreCase = True
regexProjet.Pattern = "^ ([a-z]+)(-)([0-9]+)" 'conserve seulement la clé du projet
You will get that response if your regex does not match your data. To avoid it, using the technique you are using, first do a test to see if your regex matches your string.
e.g:
If regexProjet.test(Cells(a,1).value) then
Cells(a, 20).Value = regexProjet.Execute(Cells(a, 1).Value)(0)
Else
... your error routine
End If
Also, you should note that if you are just trying to match the overall pattern, there is no need for the capturing groups (and they will add execution time, making the regex less efficient).
Here is sample code I got to run in Excel 2013 but it's slightly different than what you have.
I got the code from [How to regular expressions]: http://support.microsoft.com/kb/818802
Sub testreg()
Dim regexProjet As New RegExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim RetStr As String
RetStr = ""
regexProjet.IgnoreCase = True
regexProjet.Pattern = "^ ([a-z]+)(-)([0-9]+)"
Set colMatches = regexProjet.Execute(Cells(1, 1).Value)
For Each objMatch In colMatches ' Iterate Matches collection.
RetStr = RetStr & " " & objMatch.Value
Next
Cells(1, 20).Value = RetStr
End Sub
I have a column of lists of codes like the following.
2.A.B, 1.C.D, A.21.C.D, 1.C.D.11.C.D
6.A.A.5.F.A, 2.B.C.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.B.C.x, 1.N.N.9.J.K
I want to find all instances of two single upper-case letters separated by a period, but only those that follow a number less than 6. I want to remove the period between the letters and convert the second letter to lower case. Desired output:
2.Ab, 1.Cd, A.21.C.D, 1.Cd.11.C.D
6.A.A.5.Fa, 2.Bc.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.Bc.x, 1.Nn.9.J.K
I have the following code in VBA.
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
c.Value = re.Replace("$1$2")
Next c
End Sub
This removes the period, but doesn't handle the lower-case requirement. I know in other flavors of regular expressions, I can use something like
re.Replace("$1\L$2\E")
but this does not have the desired effect in VBA. I tried googling for this functionality, but I wasn't able to find anything. Is there a way to do this with a simple re.Replace() statement in VBA?
If not, how would I go about achieving this otherwise? The pattern matching is complex enough that I don't even want to think about doing this without regular expressions.
[I have a solution I worked up, posted below, but I'm hoping someone can come up with something simpler.]
Here is a workaround that uses the properties of each individual regex match to make the VBA Replace() function replace only the text from the match and nothing else.
Sub fixBlah2()
Dim re As VBScript_RegExp_55.RegExp, Matches As VBScript_RegExp_55.MatchCollection
Dim M As VBScript_RegExp_55.Match
Dim tmpChr As String, pre As String, i As Integer
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
'Count of number of replacements made. This is used to adjust M.FirstIndex
' so that it still matches correct substring even after substitutions.
i = 0
Set Matches = re.Execute(c.Value)
For Each M In Matches
tmpChr = LCase(M.SubMatches.Item(1))
If M.FirstIndex > 0 Then
pre = Left(c.Value, M.FirstIndex - i)
Else
pre = ""
End If
c.Value = pre & Replace(c.Value, M.Value, M.SubMatches.Item(0) & tmpChr, _
M.FirstIndex + 1 - i, 1)
i = i + 1
Next M
Next c
End Sub
For reasons I don't quite understand, if you specify a start index in Replace(), the output starts at that index as well, so the pre variable is used to capture the first part of the string that gets clipped off by the Replace function.
So this question is old, but I do have another workaround. I use a double regex so to speak, where the first engine looks for the match as an execute, then I loop through each of those items and replace with a lowercase version. For example:
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
dim ToReplace as Object
Set re = New VBScript_RegExp_55.RegExp
for each c in Selection.Cells
with re `enter code here`
.Global = True
.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
Set ToReplace = .execute(C.Value)
end with
'This generates a list of items that match. Now to lowercase them and replace
Dim LcaseVersion as string
Dim ItemCt as integer
for itemct = 0 to ToReplace.count - 1
LcaseVersion = lcase(ToReplace.item(itemct))
with re `enter code here`
.Global = True
.Pattern = ToReplace.item(itemct) 'This looks for that specific item and replaces it with the lowercase version
c.value = .replace(C.Value, LCaseVersion)
end with
End Sub
I hope this helps!
I am doing this task as part of a larger sub in order to massively reduce the workload for a different team.
I am trying to read in a string and use Regular Expressions to replace one-to-many spaces with a single space (or another character). At the moment I am using a local string, however in the main sub this data will come from an external .txt file. The number of spaces between elements in this .txt can vary depeneding on the row.
I am using the below code, and replacing the spaces with a dash. I have tried different variations and different logic on the below code, but always get "Run-time error '91': Object Variable or with clock variable not set" on line "c = re.Replace(s, replacement)"
After using breakpoints, I have found out that my RegularExpression (re) is empty, but I can't quite figure out how to progress from here. How do I replace my spaces with dashes? I have been at this problem for hours and spent most of that time on Google to see if someone has had a similar issue.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
Extra information: Using Excel 2010. Have successfully linked all my references (Microsoft VBScript Regular Expressions 5.5". I was sucessfully able to replace the spaces using the vanilla "Replace" function, however as the number of spaces between elements vary I am unable to use that to solve my issue.
Ed: My .txt file is not fixed either, there are a number of rows that are different lengths so I am unable to use the MID function in excel to dissect the string either
Please help
Thanks,
J.H.
You're not setting up the RegExp object correctly.
Dim pattern As String
pattern = "\s+" ' pattern is just a local string, not bound to the RegExp object!
You need to do this:
Dim re As RegExp
Set re = New RegExp
re.Pattern = "\s+" ' Now the pattern is bound to the RegExp object
re.Global = True ' Assuming you want to replace *all* matches
s = "hello World"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Try setting the pattern inside your Regex object. Right now, re is just a regex with no real pattern assigned to it. Try adding in re.Pattern = pattern after you initialize your pattern string.
You initialized the pattern but didn't actually hook it into the Regex. When you ended up calling replace it didn't know what it was looking for pattern wise, and threw the error.
Try also setting the re as a New RegExp.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
Set re = New RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
re.Pattern = pattern
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
With VBA, I'm trying to use regex to capture the filename from a UNC path without the extension--looking at .TIF files only.
So far this is what I have:
Function findTIFname(filestr As String) As String
Dim re As RegExp
Dim output As String
Dim matches As MatchCollection
Set re = New RegExp
re.pattern = "[^\\]+(?:[.]tif)$"
Set matches = re.Execute(filestr)
If matches.Count > 0 Then
output = matches(0).Value
Else
output = ""
End If
findTIFname = output
End Function
But when I run the function as follows:
msgbox findTIFname("\\abc\def\ghi\jkl\41e07.tif")
I get the following output:
41e07.tif
I thought that "(?:xxx)" was the regex syntax for a non-capturing group; what am I doing wrong?
The syntax (?:...) is a non-capturing group. What you need here is a positive lookahead assertion which has a (?=...) syntax like so:
re.pattern = "[^\\]+(?=[.]tif$)"
Note that lookaround assertions have zero width and consume no characters.
Do you really need to do this with RegEx?
Access (or better, MS Office) has built-in ways to do this quite easily without RegEx.
You just need to reference the Microsoft Scripting Runtime (which should be included in every MS Office installation, as far as I know).
Then you can use the FileSystemObject:
Public Function findTIFname(filestr As String) As String
Dim fso As FileSystemObject
Set fso = New FileSystemObject
If fso.GetExtensionName(filestr) = "tif" Then
findTIFname = fso.GetBaseName(filestr)
End If
Set fso = Nothing
End Function
Given your example UNC path \\abc\def\ghi\jkl\41e07.tif, this will return 41e07.