Conditional Regular Expression in VBA - regex

I am parsing multiple HTML files using RegEx in Excel VBA (I know, not the best thing to do), and I have a case where the markup can be either of the following. Scenario 1:
<span class="big vc vc_2 "><strong><i class="icon icon-angle-circled-down text-danger"></i>£51,038</strong> <span class="small">(-2.12%)</span></span>
or could be - Scenario 2:
<span class="big vc vc_2 "><strong><i class="icon icon-angle-circled-up text-success"></i>£292,539</strong> <span class="small">(14.13%)</span></span>
If the class ends in danger, I want to return -51038 and -2.12%
If the class ends in success, I want to return +292539 and 14.13%
The code I have been using for the second scenario, which works fine, is:
Sub Test()
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "<i class=""icon icon-angle-circled-up text-success""></i>([\s\S]*?)<"
sValue = HtmlSpecialCharsDecode(.Execute(sContent).Item(0).SubMatches(0))
End With
sValue = CleanString(sValue)
End sub
Function HtmlSpecialCharsDecode(sText)
With CreateObject("htmlfile")
.Open
With .createElement("textarea")
.innerHTML = sText
HtmlSpecialCharsDecode = .Value
End With
End With
End Function
Function CleanString(strIn As String) As String
Dim objRegex
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[^\d]+"
CleanString = .Replace(strIn, vbNullString)
End With
End Function

All you need to do is add more capturing groups that use alternation ("or" conditions). In your case you want the group (success|danger), and also (up|down) based on the examples. Then, instead of reading only the single submatch, check which alternative was captured by your pattern:
Dim regex As Object
Dim matches As Object
Dim expr As String
expr = "<i class=""icon icon-angle-circled-(up|down) text-(success|danger)""></i>(.*?)</.*\((.*)%\)<.*"
Set regex = CreateObject("VBScript.RegExp")
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = expr
Set matches = .Execute(sContent)
End With
Dim isDanger As Boolean
If matches.Count > 0 Then
isDanger = (HtmlSpecialCharsDecode(matches.Item(0).SubMatches(1)) = "danger")
sValue1 = HtmlSpecialCharsDecode(matches.Item(0).SubMatches(2))
sValue2 = HtmlSpecialCharsDecode(matches.Item(0).SubMatches(3))
If isDanger Then
'Was "danger": negate the amount. The captured percentage already carries its own minus sign.
Debug.Print -CLng(CleanString(sValue1))
Else
'Was "success"
Debug.Print CLng(CleanString(sValue1))
End If
Debug.Print CDbl(sValue2)
Else
Debug.Print "No match"
End If
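If you need this in more than one place, the same idea can be wrapped in a function. This is only a rough sketch: ParseChange and DigitsOnly are names I made up, the pattern is a slightly tighter variant of the one above, and it assumes the currency symbol arrives as a literal character (as in your samples) rather than as a numeric HTML entity.

```vba
' Sketch only: ParseChange and DigitsOnly are hypothetical names, not from the question.
' Returns True and fills amount/pct when html contains one of the two scenarios.
Function ParseChange(ByVal html As String, ByRef amount As Long, ByRef pct As Double) As Boolean
    Dim m As Object
    With CreateObject("VBScript.RegExp")
        .Global = False
        .IgnoreCase = False
        ' Group 1: success|danger, group 2: amount text, group 3: percentage with optional minus
        .Pattern = "text-(success|danger)""></i>([^<]*)</strong>[\s\S]*?\((-?[\d.]+)%\)"
        If Not .Test(html) Then Exit Function   ' no match: function returns False
        Set m = .Execute(html).Item(0)
    End With
    amount = CLng(DigitsOnly(m.SubMatches(1)))
    If m.SubMatches(0) = "danger" Then amount = -amount
    pct = Val(m.SubMatches(2))                   ' Val always expects "." as the decimal separator
    ParseChange = True
End Function

' Strips everything except digits, e.g. the currency symbol and thousands separator.
Private Function DigitsOnly(ByVal s As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "\D+"
        DigitsOnly = .Replace(s, vbNullString)
    End With
End Function

Sub DemoParseChange()
    Dim amount As Long, pct As Double, html As String
    html = "<i class=""icon icon-angle-circled-down text-danger""></i>" & ChrW(163) & _
           "51,038</strong> <span class=""small"">(-2.12%)</span>"
    If ParseChange(html, amount, pct) Then Debug.Print amount, pct
End Sub
```

Only the amount is negated for "danger"; the percentage keeps whatever sign the page already shows.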

Related

excel VB regexp 5.5 capturing group

I have a problem using RegExp in an Excel macro: when I call regex.Execute(string), instead of getting an array of the captured groups, I always get a single result, which is the whole string matched by the pattern.
With the same pattern at http://www.regexr.com/, I can see the result nicely grouped. What am I missing here?
Private Sub ParseFileName(strInput As String)
Dim regEx As New RegExp
Dim strPattern As String
Dim strReplace
'Sample string \\Work_DIR\FTP\Results\RevA\FTP_01_01_06_Results\4F\ACC2X2R33371_SASSSD_run1
strPattern = "FTP_(\w+)_Results\\(\w+)\\([\d,\D]+)_(SAS|SATA)(HDD|SSD)_run(\d)"
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set strReplace = regEx.Execute(strInput)
ActiveCell.Offset(0, 1) = strReplace.Count
Else
ActiveCell.Offset(0, 1) = "(Not matched)"
End If
End sub
In the end, strReplace.Count always shows 1, which is the whole string FTP_01_01_06_Results\4F\ACC2X8R133371_SASSSD_run1
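Execute returns one Match per occurrence of the whole pattern, so Count is 1 here. The groups live in each Match's SubMatches collection. Use .SubMatches to get the capturing group values: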
Private Sub ParseFileName(strInput As String)
Dim regEx As New RegExp
Dim strPattern As String
Dim strReplace As MatchCollection
Dim i As Long
'Sample string \\Work_DIR\FTP\Results\RevA\FTP_01_01_06_Results\4F\ACC2X2R33371_SASSSD_run1
strPattern = "FTP_(\w+)_Results\\(\w+)\\([\d,\D]+)_(SAS|SATA)(HDD|SSD)_run(\d)"
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set strReplace = regEx.Execute(strInput)
ActiveCell.Offset(0, 1) = strReplace.Count
For i = 0 To 5
ActiveCell.Offset(i + 1, 1) = strReplace(0).SubMatches(i)
Next
Else
ActiveCell.Offset(0, 1) = "(Not matched)"
End If
End Sub
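To make the difference between matches and groups concrete, here is a small late-bound sketch (no reference to the RegExp library needed). GroupsOfFirstMatch is a name I made up for illustration, and the shortened pattern and input are only a cut-down version of the ones in the question.

```vba
' Sketch only: GroupsOfFirstMatch is a hypothetical helper.
' Returns the capture groups of the first match as a 1-based Collection.
Function GroupsOfFirstMatch(ByVal pattern As String, ByVal source As String) As Collection
    Dim re As Object, matches As Object, i As Long
    Dim groups As New Collection
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.Pattern = pattern
    Set matches = re.Execute(source)
    ' matches.Count counts whole-pattern matches, not capture groups
    If matches.Count > 0 Then
        ' SubMatches is 0-based and holds one entry per capture group
        For i = 0 To matches(0).SubMatches.Count - 1
            groups.Add matches(0).SubMatches(i)
        Next i
    End If
    Set GroupsOfFirstMatch = groups
End Function

Sub DemoGroups()
    Dim g As Variant
    For Each g In GroupsOfFirstMatch("(\w+)_(SAS|SATA)(HDD|SSD)_run(\d)", "ACC2X2R33371_SASSSD_run1")
        Debug.Print g
    Next g
End Sub
```

Looping to SubMatches.Count - 1 instead of a hard-coded 5 keeps the code working if the number of groups in the pattern changes.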

Using Regular expression in VBA

This is my sample record, in comma-delimited text format:
901,BLL,,,BQ,ARCTICA,,,,
I need to replace ,,, with ,,
The regular expression that I tried:
With regex
.MultiLine = False
.Global = True
.IgnoreCase = False
.Pattern="^(?=[A-Z]{3})\\,{3,}",",,"))$ -- error
Now I want to pass each line from the file to the regex to correct the record. Can somebody guide me on how to fix this? I am very new to VBA.
I want to read the file line by line and pass each line to the regex.
Looking at your original pattern, I tried .Pattern = "^\d{3},\D{3},,," which works on the sample record: it matches 3 digits, a comma, 3 non-digit characters, then three commas.
In the answer I have used a more generalised pattern, .Pattern = "^\w*,\w*,\w*,," This also works on the sample and matches three commas, each preceded by 0 or more word characters (letters, digits, or underscore), followed directly by a fourth comma. Both patterns require the match to start at the beginning of the string.
The pattern .Pattern = "^\d+,[a-zA-Z]+,\w*,," also works on the sample record. It requires 1 or more digits (and only digits) before the first comma and 1 or more letters (and only letters) before the second comma. Before the third comma there can be 0 or more word characters.
The Left function removes the rightmost character of the match, i.e. the last comma, to generate the replacement string passed to Regex.Replace.
Sub Test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
Debug.Print
Debug.Print str
str = strConv(str)
Debug.Print str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object
Dim oMatches As Object
Dim oMatch As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = "^\w*,\w*,\w*,,"
End With
Set oMatches = objRegEx.Execute(str)
If oMatches.Count > 0 Then
For Each oMatch In oMatches
str = objRegEx.Replace(str, Left(oMatch.Value, oMatch.Length - 1))
Next oMatch
End If
strConv = str
End Function
Try this
Sub test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
str = strConv(str)
MsgBox str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object, allMatches As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = ",,,"
End With
strConv = objRegEx.Replace(str, ",,")
End Function
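Since the question also asks about reading the file line by line, here is a rough sketch of that part. FixLine, FixFile and both file paths are made up for illustration. It uses the anchored pattern from the first answer with a capture group, so the replacement is simply "$1". Note that Line Input expects CR or CRLF line endings.

```vba
' Sketch only: FixLine, FixFile and both file paths are hypothetical.
' Drops the fourth comma when a line starts with field,field,field,,
Function FixLine(ByVal textLine As String) As String
    Static re As Object                      ' created once, reused for every line
    If re Is Nothing Then
        Set re = CreateObject("VBScript.RegExp")
        re.Global = False
        re.Pattern = "^(\w*,\w*,\w*,),"
    End If
    FixLine = re.Replace(textLine, "$1")     ' lines that do not match come back unchanged
End Function

Sub FixFile()
    Const IN_PATH As String = "C:\temp\records.txt"          ' assumed input file
    Const OUT_PATH As String = "C:\temp\records_fixed.txt"   ' assumed output file
    Dim inNum As Integer, outNum As Integer, textLine As String
    inNum = FreeFile
    Open IN_PATH For Input As #inNum
    outNum = FreeFile
    Open OUT_PATH For Output As #outNum
    Do While Not EOF(inNum)
        Line Input #inNum, textLine
        Print #outNum, FixLine(textLine)
    Loop
    Close #outNum
    Close #inNum
End Sub
```

Writing to a second file rather than overwriting the input keeps the original records intact while you check the result.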

vbscript: replace text in activedocument with hyperlink

Starting out at a new job and I have to go through a whole lot of documents that my predecessor left. They are MS Word-files that contain information on several hundreds of patents. Instead of copy/pasting every single patent-number in an online form, I would like to replace all patent-numbers with a clickable hyperlink. I guess this should be done with vbscript (I'm not used to working with MS Office).
I have so far:
<obsolete>
This is not working for me:
1. I (probably) need to add something to loop through the ActiveDocument
2. The replace-function probably needs a string and not an object for a parameter - is there a __toString() in vbscript?
THX!
UPDATE:
I have this partially working (regex and finding matches) - now if only I could get the anchor for the hyperlink.add-method right...
Sub HyperlinkPatentNumbers()
'
' HyperlinkPatentNumbers Macro
'
Dim objRegExp, Matches, match, myRange
Set myRange = ActiveDocument.Content
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US)([0-9]*)(A1|A2|B1|B2)"
End With
Set Matches = objRegExp.Execute(myRange)
If Matches.Count >= 1 Then
For Each match In Matches
ActiveDocument.Hyperlinks.Add Anchor:=objRegExp.match, Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3"
Next
End If
Set Matches = Nothing
Set objRegExp = Nothing
End Sub
Is this VBA or VBScript? In VBScript you cannot declare types like Dim newText As hyperLink, but every variable is a variant, so: Dim newText and nothing more.
objRegEx.Replace returns the string with replacements and needs two parameters passed into it: The original string and the text you want to replace the pattern with:
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.IgnoreCase = False
objRegEx.Pattern = "^(WO|EP|US)([0-9]*)(A1|A2|B1|B2)$"
' assuming plainText contains the text you want to create the hyperlink for
strName = objRegEx.Replace(plainText, "$1$2$3")
strAddress = objRegEx.Replace(plainText, "http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3")
Now you can use strName and strAddress to create the hyperlink with.
Pro-tip: You can use objRegEx.Test(plainText) to see if the regexp matches anything for early handling of errors.
Problem solved:
Sub addHyperlinkToNumbers()
Dim objRegExp As Object
Dim matchRange As Range
Dim Matches
Dim match
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US|FR|DE|GB|NL)([0-9]+)(A1|A2|A3|A4|B1|B2|B3|B4)"
End With
Set Matches = objRegExp.Execute(ActiveDocument.Content)
For Each match In Matches
'This doesn't work, because of the WYSIWYG-model of MS Word:
'Set matchRange = ActiveDocument.Range(match.FirstIndex, match.FirstIndex + Len(match.Value))
Set matchRange = ActiveDocument.Content
With matchRange.Find
.Text = match.Value
.MatchWholeWord = True
.MatchCase = True
.Wrap = wdFindStop
.Execute
End With
ActiveDocument.Hyperlinks.Add Anchor:=matchRange, _
Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=" _
& match.Submatches(0) & "&NR=" & match.Submatches(1) & "&KC=" & match.Submatches(2)
Next
MsgBox "Hyperlink added to " & Matches.Count & " patent numbers"
Set objRegExp = Nothing
Set matchRange = Nothing
Set Matches = Nothing
Set match = Nothing
End Sub

Excel 2010 VBA "Invalid Procedure Call or Argument" error in regex function

I'm working with the following RegEx function in Excel 2010 and am getting the "Invalid Procedure Call or Argument" error on the last line of the function. I substituted the ActiveCell.Value for the constant (commented out). The constant did work properly, although the cell value does not.
What is causing this error to occur?
I appreciate any help in this. Thanks.
Sub SUB1()
Dim c As Variant
For Each c In Worksheets("Sheet1").Range("A1:D10").Cells
'MsgBox (c)
If RE6(c.Value) Then
c.Interior.ColorIndex = 7
Else
c.Interior.ColorIndex = 6
End If
Next
End Sub
Sub Test()
'Const strTest As String = "qwerty123456uiops"
Dim strTest As String
strTest = ActiveCell.Value
MsgBox RE6(strTest)
End Sub
Function RE6(strData As String) As String
Dim RE As Object
Dim REMatches As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = False
.IgnoreCase = True
.Pattern = "[0-9][0-9][0-9][0-9][0-9][0-9]"
End With
Set REMatches = RE.Execute(strData)
MsgBox ("REMatches.Count" & REMatches.Count)
'If Not REMatches Is Nothing Then
If REMatches.Count <= 0 Then
RE6 = ""
Else
RE6 = REMatches(0)
End If
'Else
'End If
End Function
Most likely there is no match: if you test the .Count property of REMatches is it zero?
Your function should test for that and return a suitable value (empty string maybe) instead.
EDIT: if you only want to check for the presence or absence of a pattern, then using .Test() is easier than using .Execute(). I changed your function to return a Boolean, which is more natural in this type of case.
Sub CheckCellValues()
Dim c As Range
For Each c In Worksheets("Sheet1").Range("A1:D10").Cells
If RE6(c.Value) Then
c.Interior.ColorIndex = 7
Else
c.Interior.ColorIndex = 6
End If
Next
End Sub
Function RE6(strData As String) As Boolean
Dim RE As Object
Dim REMatches As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = False
.IgnoreCase = True
.Pattern = "[0-9][0-9][0-9][0-9][0-9][0-9]"
End With
RE6 = RE.Test(strData) 'much simpler...
'or...
'Set REMatches = RE.Execute(strData)
'RE6 = (REMatches.Count > 0)
End Function
Your code appears to test whether a run of six consecutive digits occurs in each cell of Sheet1 A1:D10, i.e. you are looking for a Boolean True/False result. So:
1. Use a simpler pattern: Re.Pattern = "[0-9]{6}"
2. Use the RegExp Test method. You don't need a collection of matches, only to know whether one exists (with Re.Global = False the search stops at the first match anyway).
3. Return a Boolean result from your function.
Function RE6(strData As String) As Boolean
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = False
.IgnoreCase = True
.Pattern = "[0-9]{6}"
RE6 = .Test(strData)
End With
End Function

vbscript regex get img src URL

What I am trying to do, is get the IMG SRC URL from stringXML below. (i.e. http://www.webserver.com/picture.jpg)
This is what I have, but it is only giving me true/false:
<%
stringXML = "<img src=""http://www.webserver.com/picture.jpg""/><br>Some text here, blah blah blah."
Dim objRegex
Set objRegex = New Regexp
With objRegEx
.IgnoreCase = True
.Global = True
.Multiline = True
End with
strRegexPattern = "\<img\s[^\>]*?src=[""'][^\>]*?(jpg|bmp|gif)[""']"
objRegEx.Pattern = strRegexPattern
response.write objRegEx.Test(stringXML)
If objRegEx.Test(stringXML) = True Then
'The string has a tags.
'Match all A Tags
Set objRegExMatch = objRegEx.Execute(stringXML)
If objRegExMatch.Count > 0 Then
Redim arrAnchor(objRegExMatch.Count - 1)
For Each objRegExMatchItem In objRegExMatch
response.write objRegExMatchItem.Value
Next
End If
End If
%>
I basically want to ONLY get the IMG SRC value..
Any ideas why this line isn't working 'response.write objRegExMatchItem.Value'?
Cheers,
Drew
Try:
Function getImgTagURL(HTMLstring)
Set RegEx = New RegExp
With RegEx
.Pattern = "src=[\""\']([^\""\']+)"
.IgnoreCase = True
.Global = True
End With
Set Matches = RegEx.Execute(HTMLstring)
'We only want the first match. SubMatches(0) holds just the URL captured
'between the quotes, so it works for both single- and double-quoted attributes.
URL = ""
If Matches.Count > 0 Then URL = Matches(0).SubMatches(0)
'Clean up
Set Matches = Nothing
Set RegEx = Nothing
getImgTagURL = URL
End Function