vbscript regex get img src URL - regex

What I am trying to do, is get the IMG SRC URL from stringXML below. (i.e. http://www.webserver.com/picture.jpg)
This is what I have, but it is only giving me true/false:
<%
stringXML="<img src="http://www.webserver.com/picture.jpg"/><br>Some text here, blah blah blah."
Dim objRegex
Set objRegex = New Regexp
With objRegEx
.IgnoreCase = True
.Global = True
.Multiline = True
End with
strRegexPattern = "\<img\s[^\>]*?src=[""'][^\>]*?(jpg|bmp|gif)[""']"
objRegEx.Pattern = strRegexPattern
response.write objRegEx.Test(stringXML)
If objRegEx.Test(stringXML) = True Then
'The string has a tags.
'Match all A Tags
Set objRegExMatch = objRegEx.Execute(stringXML)
If objRegExMatch.Count > 0 Then
Redim arrAnchor(objRegExMatch.Count - 1)
For Each objRegExMatchItem In objRegExMatch
response.write objRegExMatchItem.Value
Next
End If
End If
%>
I basically want to ONLY get the IMG SRC value..
Any ideas why this line isn't working 'response.write objRegExMatchItem.Value'?
Cheers,
Drew

Try:
Function getImgTagURL(HTMLstring)
Set RegEx = New RegExp
With RegEx
.Pattern = "src=[\""\']([^\""\']+)"
.IgnoreCase = True
.Global = True
End With
Set Matches = RegEx.Execute(HTMLstring)
'Iterate through the Matches collection.
URL = ""
For Each Match in Matches
'We only want the first match.
URL = Match.Value
Exit For
Next
'Clean up
Set Match = Nothing
Set RegEx = Nothing
' src=" is hanging on the front, so we will replace it with nothing
getImgTagURL = Replace(URL, "src=""", "")
End Function

Related

Unable to scoop out specific portions from a webpage using regex

The following script written in vba can parse the names out of some json content from a webpage using xhr. I know there is a vba json converter out there as well to parse information from json content. If I could know the method to apply regex in such cases, I could have created the pattern to do the trick.
Current attempt (working one):
Sub GetNames()
Dim str As Variant, N&, R&, rxp As New RegExp
With New XMLHTTP60
.Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=", False
.send
str = Split(.responseText, ":[{""Id"":")
End With
N = UBound(str)
For R = 1 To N
Cells(R, 1) = Split(Split(str(R), "FullName"":""")(1), """")(0)
Next R
End Sub
Ain't it possible to parse names from the above link using regex?
Yes. You can use a lazy regex as follows
Option Explicit
Public Sub GetFullNames()
Dim results(), matches As Object, s As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=", False
.send
s = .responsetext
End With
With CreateObject("VBScript.RegExp")
.Global = True
.IgnoreCase = False
.Pattern = "FullName"":""(.*?)"""
.MultiLine = True
Set matches = .Execute(s)
ReDim results(1 To matches.Count)
End With
Dim match As Variant, r As Long
For Each match In matches
r = r + 1
results(r) = match.submatches(0)
Next
With ThisWorkbook.Worksheets("Sheet1")
.Cells(1, 1).Resize(UBound(results), 1) = Application.Transpose(results)
End With
End Sub
Lazy quantifier:
The lazy .*? guarantees that the quantified dot only matches as many
characters as needed for the rest of the pattern to succeed.
Therefore, the pattern only matches one {START}…{END} item at a time,
which is what we want.
No array:
Option Explicit
Public Sub GetFullNames()
Dim matches As Object, s As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://oresapp.asicanada.net/ores.imis.services/api/member/?address=&callback=angular.callbacks._0&city=&companyName=&personName=", False
.send
s = .responsetext
End With
With CreateObject("VBScript.RegExp")
.Global = True
.IgnoreCase = False
.Pattern = "FullName"":""(.*?)"""
.MultiLine = True
Set matches = .Execute(s)
End With
Dim match As Variant, r As Long
For Each match In matches
r = r + 1
With ThisWorkbook.Worksheets("Sheet1")
.Cells(r, 1) = match.submatches(0)
End With
Next
End Sub

Regex not Triggering VBA

This is my regex:
Dim vbRegX As Object, vbRegXMatch As Object
Set vbRegX = CreateObject("vbscript.regexp")
With vbRegX
.Global = True
.IgnoreCase = True
.Pattern = "^[a-zA-Z0-9_-]{1,20}$"
End With
code that uses it:
Set vbRegXMatch = vbRegX.Execute(Me.txtProduct.Text)
If vbRegXMatch.Count = 1 Then
MsgBox "This string has invalid characters in it. Illegal characters are out side of the following ranges:" & vbNewLine & vbNewling & "a-z or A-Z" & vbNewLine & vbNewling & "0-9, - or _. Please try again."
Cancel = True
Me.txtProduct.SetFocus
Set vbRegXMatch = Nothing
Set vbRegX = Nothing
Exit Sub
End If
This code fires with invalid characters but not when length is > 20. This is the output given to me by Regex Buddy:
Dim FoundMatch As Boolean
Dim myRegExp As RegExp
Set myRegExp = New RegExp
myRegExp.Pattern = "^[a-zA-Z0-9_-]{1,20}$"
FoundMatch = myRegExp.Test(SubjectString)
Can anyone so kindly point out what Im missing?
visual of the control:
Your regex matches valid input. Thus, you need to .Test(your_string) and if the result is False, you need to fire an error.
Replace
Set vbRegXMatch = vbRegX.Execute(Me.txtProduct.Text)
If vbRegXMatch.Count = 1 Then
with
If vbRegX.Test("1234555") = False Then
Also, since you expect a single match, use
.Global = False

Conditional Regular Expression in VBA

I am parsing multiple HTML files using RegEx in Excel VBA (i know not the best thing to do) but I have this case which can either be - Scenario 1:
<span class="big vc vc_2 "><strong><i class="icon icon-angle-circled-down text-danger"></i>£51,038</strong> <span class="small">(-2.12%)</span></span>
or could be - Scenario 2:
<span class="big vc vc_2 "><strong><i class="icon icon-angle-circled-up text-success"></i>£292,539</strong> <span class="small">(14.13%)</span></span>
If the class ends in danger, I want to return -51038 and -2.12%
If the class ends in success, I want to return +292539 and 14.13%
The code I have been using for the second scenario and works fine is:
Sub Test()
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "<i class=""icon icon-angle-circled-up text-success""></i>([\s\S]*?)<"
sValue = HtmlSpecialCharsDecode(.Execute(sContent).Item(0).SubMatches(0))
End With
sValue = CleanString(sValue)
End sub
Function HtmlSpecialCharsDecode(sText)
With CreateObject("htmlfile")
.Open
With .createElement("textarea")
.innerHTML = sText
HtmlSpecialCharsDecode = .Value
End With
End With
End Function
Function CleanString(strIn As String) As String
Dim objRegex
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[^\d]+"
CleanString = .Replace(strIn, vbNullString)
End With
End Function
All you need to do is add some more capturing groups with "or" conditions in them. In your case, you want the group (success|danger) (also (up|down) based on the examples). Then, instead of just checking the only submatch, check for the conditions that you put in your pattern:
Dim regex As Object
Dim matches As Object
Dim expr As String
expr = "<i class=""icon icon-angle-circled-(up|down) text-(success|danger)""></i>(.*?)</.*\((.*)%\)<.*"
Set regex = CreateObject("VBScript.RegExp")
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = expr
Set matches = .Execute(sContent)
End With
Dim isDanger As Boolean
If matches.Count > 0 Then
isDanger = (HtmlSpecialCharsDecode(matches.item(0).SubMatches(1)) = "danger")
sValue1 = HtmlSpecialCharsDecode(matches.item(0).SubMatches(2))
sValue2 = HtmlSpecialCharsDecode(matches.item(0).SubMatches(3))
End If
If isDanger Then
'Was "danger"
Debug.Print -CLng(CleanString(sValue1))
Debug.Print -CDbl(sValue2)
Else
'Was "success"
Debug.Print CLng(CleanString(sValue1))
Debug.Print CDbl(sValue2)
End If

Regular expression matching only the first match

I'm trying to use regular expressions to find repeating patterns in a string. I tested my RegExp in a tester and I think it's no problem with my Regex, but my code return only the first match(0.0000000000000000) and not the other matches:
Here is my code:
Dim searchstr As String
Dim regexp As Object
Dim colregmatch As MatchCollection
searchstr = "ST/X 0.0000000000000000 6.4000000000000004 12.8000000000000010 19.1999999999999990 25.6000000000000010 32.0000000000000000"
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = "([0-9]+\.[0-9]+)\s*"
.IgnoreCase = True
.Global = True
.MultiLine = True
.Global = False
End With
Set colregmatch = regexp.Execute(searchstr)
If colregmatch.Count <> 0 Then
For Each Match In colregmatch
MsgBox Match
Next Match
End If
Would you please help me to solve this problem?
Thanks a lot
You have set Global flag to first true, then to false.
.Global = True
.MultiLine = True
.Global = False
Try to remove the last one.

vbscript: replace text in activedocument with hyperlink

Starting out at a new job and I have to go through a whole lot of documents that my predecessor left. They are MS Word-files that contain information on several hundreds of patents. Instead of copy/pasting every single patent-number in an online form, I would like to replace all patent-numbers with a clickable hyperlink. I guess this should be done with vbscript (I'm not used to working with MS Office).
I have so far:
<obsolete>
This is not working for me:
1. I (probably) need to add something to loop through the ActiveDocument
2. The replace-function probably needs a string and not an object for a parameter - is there a __toString() in vbscript?
THX!
UPDATE:
I have this partially working (regex and finding matches) - now if only I could get the anchor for the hyperlink.add-method right...
Sub HyperlinkPatentNumbers()
'
' HyperlinkPatentNumbers Macro
'
Dim objRegExp, Matches, match, myRange
Set myRange = ActiveDocument.Content
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US)([0-9]*)(A1|A2|B1|B2)"
End With
Set Matches = objRegExp.Execute(myRange)
If Matches.Count >= 1 Then
For Each match In Matches
ActiveDocument.Hyperlinks.Add Anchor:=objRegExp.match, Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3"
Next
End If
Set Matches = Nothing
Set objRegExp = Nothing
End Sub
Is this VBA or VBScript? In VBScript you cannot declare types like Dim newText As hyperLink, but every variable is a variant, so: Dim newText and nothing more.
objRegEx.Replace returns the string with replacements and needs two parameters passed into it: The original string and the text you want to replace the pattern with:
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.IgnoreCase = False
objRegEx.Pattern = "^(WO|EP|US)([0-9]*)(A1|A2|B1|B2)$"
' assuming plainText contains the text you want to create the hyperlink for
strName = objRegEx.Replace(plainText, "$1$2$3")
strAddress = objRegex.Replace(plainText, "http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3"
Now you can use strName and strAddress to create the hyperlink with.
Pro-tip: You can use objRegEx.Test(plainText) to see if the regexp matches anything for early handling of errors.
Problem solved:
Sub addHyperlinkToNumbers()
Dim objRegExp As Object
Dim matchRange As Range
Dim Matches
Dim match
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US|FR|DE|GB|NL)([0-9]+)(A1|A2|A3|A4|B1|B2|B3|B4)"
End With
Set Matches = objRegExp.Execute(ActiveDocument.Content)
For Each match In Matches
'This doesn't work, because of the WYSIWYG-model of MS Word:
'Set matchRange = ActiveDocument.Range(match.FirstIndex, match.FirstIndex + Len(match.Value))
Set matchRange = ActiveDocument.Content
With matchRange.Find
.Text = match.Value
.MatchWholeWord = True
.MatchCase = True
.Wrap = wdFindStop
.Execute
End With
ActiveDocument.Hyperlinks.Add Anchor:=matchRange, _
Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=" _
& match.Submatches(0) & "&NR=" & match.Submatches(1) & "&KC=" & match.Submatches(2)
Next
MsgBox "Hyperlink added to " & Matches.Count & " patent numbers"
Set objRegExp = Nothing
Set matchRange = Nothing
Set Matches = Nothing
Set match = Nothing
End Sub