I have an XSLT that I want to test for processing XML data through MSXML2 in VBA.
The usual use of this XSLT is to process XML data from MicroStation or OpenRoads Designer which supplies these parameters according to settings in the report browser.
In my case, as I'm trying to run some of the report XML through MSXML2, I find that this error comes up:
A reference to variable or parameter 'xslStationPrecision' cannot be resolved. The variable or parameter may not be defined, or it may not be in scope.
This is true even as I'm repurposing code from Microsoft's own documentation here: https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms762312(v=vs.85)
To be specific, here's my code block that duplicates the sample code:
Sub test_xsl_from_ms_learn()
' from https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms762312(v=vs.85)
Dim xslt As New MSXML2.XSLTemplate60
Dim xsldoc As New MSXML2.FreeThreadedDOMDocument60
Dim xslproc
Dim PathToXslFile As String
PathToXslFile = "my_own_custom.xsl"
xsldoc.async = False
xsldoc.Load (PathToXslFile)
If xsldoc.parseError.ErrorCode <> 0 Then
Debug.Print xsldoc.parseError.reason
Else
On Error Resume Next
Set xslt.stylesheet = xsldoc
Debug.Print Err.Number, Err.Description
'this is where I get the error
'if not for this error, I would advance through to where I would use addParameter
On Error GoTo 0
End If
End Sub
How can I use addParameter before this error?
I found the solution. First, through testing I found that a reference within the XSLT file covers the gap with the parameters, so I needed to resolve external references.
But after resolving that, I got a new error "Security settings do not allow the execution of script code within this stylesheet." Through a bit of searching, I found the answer from Microsoft here: https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms763800(v=vs.85)?redirectedfrom=MSDN
So the following lines are needed before Set xslt.stylesheet:
xsldoc.resolveExternals = True
'this wasn't mentioned in the Learn code, but it is essential to resolve this error:
'A reference to variable or parameter 'xslStationPrecision' cannot be resolved. The variable or parameter may not be defined, or it may not be in scope.
xsldoc.SetProperty "AllowXsltScript", True
'to avoid this error for having <msxsl:script>:
'Security settings do not allow the execution of script code within this stylesheet.
In the end, this code block loads properly:
Sub test_xsl_from_ms_learn()
' from https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms762312(v=vs.85)
Dim xslt As New MSXML2.XSLTemplate60
Dim xsldoc As New MSXML2.FreeThreadedDOMDocument60
Dim xslproc
Dim PathToXslFile As String
PathToXslFile = "my_own_custom.xsl"
xsldoc.async = False
xsldoc.Load (PathToXslFile)
If xsldoc.parseError.ErrorCode <> 0 Then
Debug.Print xsldoc.parseError.reason
Else
xsldoc.resolveExternals = True
xsldoc.SetProperty "AllowXsltScript", True
On Error Resume Next
Set xslt.stylesheet = xsldoc
Debug.Print Err.Number, Err.Description
On Error GoTo 0
End If
End Sub
I used to be quite proficient in VBA with excel, but I'm currently trying to do something with Google Scripts and I am well and truly stuck.
Basically, I am trying like to extract data out of a standardised email from Gmail into a Google sheet. There are a couple of other threads on the subject which I have consulted so far, and I can get the body of the email into the sheet but cannot parse it.
I am new to regex, but it tests OK on regex101
I am also brand new to Google Script, and even the debugger seems to have stopped working now (it did before, so would be grateful if anyone can suggest why this is).
Here is my basic function:
function processInboxToSheet() {
var label = GmailApp.getUserLabelByName("NEWNOPS");
var threads = label.getThreads();
// Set destination sheet
var sheet = SpreadsheetApp.getActiveSheet();
// Get all emails labelled NEWNOPS
for (var i = 0; i < threads.length; i++) {
var tmp,
message = threads[i].getMessages()[1], // second message in thread
content = message.getPlainBody(); // remove html markup
if (content) {
// search email for 'of:' and capure next line of text as address
// tests OK at regex101.com
property = content.match(/of:[\n]([^\r\n]*)[\r\n]/);
// if no match, display error
var property = (tmp && tmp[1]) ? tmp[1].trim() : 'No property';
sheet.appendRow([property]);
} // End if
// remove label to avoid duplication
threads[i].removeLabel(label)
} // End for loop
}
I can append 'content' to the sheet Ok, but cannot extract the address text required by the regex. Content displays as follows:
NOPS for the purchase of:
123 Any Street, Anytown, AN1 1AN
DATE: 05/05/2017
PRICE: £241,000
Seller’s Details
NAME: Mrs Seller
Thanks for reading :)
The return value of .match() is an array. The first captured group, containing the address, will be at index 1.
Based on the following line after your call to .match(), it looks like the tmp variable should have been assigned that array, not the property variable.
var property = (tmp && tmp[1]) ? tmp[1].trim() : 'No property';
That line says, if .match() returned something that isn't null and has a value at index 1, then trim that value and assign to property, otherwise assign it the string 'No property'.
So, try changing this line:
property = content.match(/of:[\n]([^\r\n]*)[\r\n]/);
To this:
tmp = content.match(/of:[\n]([^\r\n]*)[\r\n]/);
Thanks Kevin, I think I must have changed it while debugging.
The problem was with my regexp in the end. After a bit of trial and error the following worked:
tmp = content.match(/of:[\r\n]+([^\r\n]+)/);
I'm trying to scrape the inside ownership from Morningstar at this url:
http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US
This is the code I'm using:
Sub test()
Dim appIE As Object
Set appIE = CreateObject("InternetExplorer.Application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
Debug.Print allRowOfData
Dim myValue As String: myValue = allRowOfData.Cells(0).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub
I get run-time error 13 at line
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
but I can't see any mismatch. What is going on?
You can just do it with XHR and RegEx instead of cumbersome IE:
Sub Test()
Dim sContent
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US", False
.Send
sContent = .ResponseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = ",""currInsiderVal"":(.*?),"
Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
End With
End Sub
Here is the description how the code works:
First of all MSXML2.XMLHTTP ActiveX instance is created. GET request opened with target URL in synchronous mode (execution interrupts until response received).
Then VBScript.RegExp is created. By default .IgnoreCase, .Global and .MultiLine properties are False. The pattern is ,"currInsiderVal":(.*?),, where (.*?) is a capturing group, . means any character, .* - zero or more characters, .*? - as few as possible characters (lazy matching). Other characters in pattern to be found as is. .Execute method returns a collection of matches, there is only one match object in it since .Global is False. This match object has a collection of submatches, there is only one submatch in it since the pattern contains the only capturing group.There are some helpful MSDN articles on regex:
Microsoft Beefs Up VBScript with Regular Expressions
Introduction to Regular Expressions
Here is the description how I created the code:
First I found an element containing the target value on the webpage DOM using browser:
The corresponding node is:
<td align="right" id="currrentInsiderVal">143.51</td>
Then I made XHR and found this node in the response HTML, but it didn't contain the value (you can find response in the browser developer tools on network tab after you refresh the page):
<td align="right" id="currrentInsiderVal">
</td>
Such behavior is typical for DHTML. Dynamic HTML content is generated by scripts after the webpage loaded, either after retrieving a data from web via XHR or just processing already loaded withing webpage data. Then I just searched for the value 143.51 in the response, the snippet ,"currInsiderVal":143.51, located within JS function:
fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};
So the regex pattern created based on that it should find the snippet ,"currInsiderVal":<some text>, where <some text> is our target value.
Had a look on the site and the element you are trying to retrieve has a typo in it; instead of currentInsiderVal try using currrentInsiderVal and you should retrieve the data correctly.
Probably worth considering some error trapping to catch stuff like this for any other fields you retrieve?
After your comment I took a closer look. Your issue seemed like it was trying to trap the id of the individual cell rather than navigating down the object tree. I've modified the code to retrieve the row of the table you are after and then set myValue to be the correct cell within that row. Seemed to be working when I tried it out. Give this a shot?
Sub test()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getelementbyID("tableTest").getElementsByTagName("tbody")(0).getElementsByTagName("tr")(5)
myValue = allRowOfData.Cells(2).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub
I have a text for example:
" Visit www.flexstaff.com for details
Email rachel#flexstaff.com apply online."
I would like to delete only the a tags that contain "mailto" so
rachel#flexstaff.com will become
rachel#flexstaff.com
I have this regex:
Dim rgxMailTo = New Regex("<a\b\s[^<>]*(?<=#.*)>|(?<=#.*)</a>",RegexOptions.IgnoreCase)
Dim ret As String = rgxMailTo.Replace(text, Environment.NewLine)
But it selects other a tags as well.
Use the below regex and then replace the match with $1.
<a\b\s*[^<>]*\bmailto\b[^<>]*>([^<>]*)<\/a>
DEMO
To select only the tags.
<a\b\s*[^<>]*\bmailto\b[^<>]*>|(?<=<a\b\s*[^<>]*\bmailto\b[^<>]*>[^<>]*)<\/a>
If your text is of uncertain source (so it was not all generated in 100% predictable way), using regex is a very bad idea - trust me, I've been there.
One option is to use Html Agility Pack, and load the HTML as an XElement (C#, as I have sample on hand):
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(HTML);
htmlDoc.OptionOutputAsXml = true;
using (var stream = new MemoryStream())
{
htmlDoc.Save(stream);
stream.Position = 0;
var xelement = XElement.Load(stream);
DoStuffToXElement(xelement);
}
Note, that in case you have just a fragment without a root element:
Link
<img src="#"/>
Remember to wrap it in something neutral, like htmlDoc.LoadHtml("<div>"+HTML+"</div>");
Now you can use LinqToXml to find whatever you need, traverse the tree or do anything quite safely:
xHtml
.Descendants()
.Where(e=>e.Name.LocalName.Equals("a", StringComparison.OrdinalIgnoreCase)
&& e.Attribute("href") != null
&& e.Attribute("href").Value.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase))
.Remove();
Final note: this is nearly always much slower than regex - if time is important (for example you do it at every page load or sth) it might be too slow, but I guess this kind of processing can be done beforehand?
You can use the power of LINQ to XML like this:
Imports System.Text.RegularExpressions
Imports System.Xml.Linq
Imports System.Xml
Imports System.Xml.XPath
Public Class Form1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim str As String = "Visit www.flexstaff.com for details\nEmail rachel#flexstaff.com apply online."
Dim xDoc As XDocument = XDocument.Parse("<?xml version= '1.0'?><root>" + str + "</root>")
Dim query = xDoc.XPathSelectElements("//a[contains(#href,'mailto')]")
For Each element In query
element.Remove()
Next element
Dim Res As String = xDoc.ToString().Replace("<root>", String.Empty).Replace("</root>", String.Empty)
End Sub
End Class
Outoput (Res):
Visit www.flexstaff.com for details\nEmail apply online.
I need a macro in Outlook that extract all the email address in the outlook message then post it in excel.
The following code only extracts the very 1st email address it finds in the body.
My desired output should be:
adam.peters#sample.com
adam.dryburgh#sample.com
amy.norton#sample.com
My sample email is:
Delivery has failed to these recipients or groups:
adam.peters#sample.com The e-mail address you entered couldn't be
found. Please check the recipient's e-mail address and try to resend
the message. If the problem continues, please contact your helpdesk.
adam.dryburgh#sample.com The e-mail address you entered couldn't be
found. Please check the recipient's e-mail address and try to resend
the message. If the problem continues, please contact your helpdesk.
amy.norton#sample.com The e-mail address you entered couldn't be
found. Please check the recipient's e-mail address and try to resend
the message. If the problem continues, please contact your helpdesk.
The following organization rejected your message:
mx2.dlapiper.iphmx.com.
code:
Sub Extract_Invalid_To_Excel()
Dim olApp As Outlook.Application
Dim olExp As Outlook.Explorer
Dim olFolder As Outlook.MAPIFolder
Dim obj As Object
Dim stremBody As String
Dim stremSubject As String
Dim i As Long
Dim x As Long
Dim count As Long
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
Dim xlApp As Object 'Excel.Application
Dim xlwkbk As Object 'Excel.Workbook
Dim xlwksht As Object 'Excel.Worksheet
Dim xlRng As Object 'Excel.Range
Set olApp = Outlook.Application
Set olExp = olApp.ActiveExplorer
Set olFolder = olExp.CurrentFolder
'Open Excel
Set xlApp = GetExcelApp
xlApp.Visible = True
If xlApp Is Nothing Then GoTo ExitProc
Set xlwkbk = xlApp.workbooks.Add
Set xlwksht = xlwkbk.Sheets(1)
Set xlRng = xlwksht.Range("A1")
xlRng.Value = "Bounced email addresses"
'Set count of email objects
count = olFolder.Items.count
'counter for excel sheet
i = 0
'counter for emails
x = 1
For Each obj In olFolder.Items
xlApp.StatusBar = x & " of " & count & " emails completed"
stremBody = obj.Body
stremSubject = obj.Subject
'Check for keywords in email before extracting address
If checkEmail(stremBody) = True Then
'MsgBox ("finding email: " & stremBody)
RegEx.Pattern = "\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b"
RegEx.IgnoreCase = True
RegEx.MultiLine = True
Set olMatches = RegEx.Execute(stremBody)
For Each match In olMatches
xlwksht.cells(i + 2, 1).Value = match
i = i + 1
Next match
'TODO move or mark the email that had the address extracted
Else
'To view the items that aren't being parsed uncomment the following line
'MsgBox (stremBody)
End If
x = x + 1
Next obj
xlApp.ScreenUpdating = True
MsgBox ("Invalid Email addresses are done being extracted")
ExitProc:
Set xlRng = Nothing
Set xlwksht = Nothing
Set xlwkbk = Nothing
Set xlApp = Nothing
Set emItm = Nothing
Set olFolder = Nothing
Set olNS = Nothing
Set olApp = Nothing
End Sub
Function GetExcelApp() As Object
' always create new instance
On Error Resume Next
Set GetExcelApp = CreateObject("Excel.Application")
On Error GoTo 0
End Function
untested
replace
RegEx.Pattern = "\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b"
RegEx.IgnoreCase = True
RegEx.MultiLine = True
with
RegEx.Pattern = "\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b"
RegEx.IgnoreCase = True
RegEx.MultiLine = True
RegEx.Global = True
I have noticed the following line of code:
Set olApp = Outlook.Application
If you run the code in Outlook, you need to use the Application property to get an instance of the Application class. Or you need to use the New operator to create a new instance, for example:
Set ol = New Outlook.Application
or
Set objOL = CreateObject("Outlook.Application")
See How to automate Outlook from another program for more information.
You may also consider using the Word object model for working with item bodies. The WordEditor property of the Inspector class returns an instance of the Document class which represents the message body. See Chapter 17: Working with Item Bodies for more information.