Visual Basic Regex but no double quotes - regex

I'm using regex to get some information from a website. I have this code:
Dim request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("http://www.startkabel.nl/zoeken/index.php?zoek=" & TextBox1.Text)
Dim response As System.Net.HttpWebResponse = request.GetResponse
Dim sr As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())
Dim startpagina As String = sr.ReadToEnd
Dim sp As New System.Text.RegularExpressions.Regex("<a href=http://games.startkabel.nl>games.startkabel.nl</a></td>")
Dim matches As MatchCollection = sp.Matches(startpagina)
For Each itemcode As Match In matches
ListBox1.Items.Add(itemcode.Value.Split("""").GetValue(0))
Next
But <a href=http://games.startkabel.nl>games.startkabel.nl</a></td> contains no double quotes, so the ListBox shows the whole match, while I only need this part:
games.startkabel.nl
I already tried changing the pattern to this:
"""games.startkabel.nl""</td>"
but then it doesn't show any results.
Can someone help me with this problem?
(sorry for my bad English)

Are you trying to retrieve the hyperlink URL or the hyperlink name?
itemcode.Value.Split("="c, "<"c, ">"c).GetValue(2)
will return the URL "http://games.startkabel.nl"
itemcode.Value.Split("="c, "<"c, ">"c).GetValue(3)
will return the hyperlink name "games.startkabel.nl"
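The splitting idea above can be checked quickly outside VB. The following is a minimal Python sketch, not the answerer's code: it splits the same matched markup on '=', '<' and '>' and picks out the same two positions. The variable names are illustrative only.

```python
import re

# The matched anchor markup from the question (no double quotes around href).
matched = "<a href=http://games.startkabel.nl>games.startkabel.nl</a></td>"

# Split on '=', '<' and '>', mirroring Split("="c, "<"c, ">"c) in the answer.
parts = re.split(r"[=<>]", matched)

url = parts[2]   # text between '=' and the first '>'
name = parts[3]  # text between the first '>' and the following '<'

print(url)   # http://games.startkabel.nl
print(name)  # games.startkabel.nl
```

Index 0 is an empty string because the markup starts with a delimiter, which is why the useful pieces sit at positions 2 and 3.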

Related

How can I use code to export a SharePoint list to Excel

I found a previous question that looks like what I need. However, when I run the code, I get a debug error that highlights the last statement, the one that starts with "Set objMyList" and ends with Range("A1")). Below is the code I'm using, with my specific path and GUIDs. I tried adjusting the SharePoint address, but the one listed is the one that points to the library. I also tried just the home address (stopping at "TEP") and the full address including "All Items.aspx". I'm sure I'm missing something simple, but I thought I'd ask here.
Dim objMyList As ListObject
Dim objWksheet As Worksheet
Dim strSPServer As String
Const SERVER As String = "https://twdc.sharepoint.com/sites/WDPR-dclrecruiting/Test/TEP/Trip%20Event%20Planning%20Library"
Const LISTNAME As String = "{6B39FDF1-29AE-418C-9D99-92293FED5C81}"
Const VIEWNAME As String = "{CCFD1C7F-74CA-4921-A599-628C800C818A}"
strSPServer = "http://" & SERVER & "/_vti_bin"
Set objWksheet = Worksheets.Add
Set objMyList = objWksheet.ListObjects.Add(xlSrcExternal, _
Array(strSPServer, LISTNAME, VIEWNAME), False, xlYes, Range("A1"))
The code below works in my local environment:
Sub ExportList()
Dim objMyList As ListObject
Dim objWksheet As Worksheet
Dim strSPServer As String
Const SERVER As String = "sp/sites/team"
Const LISTNAME As String = "{3e47ff9c-9aab-4a40-9d6a-c47e9b793484}" 'From source code
Const VIEWNAME As String = "{67709eda-c975-4669-85e5-d95e263dadc6}" 'From source code
' The SharePoint server URL pointing to the SharePoint list to import into Excel.
strSPServer = "http://" & SERVER & "/_vti_bin"
Set objWksheet = Sheets("Sheet1")
' Add a list range to the worksheet
' and populate it with the data from the SharePoint list.
Set objMyList = objWksheet.ListObjects.Add(xlSrcExternal, Array(strSPServer, LISTNAME, VIEWNAME), True, , Range("A1"))
Set objMyList = Nothing
Set objWksheet = Nothing
End Sub
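One difference between the two snippets that may be worth checking is how the server string is assembled. Both build the endpoint as "http://" & SERVER & "/_vti_bin", but the SERVER constant in the question already starts with "https://" and includes the library name. The Python sketch below only mirrors that string concatenation; whether this is the actual cause of the error is an assumption, and build_endpoint is a hypothetical helper name.

```python
def build_endpoint(server: str) -> str:
    # Mirrors the VBA line: strSPServer = "http://" & SERVER & "/_vti_bin"
    return "http://" + server + "/_vti_bin"


# SERVER values copied from the two snippets above.
question_server = ("https://twdc.sharepoint.com/sites/WDPR-dclrecruiting"
                   "/Test/TEP/Trip%20Event%20Planning%20Library")
answer_server = "sp/sites/team"

print(build_endpoint(answer_server))    # http://sp/sites/team/_vti_bin
print(build_endpoint(question_server))  # starts with http://https://
```

If the doubled scheme turns out to matter, one thing to try is keeping only the host and site path in SERVER and putting the scheme in one place.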

Webscraping with VBA morningstar financial

I'm trying to scrape the inside ownership from Morningstar at this url:
http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US
This is the code I'm using:
Sub test()
Dim appIE As Object
Set appIE = CreateObject("InternetExplorer.Application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
Debug.Print allRowOfData
Dim myValue As String: myValue = allRowOfData.Cells(0).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub
I get run-time error 13 at line
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
but I can't see any mismatch. What is going on?
You can just do it with XHR and RegEx instead of cumbersome IE:
Sub Test()
Dim sContent
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US", False
.Send
sContent = .ResponseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = ",""currInsiderVal"":(.*?),"
Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
End With
End Sub
Here is a description of how the code works:
First, an MSXML2.XMLHTTP ActiveX instance is created. A GET request is opened with the target URL in synchronous mode (execution blocks until the response is received).
Then a VBScript.RegExp instance is created. By default, its .IgnoreCase, .Global and .MultiLine properties are False. The pattern is ,"currInsiderVal":(.*?), (including the leading and trailing commas), where (.*?) is a capturing group: . means any character except a line break, .* means zero or more characters, and .*? means as few characters as possible (lazy matching). The other characters in the pattern are matched literally. The .Execute method returns a collection of matches; it contains only one match object because .Global is False. That match object has a collection of submatches, which contains only one submatch because the pattern has a single capturing group. There are some helpful MSDN articles on regex:
Microsoft Beefs Up VBScript with Regular Expressions
Introduction to Regular Expressions
Here is a description of how I arrived at the code:
First, I found the element containing the target value in the page DOM using the browser's developer tools. The corresponding node is:
<td align="right" id="currrentInsiderVal">143.51</td>
Then I made an XHR request and found this node in the response HTML, but it didn't contain the value (you can find the response in the browser developer tools, on the Network tab, after you refresh the page):
<td align="right" id="currrentInsiderVal">
</td>
Such behavior is typical for DHTML. Dynamic HTML content is generated by scripts after the page has loaded, either after retrieving data from the web via XHR or by processing data already loaded within the page. Then I simply searched the response for the value 143.51 and found the snippet ,"currInsiderVal":143.51, inside a JS function:
fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};
So the regex pattern is built to find the snippet ,"currInsiderVal":<some text>, where <some text> is the target value.
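The lazy capturing group described above can be tried on a small piece of the response. This is a minimal Python sketch rather than the VBA from the answer; the content string is a shortened stand-in for the real response, built from three key/value pairs that appear in the fundsArr line above.

```python
import re

# Shortened stand-in for the response text; only the shape matters here.
content = ('fundsArr = {"currInstVal":4370.46,'
           '"currInsiderVal":143.51,"use10YearData":"false"};')

# Same pattern as the VBA answer: a lazy group between the key and the next comma.
match = re.search(r',"currInsiderVal":(.*?),', content)

# group(1) plays the role of SubMatches(0) in VBScript.RegExp.
value = match.group(1) if match else None
print(value)  # 143.51
```

Because the group is lazy, it stops at the first comma after the key instead of running on to the last comma in the line.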
I had a look at the site, and the id of the element you are trying to retrieve contains a typo: instead of currentInsiderVal, try currrentInsiderVal and you should retrieve the data correctly.
It's probably worth adding some error trapping to catch things like this for any other fields you retrieve.
After your comment I took a closer look. The issue seemed to be that the code was trying to grab the individual cell by its id rather than navigating down the object tree. I've modified the code to retrieve the row of the table you are after and then set myValue to the correct cell within that row. It seemed to work when I tried it out. Give this a shot:
Sub test()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getElementById("tableTest").getElementsByTagName("tbody")(0).getElementsByTagName("tr")(5)
myValue = allRowOfData.Cells(2).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub

vb.net Regex remove a tags with mailto

I have a text, for example:
" Visit www.flexstaff.com for details
Email rachel@flexstaff.com apply online."
I would like to delete only the a tags that contain "mailto", so that the link around
rachel@flexstaff.com will become the plain text
rachel@flexstaff.com
I have this regex:
Dim rgxMailTo = New Regex("<a\b\s[^<>]*(?<=@.*)>|(?<=@.*)</a>", RegexOptions.IgnoreCase)
Dim ret As String = rgxMailTo.Replace(text, Environment.NewLine)
But it selects other a tags as well.
Use the regex below and replace each match with $1.
<a\b\s*[^<>]*\bmailto\b[^<>]*>([^<>]*)<\/a>
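The replace-with-$1 approach above can be sketched as follows. This is a Python illustration, not the VB.NET from the thread: the sample markup and its href values are assumptions made up for the example, and \1 is Python's spelling of $1.

```python
import re

# Hypothetical sample markup; the href values are assumptions for illustration.
text = ('Visit <a href="http://www.flexstaff.com">www.flexstaff.com</a> for details '
        'Email <a href="mailto:rachel@flexstaff.com">rachel@flexstaff.com</a> apply online.')

# Same pattern as in the answer: an <a ...> tag whose attributes mention mailto,
# with the link text captured in group 1.
mailto_link = re.compile(r'<a\b\s*[^<>]*\bmailto\b[^<>]*>([^<>]*)</a>', re.IGNORECASE)

# Replace the whole link with just its text.
result = mailto_link.sub(r'\1', text)
print(result)
```

The [^<>]* parts cannot cross a tag boundary, so the ordinary http link is left untouched and only the mailto link is unwrapped.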
To select only the tags:
<a\b\s*[^<>]*\bmailto\b[^<>]*>|(?<=<a\b\s*[^<>]*\bmailto\b[^<>]*>[^<>]*)<\/a>
If your text comes from an uncertain source (that is, it was not all generated in a 100% predictable way), using regex is a very bad idea. Trust me, I've been there.
One option is to use Html Agility Pack and load the HTML as an XElement (in C#, as I have a sample on hand):
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(HTML);
htmlDoc.OptionOutputAsXml = true;
using (var stream = new MemoryStream())
{
htmlDoc.Save(stream);
stream.Position = 0;
var xelement = XElement.Load(stream);
DoStuffToXElement(xelement);
}
Note that you may have just a fragment without a root element:
Link
<img src="#"/>
In that case, remember to wrap it in something neutral, like htmlDoc.LoadHtml("<div>"+HTML+"</div>");
Now you can use LINQ to XML to find whatever you need, traverse the tree, or do anything else quite safely:
xHtml
.Descendants()
.Where(e=>e.Name.LocalName.Equals("a", StringComparison.OrdinalIgnoreCase)
&& e.Attribute("href") != null
&& e.Attribute("href").Value.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase))
.Remove();
Final note: this is nearly always much slower than regex. If time is important (for example, you do it on every page load), it might be too slow, but I guess this kind of processing can be done beforehand?
You can use the power of LINQ to XML like this:
Imports System.Text.RegularExpressions
Imports System.Xml.Linq
Imports System.Xml
Imports System.Xml.XPath
Public Class Form1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim str As String = "Visit www.flexstaff.com for details\nEmail rachel@flexstaff.com apply online."
Dim xDoc As XDocument = XDocument.Parse("<?xml version= '1.0'?><root>" + str + "</root>")
Dim query = xDoc.XPathSelectElements("//a[contains(@href,'mailto')]")
For Each element In query
element.Remove()
Next element
Dim Res As String = xDoc.ToString().Replace("<root>", String.Empty).Replace("</root>", String.Empty)
End Sub
End Class
Output (Res):
Visit www.flexstaff.com for details\nEmail apply online.

Extract multiple email in a single Outlook message to Excel?

I need a macro in Outlook that extracts all the email addresses in an Outlook message and then writes them to Excel.
The following code only extracts the very first email address it finds in the body.
My desired output should be:
adam.peters@sample.com
adam.dryburgh@sample.com
amy.norton@sample.com
My sample email is:
Delivery has failed to these recipients or groups:
adam.peters@sample.com The e-mail address you entered couldn't be
found. Please check the recipient's e-mail address and try to resend
the message. If the problem continues, please contact your helpdesk.
adam.dryburgh@sample.com The e-mail address you entered couldn't be
found. Please check the recipient's e-mail address and try to resend
the message. If the problem continues, please contact your helpdesk.
amy.norton@sample.com The e-mail address you entered couldn't be
found. Please check the recipient's e-mail address and try to resend
the message. If the problem continues, please contact your helpdesk.
The following organization rejected your message:
mx2.dlapiper.iphmx.com.
code:
Sub Extract_Invalid_To_Excel()
Dim olApp As Outlook.Application
Dim olExp As Outlook.Explorer
Dim olFolder As Outlook.MAPIFolder
Dim obj As Object
Dim stremBody As String
Dim stremSubject As String
Dim i As Long
Dim x As Long
Dim count As Long
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
Dim xlApp As Object 'Excel.Application
Dim xlwkbk As Object 'Excel.Workbook
Dim xlwksht As Object 'Excel.Worksheet
Dim xlRng As Object 'Excel.Range
Set olApp = Outlook.Application
Set olExp = olApp.ActiveExplorer
Set olFolder = olExp.CurrentFolder
'Open Excel
Set xlApp = GetExcelApp
If xlApp Is Nothing Then GoTo ExitProc
xlApp.Visible = True
Set xlwkbk = xlApp.workbooks.Add
Set xlwksht = xlwkbk.Sheets(1)
Set xlRng = xlwksht.Range("A1")
xlRng.Value = "Bounced email addresses"
'Set count of email objects
count = olFolder.Items.count
'counter for excel sheet
i = 0
'counter for emails
x = 1
For Each obj In olFolder.Items
xlApp.StatusBar = x & " of " & count & " emails completed"
stremBody = obj.Body
stremSubject = obj.Subject
'Check for keywords in email before extracting address
If checkEmail(stremBody) = True Then
'MsgBox ("finding email: " & stremBody)
RegEx.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
RegEx.IgnoreCase = True
RegEx.MultiLine = True
Set olMatches = RegEx.Execute(stremBody)
For Each match In olMatches
xlwksht.cells(i + 2, 1).Value = match
i = i + 1
Next match
'TODO move or mark the email that had the address extracted
Else
'To view the items that aren't being parsed uncomment the following line
'MsgBox (stremBody)
End If
x = x + 1
Next obj
xlApp.ScreenUpdating = True
MsgBox ("Invalid Email addresses are done being extracted")
ExitProc:
Set xlRng = Nothing
Set xlwksht = Nothing
Set xlwkbk = Nothing
Set xlApp = Nothing
Set emItm = Nothing
Set olFolder = Nothing
Set olNS = Nothing
Set olApp = Nothing
End Sub
Function GetExcelApp() As Object
' always create new instance
On Error Resume Next
Set GetExcelApp = CreateObject("Excel.Application")
On Error GoTo 0
End Function
Untested, but Execute only returns the first match while RegEx.Global is left at its default of False. Replace
RegEx.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
RegEx.IgnoreCase = True
RegEx.MultiLine = True
with
RegEx.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
RegEx.IgnoreCase = True
RegEx.MultiLine = True
RegEx.Global = True
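The effect of the Global flag can be illustrated outside VBA. In the Python sketch below, search stands in for Execute with Global = False (first match only) and findall for Execute with Global = True (every match). The body text is an abbreviated version of the sample bounce message, and the mapping between the two APIs is an analogy rather than an exact equivalence.

```python
import re

# Abbreviated version of the sample bounce message from the question.
body = (
    "adam.peters@sample.com The e-mail address you entered couldn't be found.\n"
    "adam.dryburgh@sample.com The e-mail address you entered couldn't be found.\n"
    "amy.norton@sample.com The e-mail address you entered couldn't be found.\n"
)

# Same pattern as in the macro; IGNORECASE corresponds to RegEx.IgnoreCase = True.
pattern = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

first_only = pattern.search(body).group(0)  # comparable to Global = False
all_matches = pattern.findall(body)         # comparable to Global = True

print(first_only)
print(all_matches)
```

With the global behavior enabled, the existing For Each loop in the macro would receive all three addresses and write one per row.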
I have noticed the following line of code:
Set olApp = Outlook.Application
If you run the code in Outlook, you need to use the Application property to get an instance of the Application class. Or you need to use the New operator to create a new instance, for example:
Set ol = New Outlook.Application
or
Set objOL = CreateObject("Outlook.Application")
See How to automate Outlook from another program for more information.
You may also consider using the Word object model for working with item bodies. The WordEditor property of the Inspector class returns an instance of the Document class which represents the message body. See Chapter 17: Working with Item Bodies for more information.

How to use regular expression in WatiN

I'm working with the WatiN automation tool and I'm having a problem with a regular expression. I have a situation where I have to enter some text and click a button in a popup window. I'm using the AttachToIE method and the URL attribute ("http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd") of the popup to attach to it.
The problem is that each time the popup appears, the ID value in the URL changes, so I'm not able to access the popup. Can anyone please help by giving me a regular expression for the changing ID value in the URL below?
("http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd")
Thank you.
It appears that you have a URL with two query string parameters, Type and ID, and your pattern is:
"http://192.168.25.10:215/admin/SelectUsers.aspx?Type=Feedback&ID={some id}"
You can use the Find.ByUrl() attribute constraint method and pass it to AttachToIE() as shown below with the regex for matching that pattern.
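string url = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=Feedback&ID=";
// Escape the literal part so that '.' and '?' are not treated as regex metacharacters.
Regex regex = new Regex(Regex.Escape(url) + "[a-z0-9-]+", RegexOptions.IgnoreCase);
IE ie = IE.AttachToIE(Find.ByUrl(regex));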
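string baseUrl = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=";
// Regex.Escape keeps '.' and '?' in the base URL literal; [\w-]+ matches the changing ID.
Regex urlIE = new Regex(Regex.Escape(baseUrl) + "[\\w-]+", RegexOptions.IgnoreCase);
IE ie = IE.AttachToIE(Find.ByUrl(urlIE));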
I'm not familiar with WatiN, but it looks like it runs on .NET, so perhaps this might help:
var desiredId = "000000000000-0000-0000-000000000000";
var url = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd&someMoreStuff";
var pattern = @"(?i)(?<=FeedBackId=)[-a-z0-9]+";
var result = Regex.Replace(url, pattern, desiredId);
Console.WriteLine(result);
//Output: http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=000000000000-0000-0000-000000000000&someMoreStuff
The following pattern should have the same effect but is more defensive. It only matches inside the query string, it requires the id to be exactly 35 characters long, and it won't match similar parameter names like "PreviousFeedBackId".
var pattern = @"(?i)(?<=\?.*\bFeedBackId=)[-a-z0-9]{35,35}\b";
If you just want to extract the id:
var id = Regex.Match(url, pattern).Value;
Console.WriteLine(id);
//output: ef5ad7ef5490-4656-9669-32464aeba7cd
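The extract-or-replace idea above can also be sketched in Python. Note one deliberate swap: Python's re module does not accept the variable-length lookbehind used in the .NET pattern, so this sketch captures the id in a group instead. The URL and replacement id come from the answer; extract_id and replace_id are hypothetical helper names.

```python
import re

url = ("http://192.168.25.10:215/admin/SelectUsers.aspx"
       "?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd&someMoreStuff")

# Capture group instead of lookbehind: the id must follow '?' and a whole-word
# FeedBackId= parameter (case-insensitive) and be exactly 35 characters long.
ID_PATTERN = re.compile(r"\?.*\bFeedBackId=([-a-z0-9]{35})\b", re.IGNORECASE)


def extract_id(target):
    """Return the id from the query string, or None if it is not present."""
    match = ID_PATTERN.search(target)
    return match.group(1) if match else None


def replace_id(target, new_id):
    """Swap the id for new_id, leaving the rest of the URL untouched."""
    match = ID_PATTERN.search(target)
    if match is None:
        return target
    return target[:match.start(1)] + new_id + target[match.end(1):]


print(extract_id(url))
print(replace_id(url, "000000000000-0000-0000-000000000000"))
```

A parameter such as PreviousFeedBackId= is not matched, because the \b before FeedBackId requires a word boundary at that position.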
WatiN has a feature that lets us match the URL while ignoring the query string. The code below works fine for me.
string baseUrl = "http://192.168.25.10:215/admin/SelectUsers.aspx";
IE ie = IE.AttachToIE(Find.ByUrl(baseUrl,true));