I have some HTML source that I get from a website for option quotes (please see below).
What is the best way to extract the various text values in the tr and store them in a collection keyed on the strike price (4700 in this case, available in the middle td as 4700.00)?
Some people recommend regex while others suggest using an HTML parser. I'm doing this in VBA, so what's the best way?
<!--<td>Quote</td>
<td><img src="/images/print3.gif">
</td>-->
<td><img src="/live_market/resources/images/grficon.gif" /></td>
<td class="ylwbg"> 2,935,500</td>
<td class="ylwbg"> 27,550</td>
<td class="ylwbg"> 12,458</td>
<td class="ylwbg"> 23.79</td>
<!-- End-->
<td class="ylwbg">
139.25
</td>
<!--*Net Change*-->
<td class="ylwbg" Style="color:Red;"> -7.35</td>
<td class="ylwbg"> 200</td>
<td class="ylwbg"> 139.15</td>
<td class="ylwbg"> 142.45</td>
<td class="ylwbg"> 200</td>
<td class="grybg"><b>4700.00</b></td>
<td class="nobg"> 1,300</td>
<td class="nobg"> 76.00</td>
<td class="nobg"> 79.00</td>
<td class="nobg"> 1,350</td>
<!--*Net Change*-->
<td class="nobg" Style="color:Red;"> -1.55</td>
<td class="nobg">
<!-- 76.00 -->
76.00
</td>
<td class="nobg"> 26.33</td>
<td class="nobg"> 32,772</td>
<td class="nobg"> 103,700</td>
<td class="nobg"> 5,123,300</td>
<td><img src="/live_market/resources/images/grficon.gif" /></td>
<!--<td>Quote</td>
<td><img src="/images/print3.gif"></td>-->
</tr>
After some fiddling I have derived a regex/VBA solution using
XMLHTTP to access the site (change strSite to suit)
a Regexp to get the required numbers
a variant array with 20 records to hold, then dump the numbers to the active sheet
Looking at the source HTML to find Regex patterns
The Call options have a common starting and finishing string that delimits the 10 values, but the values appear in three different forms:
Strings 1-4,7-10 for each record match <td class="ylwbg">X</td>
String 6 has a Style (and other text) preceding the > before the X
String 5 contains a much longer <a href textX</a>
A regex of
.Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"
extracts all the needed strings, but further work is needed later on string 5
The Put options start with <td class="nobg" so these are happily not extracted by a regex that gets points 1-3
Actual Code
Sub GetTxt()
Dim objXmlHTTP As Object
Dim objRegex As Object
Dim objRegMC As Object
Dim objRegM As Object
Dim strResponse As String
Dim strSite As String
Dim lngCnt As Long
Dim strTemp As String
Dim X(1 To 20, 1 To 10)
X(1, 1) = "OI"
X(1, 2) = "Chng in vol"
X(1, 3) = "Volume"
X(1, 4) = "IV"
X(1, 5) = "LTP"
X(1, 6) = "Net Chg"
X(1, 7) = "Bid Qty"
X(1, 8) = "Bid Price"
X(1, 9) = "Ask Price"
X(1, 10) = "Ask Qnty"
Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
strSite = "http://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionDates.jsp?symbol=NIFTY&instrument=OPTIDX&strike=4700.00"
On Error GoTo ErrHandler
With objXmlHTTP
.Open "GET", strSite, False
.Send
If .Status = 200 Then strResponse = .ResponseText
End With
On Error GoTo 0
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
'*cleaning regex* to remove all spaces
.Pattern = "[\xA0\s]+"
.Global = True
strResponse = .Replace(strResponse, vbNullString)
.Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"
If .Test(strResponse) Then
lngCnt = 20
Set objRegMC = .Execute(strResponse)
For Each objRegM In objRegMC
lngCnt = lngCnt + 1
If Right$(objRegM.submatches(2), 2) <> "a>" Then
X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = objRegM.submatches(2)
Else
'Get submatches of the form 206.40
strTemp = Val(Right(objRegM.submatches(2), Len(objRegM.submatches(2)) - InStrRev(objRegM.submatches(2), """") - 1))
X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = strTemp
End If
Next
Else
MsgBox "Parsing unsuccessful", vbCritical
End If
End With
Set objRegex = Nothing
Set objXmlHTTP = Nothing
[a1].Resize(UBound(X, 1), UBound(X, 2)) = X
Exit Sub
ErrHandler:
MsgBox "Site not accessible"
If Not objXmlHTTP Is Nothing Then Set objXmlHTTP = Nothing
End Sub
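For readers more at home in JavaScript, the two regex steps above (strip all whitespace, then capture the ylwbg cells) can be sketched roughly as follows. This is only an illustration under assumptions: sampleRow is a shortened, hypothetical copy of the row from the question, extractCallValues is a made-up name, and reading the strike from the grybg cell is a guess based on that sample rather than on the live page.

```javascript
// Shortened, hypothetical copy of the row shown in the question.
const sampleRow = `
<td class="ylwbg"> 2,935,500</td>
<td class="ylwbg">
139.25
</td>
<td class="ylwbg" Style="color:Red;"> -7.35</td>
<td class="grybg"><b>4700.00</b></td>
<td class="nobg"> 1,300</td>`;

function extractCallValues(html) {
  // Same cleaning step as the VBA code: remove whitespace and non-breaking spaces.
  const cleaned = html.replace(/[\xA0\s]+/g, "");
  // Assumption: the strike price sits in the single grybg cell of the row.
  const strikeMatch = /<tdclass="grybg"><b>(.+?)<\/b>/.exec(cleaned);
  // Call-side cells, with or without the optional Style attribute.
  const cellPattern = /<tdclass="ylwbg"(?:Style.+?)?>(.+?)<\/td>/g;
  const values = Array.from(cleaned.matchAll(cellPattern), (m) => m[1]);
  return { strike: strikeMatch ? strikeMatch[1] : null, values };
}

console.log(extractCallValues(sampleRow));
```

The nobg (Put) cells are skipped for the same reason as in the VBA version: the pattern only accepts the ylwbg class.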
I made an ASP page on which a user can enter a number, one or more characters, and a word. When the user presses the button, the page replaces the given characters in the word with the given number.
My question is: how can I make the script replace each single character when the user enters more than one?
E.g. the entry is "ab", the word is "abby", and the number is "1". My current program produces "1by", but I want "111y". How do I achieve that?
<html>
<head>
<meta charset = "utf-8">
<title>Replace
</title>
</head>
<body>
<%
response.flush
l_zahl = request.querystring("f_zahl")
l_wort = request.querystring("f_wort")
Dim letterarray, l_letter
l_letter = request.querystring("f_letter")
letterarry = Split("l_letter")
If IsNumeric(request.querystring("f_zahl")) And Not IsNumeric(request.querystring("f_letter")) And Not IsNumeric(request.querystring("f_wort")) Then
Dim zahlarray, l_zahl
l_zahl = request.querystring("f_zahl")
zahlarry = Split("l_zahl")
Dim wortarray, l_wort
l_wort = request.querystring("f_wort")
wortarry = Split("l_wort")
l_replace = (Replace(l_wort, l_letter, l_zahl, 1, -1, 1))
ElseIf Not IsNumeric(request.querystring("f_zahl")) Then
l_replace = "Keine Zahl"
ElseIf IsNumeric(request.querystring("f_letter")) Then
l_replace = "Kein Buchstabe"
ElseIf IsNumeric(request.querystring("f_wort")) Then
l_replace = "Kein leetspeak"
End If
%>
<form action = "Replacer.asp" method = "get">
<table width = "800" heigth = "400" border="1" cellspacing="0" cellpadding="1" align = "center" font face="tahoma, arial, helvetica, sans-serif" >
<tr>
<td align = "left" width = "100">
Bitte Zahl eingeben
</td>
<td align = "left" width = "100">
<input type = "text" name = "f_zahl" value = "<%=l_zahl%>">
</td>
<td align = "left" width = "100">
Bitte Buchstabe eingeben
</td>
<td>
<input type = "text" name = "f_letter" value = "<%=l_letter%>">
</td>
<td>
Bitte ein Wort eingeben
</td>
<td align = "left" width = "100">
<input type = "text" name = "f_wort" value = "<%=l_wort%>">
</td>
<td width = "*">
<input type = "submit" value = "Ersetzen" \>
</td>
</tr>
<tr>
<td colspan = "2">
</td>
<td >
Verändertes Wort
</td>
<td colspan = "4">
<%=l_replace%>
</td>
</tr>
</table>
</body>
</html>
The Replace function replaces the search string (l_letter) in the given expression (here l_wort) with the replacement string (l_zahl). To replace every character in l_letter with l_zahl, you need to do the replacement in a loop, once for each character in l_letter. However, the Split function cannot split a string into an array of its characters. It splits a string at a given delimiter (a space by default), so calling Split on a string without a space gives you an array with a single element containing the original string. Also, VBScript doesn't expand variables inside strings, so if you put a variable name in double quotes you get the literal string "variable", not the value of the variable.
var = "ab" : Split("var") ⇒ [ "var" ]
var = "ab" : Split(var) ⇒ [ "ab" ]
var = "a b" : Split(var) ⇒ [ "a", "b" ]
For extracting individual characters from a string use the Mid function:
l_replace = l_wort
For i=1 To Len(l_letter)
l_replace = Replace(l_replace, Mid(l_letter, i, 1), l_zahl)
Next
A better approach than doing multiple replacements in a loop would be using a regular expression replacement:
Set re = New RegExp
re.Global = True ' replace every occurrence, not just the first
re.Pattern = "[" & l_letter & "]"
l_replace = re.Replace(l_wort, l_zahl)
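The character-class idea can also be sketched in JavaScript. One thing to watch with user input is that characters such as ], ^, - and \ have a special meaning inside a character class, so the sketch below escapes them first. The function name replaceChars and its parameters are hypothetical and not part of the original page.

```javascript
// Replace every occurrence of any character in `letters` with `replacement`.
function replaceChars(word, letters, replacement) {
  // Escape the characters that are special inside a character class: \ ] ^ -
  const escaped = letters.replace(/[\\\]^-]/g, "\\$&");
  // The "g" flag makes the replacement apply to all matches, not only the first.
  const pattern = new RegExp("[" + escaped + "]", "g");
  // A replacer function keeps a "$" in the replacement text literal.
  return word.replace(pattern, () => replacement);
}

console.log(replaceChars("abby", "ab", "1"));
console.log(replaceChars("a-b]c", "-]", "_"));
```

With the inputs from the question ("ab", "abby", "1") this yields "111y".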
Treat the character input as a list/collection of letters to replace:
Option Explicit
Dim f : f = "ab"
Dim t : t = "1"
Dim w : w = "abby"
WScript.Echo f, t, w
Dim i
For i = 1 To Len(f)
w = Replace(w, Mid(f, i, 1), t)
Next
WScript.Echo f, t, w
output:
cscript 47469843.vbs
ab 1 abby
ab 1 111y
Here is the string I want to match 76c24efd-ec42-492a-92df-c62cfd4540a3. The following regex will match a 36 char length string with alphanumeric characters and '-'.
[a-zA-Z0-9\-]{36}
I am trying to add to this regex, so it matches only when <8 chars> - <4 chars> - <4 chars> - <4 chars> - <12 chars>
The following will match a group of 8 characters followed by a dash, then a group of 4 characters followed by a dash 3 times, and then a group of 12 characters.
/^[a-z0-9]{8}-(?:[a-z0-9]{4}-){3}[a-z0-9]{12}$/i
Here's a short JavaScript test that shows the results.
$('.test').each(function(row, item) {
var val = $($(item).children()[0]).html()
var result = /^[a-z0-9]{8}-(?:[a-z0-9]{4}-){3}[a-z0-9]{12}$/gi.test(val);
$($(item).children()[1]).html(result.toString())
});
th {
text-align:left;
padding-right:10px;
}
td {
border: 1px solid #ccc;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table style="border: 1px solid black">
<tr><th>Test</th><th>Result</th><th>Desired Result</th></tr>
<tr class="test"><td>76c24efd-ec42-492a-92df-c62cfd4540a3</td><td></td><td>Good</td></tr>
<tr class="test"><td>76c24efd-ecz42-492a-92df-c62cfd4540a3</td><td></td><td>Bad Length - 2nd segment too long</td></tr>
<tr class="test"><td>76c24efd-ec2-492a-92df-c62cfd4540a31</td><td></td><td>Bad Segment Lengths</td></tr>
<tr class="test"><td>76$24efd-ec42-492a-92df-c62cfd4540a3</td><td></td><td>Bad Char ($)</td></tr>
</table>
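The same check can be run without a page or jQuery. The sketch below also adds a stricter variant that accepts hexadecimal digits only, which is an assumption about the intended input (the sample string looks like a GUID); the names shapeOnly, hexOnly and samples are made up for the illustration, and the last sample string is invented to show the difference.

```javascript
// Same shape as the pattern above: 8-4-4-4-12 alphanumeric groups.
const shapeOnly = /^[a-z0-9]{8}-(?:[a-z0-9]{4}-){3}[a-z0-9]{12}$/i;
// Assumed stricter variant: hexadecimal digits only.
const hexOnly = /^[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}$/i;

const samples = [
  "76c24efd-ec42-492a-92df-c62cfd4540a3",  // good
  "76c24efd-ecz42-492a-92df-c62cfd4540a3", // 2nd segment too long
  "76c24efd-ec2-492a-92df-c62cfd4540a31",  // bad segment lengths
  "76$24efd-ec42-492a-92df-c62cfd4540a3",  // bad character ($)
  "76z24efd-ec42-492a-92df-c62cfd4540a3",  // right shape, but "z" is not hex
];

for (const s of samples) {
  console.log(s, shapeOnly.test(s), hexOnly.test(s));
}
```

Neither pattern uses the g flag, so calling test repeatedly on the same regex object does not carry state between calls.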
Hi, I want to use VBA to pull data from a weather website. What I'm trying to do is get the number 6 from this HTML code:
</tr>
<tr>
<td class="indent"><span>Temperatura średnia</span></td>
<td>
<span class="wx-data"><span class="wx-value">6</span><span class="wx-unit"> ° C</span></span>
</td>
<td>
-
</td>
<td> </td>
</tr>
<tr>
<td class="indent"><span>Temperatura maksymalna</span></td>
<td>
<span class="wx-data"><span class="wx-value">7</span><span class="wx-unit"> ° C</span></span>
</td>
<td>
<span class="wx-data"><span class="wx-value">8</span><span class="wx-unit"> ° C</span></span>
</td>
I tried code like this:
Private Sub CommandButton1_Click()
Dim IE As Object
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' You can uncomment the next line to see form results
IE.Visible = False
' URL to get data from
IE.Navigate "https://www.wunderground.com/history/airport/EPGD/2016/10/24/DailyHistory.html?req_city=Pruszcz%20Gdanski&req_statename=Polska&reqdb.zip=00000&reqdb.magic=86&reqdb.wmo=12140"
' Statusbar
Application.StatusBar = "Loading, Please wait..."
' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Application.StatusBar = "Searching for value. Please wait..."
Dim dd As String
dd = IE.Document.getElementsByClassName("Temperatura średnia")(0).innerText
MsgBox dd
' Show IE
IE.Visible = True
' Clean up
Set IE = Nothing
Application.StatusBar = ""
End Sub
Without any result (the code does nothing). I will appreciate any help.
Here is an example using XHR and RegEx to retrieve all table data from the webpage:
Option Explicit
Sub ExtractDataWunderground()
Dim aResult() As String
Dim sContent As String
Dim i As Long
Dim j As Long
' retrieve html content
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.wunderground.com/history/airport/EPGD/2016/10/24/DailyHistory.html", False
.Send
sContent = .ResponseText
End With
' parse with regex
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' minor html simplification
.Pattern = "<span[^>]*>|</span>|[\r\n\t]*"
sContent = .Replace(sContent, "")
' match each table row
.Pattern = "<tr><td class=""indent"">(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>"
With .Execute(sContent)
ReDim aResult(1 To .Count, 1 To 4)
' each row
For i = 1 To .Count
With .Item(i - 1)
' each cell
For j = 1 To 4
aResult(i, j) = DecodeHTMLEntities(.SubMatches(j - 1))
Next
End With
Next
End With
End With
' output result
Cells.Delete
Output Cells(1, 1), aResult
MsgBox "Completed"
End Sub
Function DecodeHTMLEntities(sText As String) As String
Static oHtmlfile As Object
Static oDiv As Object
If oHtmlfile Is Nothing Then
Set oHtmlfile = CreateObject("htmlfile")
oHtmlfile.Open
Set oDiv = oHtmlfile.createElement("div")
End If
oDiv.innerHTML = sText
DecodeHTMLEntities = oDiv.innerText
End Function
Sub Output(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1 _
)
.NumberFormat = "#"
.Value = aCells
.Columns.AutoFit
End With
End With
End Sub
To extract the mean temperature only you can get the value from the first match having 0 index, since the mean temperature is in the first row of the table:
Sub ExtractMeanTempWunderground()
Dim sContent As String
' retrieve html content
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.wunderground.com/history/airport/EPGD/2016/10/24/DailyHistory.html", False
.Send
sContent = .ResponseText
End With
' parse with regex
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' minor html simplification
.Pattern = "<span[^>]*>|</span>|[\r\n\t]*"
sContent = .Replace(sContent, "")
' match each table row
.Pattern = "<tr><td class=""indent"">.*?</td><td>(.*?)</td><td>.*?</td><td>.*?</td></tr>"
With .Execute(sContent)
If .Count = 15 Then
' get the first row value only
MsgBox DecodeHTMLEntities(.Item(0).SubMatches(0))
Else
MsgBox "Data structure inconsistence detected"
End If
End With
End With
End Sub
Function DecodeHTMLEntities(sText As String) As String
Static oHtmlfile As Object
Static oDiv As Object
If oHtmlfile Is Nothing Then
Set oHtmlfile = CreateObject("htmlfile")
oHtmlfile.Open
Set oDiv = oHtmlfile.createElement("div")
End If
oDiv.innerHTML = sText
DecodeHTMLEntities = oDiv.innerText
End Function
Note that such methods only work as long as the webpage structure does not change.
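The simplify-then-match idea from the VBA code can be sketched in JavaScript on a small sample. The sample markup below is a hypothetical, tidied copy of the first row from the question, extractRows is a made-up name, and HTML entities are not decoded here (the VBA version uses DecodeHTMLEntities for that).

```javascript
// Hypothetical, tidied copy of the first table row from the question.
const sampleHtml = `<tr>
<td class="indent"><span>Temperatura średnia</span></td>
<td>
<span class="wx-data"><span class="wx-value">6</span><span class="wx-unit"> °C</span></span>
</td>
<td>
-
</td>
<td> </td>
</tr>`;

function extractRows(html) {
  // Minor HTML simplification: drop span tags, line breaks and tabs.
  const flat = html.replace(/<span[^>]*>|<\/span>|[\r\n\t]+/gi, "");
  // One match per table row, one capture group per cell.
  const rowPattern =
    /<tr><td class="indent">(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><\/tr>/gi;
  return Array.from(flat.matchAll(rowPattern), (m) =>
    m.slice(1, 5).map((cell) => cell.trim())
  );
}

const rows = extractRows(sampleHtml);
console.log(rows);
// The numeric part of the second cell of the first row.
console.log(parseFloat(rows[0][1]));
```

As with the VBA version, the row pattern depends on the exact cell layout, so any change in the page markup makes it return no rows.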
I need to validate a date in yyyy-mm-dd format on user keyup.
I'm currently validating yyyy-mm with this regex:
^\d{0,4}$|^\d{4}[-]$|^\d{4}[-](0?[0-9]|1[012])$
But I need to validate the other parts of the date. Can anyone help me?
Explanation: this loosely checks the input while typing to see whether it can still match the desired format yyyy-mm-dd.
I modified your current regex a bit and added the dd part, so it becomes
^\d{0,4}$|^\d{4}-0?$|^\d{4}-(?:0?[1-9]|1[012])(?:-(?:0?[1-9]?|[12]\d|3[01])?)?$
(?: opens a non-capturing group for the alternation
0?[1-9]? an optional digit 1-9 with an optional leading 0
[12]\d days 10-29
3[01] days 30 and 31
See the demo at regex101
For dd-mm-yyyy try this variant:
^0?$|^(?:0?[1-9]|[12]\d|3[01])(?:-(?:(?:0$|0?[1-9]|1[012]?)(?:-\d{0,4})?)?)?$
Or for mm-dd-yyyy that one:
^0?$|^(?:0?[1-9]|1[012]?)(?:-(?:(?:0$|0?[1-9]|[12]\d|3[01])(?:-\d{0,4})?)?)?$
This does not actually validate a date (leap years, 28-31 days). It only loosely checks the input while typing, and the patterns can probably be made shorter. Here is an example with the yyyy-mm-dd pattern.
$("#date").on("keyup", function()
{
let valid = /^\d{0,4}$|^\d{4}-0?$|^\d{4}-(?:0?[1-9]|1[012])(?:-(?:0?[1-9]?|[12]\d|3[01])?)?$/.test(this.value), input = this.value;
if(!valid) {
this.value = input.substring(0, input.length - 1);
this.style.backgroundColor = '#EEA39C';
}
setTimeout(() => { this.style.backgroundColor = '#88DD85'; }, 700);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<input id="date" style="font-size: 20px" maxlength="10" type="text" />
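To see which partial inputs the yyyy-mm-dd pattern accepts, it can be wrapped in a small function and run outside the browser. The name isPartialDate is hypothetical; the regex is the one shown above, unchanged.

```javascript
// Returns true while the typed text can still grow into yyyy-mm-dd.
function isPartialDate(text) {
  const partial =
    /^\d{0,4}$|^\d{4}-0?$|^\d{4}-(?:0?[1-9]|1[012])(?:-(?:0?[1-9]?|[12]\d|3[01])?)?$/;
  return partial.test(text);
}

const inputs = ["", "2016", "2016-", "2016-0", "2016-1", "2016-13",
                "2016-03-", "2016-03-1", "2016-03-16", "2016-03-32"];
for (const text of inputs) {
  console.log(JSON.stringify(text), isPartialDate(text));
}
```

Only "2016-13" and "2016-03-32" are rejected here. An impossible date such as "2016-02-31" is still accepted, which is why the full validation step is a separate check.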
To validate the full date when it is typed or submitted, see this answer by PhiLho and rearrange it to the desired format, e.g. for yyyy-mm-dd:
function isValidDate(date)
{
var matches = /^(\d{4})-(\d{1,2})-(\d{1,2})$/.exec(date);
if (matches == null) return false;
var y = matches[1];
var m = matches[2] - 1;
var d = matches[3];
var composedDate = new Date(y, m, d);
return composedDate.getDate() == d &&
composedDate.getMonth() == m &&
composedDate.getFullYear() == y;
}
<input type="text" id="date" style="font-size: 17px" value="2016-03-16">
<button onclick="alert(
isValidDate(getElementById('date').value)
);" style="font-size: 17px">check date</button>
This regex is a bit complex, but it checks the Gregorian leap-year rule.
regExp = "(((\d{2}(([13579][26])|([2468][480])|(0[48])))|(([13579][26])|([02468][480]))00)-02-29)|(\d{4}-((?:(0[13578]|1[02])-([0-2]\d|3[0-1]))|(?:(0[469]|11)-([0-2]\d|30))|(?:02-([0-1]\d|2[0-8]))))"
var regExp = /^(\d{4})-(\d{2})-(\d{2})$/
regExp.test(value);
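A quick way to probe the long pattern is to anchor it and run it against a few dates. In the sketch below the pattern text is copied unchanged from above; the surrounding ^(?: ... )$ anchors and the names datePattern, anchored and unanchored are additions for the illustration. It also shows two limits worth knowing: without anchors, trailing characters slip through, and a day of 00 is accepted because of [0-2]\d.

```javascript
// The pattern from above, copied unchanged as a raw string.
const datePattern = String.raw`(((\d{2}(([13579][26])|([2468][480])|(0[48])))|(([13579][26])|([02468][480]))00)-02-29)|(\d{4}-((?:(0[13578]|1[02])-([0-2]\d|3[0-1]))|(?:(0[469]|11)-([0-2]\d|30))|(?:02-([0-1]\d|2[0-8]))))`;

// Anchor the whole alternation so extra leading or trailing text is rejected.
const anchored = new RegExp("^(?:" + datePattern + ")$");
const unanchored = new RegExp(datePattern);

const dates = ["2016-02-29", "2015-02-29", "1900-02-29", "2000-02-29",
               "2016-04-31", "2016-12-31", "2016-04-00"];
for (const d of dates) {
  console.log(d, anchored.test(d));
}

// Trailing garbage: only the anchored version rejects it.
console.log(unanchored.test("2015-02-280"), anchored.test("2015-02-280"));
```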
How can I remove the style attribute from any tag with regex in ASP?
from:
<div style="margin-top:10px;">test</div>
to:
<div>test</div>
Set objRegExp = New regexp
objRegExp.Pattern = "/style\s*=\s*(\'|').+(\'|')/i"
objRegExp.IgnoreCase = True
objRegExp.Global = True
Set resp = objRegExp.Execute(strWordHTML)
For Each respItem In resp
strWordHTML= replace(strWordHTML,respItem.Value,"")
Next
Set resp = Nothing
Set objRegExp = Nothing
Solved with this pattern:
(\sstyle=['""][^'""]+?['""])
Not using regex and not tested but something like this should work
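str = "<div style=""margin-top:10px;"">test</div>"
' position of the space that precedes the attribute name
start = InStr(str, " style")
' opening quote of the attribute value
first = InStr(start, str, """")
' closing quote: search from the character after the opening quote
second = InStr(first + 1, str, """")
result = Mid(str, 1, start - 1) & Mid(str, second + 1)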
dim result = Regex.Replace(HtmlText, "style[^>]*", "")
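For comparison, the regex approach can be sketched in JavaScript. The pattern below is a variation, not the one posted above: it matches a double-quoted or a single-quoted value separately, so a value that contains the other kind of quote is still removed as a whole. The function name stripStyleAttributes is hypothetical.

```javascript
function stripStyleAttributes(html) {
  // Whitespace, "style", "=", then a double-quoted or single-quoted value.
  // The "g" flag removes every occurrence and "i" also covers STYLE="...".
  return html.replace(/\sstyle\s*=\s*(?:"[^"]*"|'[^']*')/gi, "");
}

console.log(stripStyleAttributes('<div style="margin-top:10px;">test</div>'));
console.log(stripStyleAttributes("<p class=\"a\" style='color:red'>x</p>"));
console.log(stripStyleAttributes("<span STYLE=\"font-family:'Arial'\">y</span>"));
```

Since this is plain text matching and not HTML parsing, it would also strip a style="..." sequence that happens to appear in ordinary text content.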