XSLT: How to filter content from complex generated HTML pages?

There are some very good examples of how to use XSLT to filter and merge simple HTML pages.
I have a large number of individually saved HTML pages (generated with ASP), like the example below, that should be filtered and merged into a single HTML file to produce a book.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="../../../../external.html?link=http://www.w3.org/1999/xhtml" >
<head id="Head1"><title>
2021_0623.aspx
</title>
</style></head>
<body>
<div align="center">
<div class="aspNetHidden">
</div>
<table width="95%" id="table1" cellspacing="0" cellpadding="0" border="0" >
<tr>
<td>
</td>
<td width="100%" bgcolor="black" style="padding: 10px;">
<div align="center">
</div>
</td>
</tr>
<tr>
<td>
</td>
<td bgcolor="black" width="100%" height="20px" style="padding-left: 20px; padding-right: 20px; padding-bottom: 10px;">
<div class="align-left">
</div>
</td>
</tr>
<tr>
<td align="right" valign="top" style="padding-right: 10px">
<a href="" /></a><div id="Menu1">
<ul class="level1">
<li>Recent Updates</li>
</ul>
</div><a id="Menu1_SkipLink"></a>
</td>
<td width="100%" valign="top" bgcolor="white" style="padding: 20px;">
<p class="page-title">Library</p>
<p class="page-title-2">Library Text</p>
<div class="nav">
<table class="nav">
<tr class="nav">
<td class="nav-title">Some unneeded navigation</td>
<td class="nav">
</td>
</tr>
</table>
</div>
<p class="copyright">Copyright © 2021</p>
<p class="about"><strong>ABOUT THE CONTENTS.</strong></p>
<p class="text-title">Title of text</p>
<p class="text-date">August 22, 2021</p>
<p>text of interest.</p>
<p>more text of interest.</p>
<p class="separator-left-33"> </p>
<p class="footnote"><a id="_ftn1" href="#_ftnref1" name="_ftn1">[1]</a> a footnote of interest</p>
<p class="footnote"><a id="_ftn2" href="#_ftnref2" name="_ftn1">[2]</a> one more footnote of interest</p>
<div class="nav">
<table class="nav">
</table>
</div>
</td>
</tr>
<tr>
<td>
</td>
<td width="100%" height="45" align="left" valign="top" style="padding-left: 20px; padding-top: 5px;" bgcolor="black">
</td>
</tr>
</table>
</form>
</div>
</body>
</html>
The result should keep all content starting with the title
<p class="page-title">Library</p>
up to and including the footnotes.
Is this possible with XSLT, and if so, how?
It would also be nice to filter out the unneeded navigation and perhaps the class="about" paragraph, which is always the same,
but that can be done in later steps.
The expected output should look like this (or it can be a well-formed HTML page):
<p class="page-title">Library</p>
<p class="page-title-2">Library Text</p>
<p class="copyright">Copyright © 2021</p>
<p class="text-title">Title of text</p>
<p class="text-date">August 22, 2021</p>
<p>text of interest.</p>
<p>more text of interest.</p>
<p class="separator-left-33"> </p>
<p class="footnote"><a id="_ftn1" href="#_ftnref1" name="_ftn1">[1]</a> a footnote of interest</p>
<p class="footnote"><a id="_ftn2" href="#_ftnref2" name="_ftn1">[2]</a> one more footnote of interest</p>

xsltproc has an --html option to parse the input documents as HTML instead of XML. Assuming that option parses your inputs into HTML elements without a namespace, the following XSLT 1.0 code should do the job:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="html" indent="yes" version="5" doctype-system="about:legacy-compat"/>
  <!-- identity transformation: copy everything by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- in body, output only the page title and the p elements that follow it -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:variable name="start-element" select="//p[@class = 'page-title']"/>
      <xsl:apply-templates select="$start-element | $start-element/following-sibling::p"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Run it, for example, as xsltproc --html extract.xsl page.html > out.html (the file names are placeholders).
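For comparison, here is a minimal sketch of the same selection without an XSLT processor, using only Python's standard html.parser. It assumes the page layout shown in the question (the wanted p elements sit in one table cell) and approximates following-sibling::p as "a p in that cell, outside nested table cells". ParagraphExtractor, extract_paragraphs and merge_pages are hypothetical names, and class="about" is skipped by default:

```python
import html
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the <p> elements from <p class="page-title"> onward.

    Rough stand-in for $start-element | $start-element/following-sibling::p.
    Assumption: "following sibling" is approximated as "a <p> in the same
    table cell, not inside a nested table cell", which fits the sample page.
    """

    def __init__(self, skip_classes=()):
        super().__init__(convert_charrefs=True)
        self.skip_classes = set(skip_classes)
        self.started = False  # seen <p class="page-title"> yet?
        self.done = False     # left the table cell that contains it?
        self.td_depth = 0     # <td> nesting relative to that cell
        self.current = None   # pieces of the <p> being captured, else None
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.current is not None:  # markup inside a kept <p>: copy it
            self.current.append(self.get_starttag_text())
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "page-title" in classes:
            self.started = True
        if not self.started:
            return
        if tag == "td":
            self.td_depth += 1
        elif tag == "p" and self.td_depth == 0:
            if not self.skip_classes.intersection(classes):
                self.current = [self.get_starttag_text()]

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> inside a kept <p>: copy verbatim.
        if self.current is not None and not self.done:
            self.current.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.started or self.done:
            return
        if self.current is not None:
            self.current.append(f"</{tag}>")
            if tag == "p":
                self.paragraphs.append("".join(self.current))
                self.current = None
        elif tag == "td":
            self.td_depth -= 1
            if self.td_depth < 0:  # closed the cell holding page-title
                self.done = True

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(html.escape(data, quote=False))


def extract_paragraphs(page_html, skip_classes=("about",)):
    parser = ParagraphExtractor(skip_classes)
    parser.feed(page_html)
    parser.close()
    return parser.paragraphs


def merge_pages(pages):
    """Concatenate the extracted paragraphs of several pages into one document."""
    body = "\n".join(p for page in pages for p in extract_paragraphs(page))
    return f"<!DOCTYPE html>\n<html>\n<body>\n{body}\n</body>\n</html>\n"
```

To build the book, read each saved page into a string (for example with pathlib.Path(name).read_text(encoding="utf-8"), assuming UTF-8 files) and pass the list to merge_pages. html.parser tolerates the stray </style> and </form> end tags, so the pages do not need to be well-formed.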
If the HTML documents do end up in that odd namespace your input carries, you have to bind a prefix to that namespace in XSLT 1.0 and select and match element nodes with qualified names of the form prefix:local-name, e.g. xhtml:body or xhtml:p, where the namespace declaration would be xmlns:xhtml="../../../../external.html?link=http://www.w3.org/1999/xhtml".
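Outside XSLT the same rule applies: element names must be qualified with a prefix bound to that namespace URI. A small sketch with Python's xml.etree.ElementTree, assuming a page that parses as well-formed XML (the sample above does not, because of its stray </style> and </form> end tags); title_and_following_paragraphs is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Namespace URI copied verbatim from the question's <html xmlns="...">.
NS = {"xhtml": "../../../../external.html?link=http://www.w3.org/1999/xhtml"}
P_TAG = "{%s}p" % NS["xhtml"]  # Clark notation, i.e. xhtml:p


def title_and_following_paragraphs(xml_text):
    """ElementTree counterpart of $start | $start/following-sibling::p."""
    root = ET.fromstring(xml_text)
    # The prefix must be used: an unprefixed ".//p" matches nothing here.
    start = root.find(".//xhtml:p[@class='page-title']", NS)
    if start is None:
        return []
    # ElementTree has no parent pointers, so look for the parent explicitly.
    for parent in root.iter():
        children = list(parent)
        if start in children:
            following = children[children.index(start) + 1:]
            return [start] + [c for c in following if c.tag == P_TAG]
    return [start]
```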

Here is a basic solution as a Perl script that does the needed extraction:
#!/usr/bin/perl
use strict;
use warnings;
my $LCount = 0; # Line count
my $ICount = 0; # Line ignore count
my $DCount = 0; # Line done count
my $Line;       # current line
if (@ARGV == 0) { # No parameters -> print usage
    print "\n";
    print "extract.pl [input-file] [output-file]\n";
    print "\n";
    exit;
}
if (@ARGV < 2) { die "Too few parameters!\n"; }
if (@ARGV > 2) { die "Too many parameters!\n"; }
my $InputFile = $ARGV[0];
my $OutputFile = $ARGV[1];
###############################################################################
# Main program
###############################################################################
open(InFile, '<', $InputFile) or die "Error opening '$InputFile': $!\n";
open(OutFile, '>', $OutputFile) or die "Error opening '$OutputFile': $!\n";
while (defined($Line = <InFile>)) {
    $LCount++;
    # Keep lines that start with <p, except the repeated class="about" paragraph
    if ($Line =~ /^<p/) {
        if ($Line =~ /class="about"/) {
            $ICount++;
        } else {
            $DCount++;
            print OutFile $Line;
        }
    } else {
        $ICount++;
    }
}
close(InFile) or die "Error closing '$InputFile': $!\n";
close(OutFile) or die "Error closing '$OutputFile': $!\n";
print "\n$LCount lines from $InputFile processed.\n";
print "$DCount lines extracted.\n";
print "$ICount lines ignored.\n\n";
With a few more lines, much more can be filtered out, and the HTML framework can optionally be added.
But it would still be interesting to know whether this can be done as simply with XSLT.

In this special case the basic filtering could be done in a shell with a simple grep:
grep "<p" 1.html > out.html
The Perl solution is preferable because it allows more control over the behaviour and the filtering.

Related

BeautifulSoup: my extracted data from an HTML table does not print in the same table format. Can I keep the table format?

I have an HTML test report file with a list of test cases. Each test case is in a row of an HTML table.
I have managed to get the test cases out of the table, row by row.
When I write them into my email, they are not laid out as a table like in the HTML. I would like to keep the grid lines for the rows and columns so it displays nicely as a table.
My method to extract the data is:
def extract_testcases_from_report_htmltestrunner():
    filename = (r"E:\test_runners project\selenium_regression_test\TestReport\ClearCore_Automated_GUI_Regression_Project_TestReport.html")
    html_report_part = open(filename,'r')
    soup = BeautifulSoup(html_report_part, "html.parser")
    for div in soup.select("#result_table tr div.testcase"):
        yield div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
When I write it into my email, the output I get is:
test_000001_login_valid_user
pass
test_000002_select_a_project
pass
test_000003_verify_Lademo_CRM_DataPreview_is_present
pass
test_000004_view_data_preview_Lademo_CRM_and_test_scrollpage
pass
My desired output would be in the following format, with the table lines if possible, or in the same table format as in the HTML:
test_000001_login_valid_user pass
test_000002_select_a_project pass
test_000003_verify_Lademo_CRM_DataPreview_is_present pass
test_000004_view_data_preview_Lademo_CRM_and_test_scrollpage pass
The HTML snippet is:
<div class='heading'>
<h1>Test Report</h1>
<p class='attribute'><strong>Start Time:</strong> 2016-10-27 10:06:59</p>
<p class='attribute'><strong>Duration:</strong> 0:57:01.842000</p>
<p class='attribute'><strong>Status:</strong> Pass 93</p>
<p class='description'>Selenium - ClearCore Regression Project Automated Test</p>
</div>
<p id='show_detail_line'>Show
<a href='javascript:showCase(0)'>Summary</a>
<a href='javascript:showCase(1)'>Failed</a>
<a href='javascript:showCase(2)'>All</a>
</p>
<table id='result_table'>
<colgroup>
<col align='left' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
</colgroup>
<tr id='header_row'>
<td>Test Group/Test case</td>
<td>Count</td>
<td>Pass</td>
<td>Fail</td>
<td>Error</td>
<td>View</td>
</tr>
<tr class='passClass'>
<td>Regression_TestCase.RegressionProject_TestCase</td>
<td>47</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>Detail</td>
</tr>
<tr id='pt1.1' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
pass</a>
<div id='div_pt1.1' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.1').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.1: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.2' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000002_select_a_project</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.2')" >
pass</a>
<div id='div_pt1.2' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.2').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.2: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.3' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000057_run_clean_and_match_process</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.3')" >
pass</a>
<div id='div_pt1.3' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.3').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.3: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.4' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000058_view_all_records_report_CRM_CRM2_ESCR</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.4')" >
pass</a>
<div id='div_pt1.4' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.4').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.4: *** test_login_valid_user ***
test login with a valid user - Passed
*** Test view_all_records_report - CRM, CRM2, ESCR ***
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.5' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000059_view_matches_report_CRM_CRM2_ESCR</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.5')" >
pass</a>
<div id='div_pt1.5' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.5').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.5: *** test_login_valid_user ***
test login with a valid user - Passed
*** Test view_all_records_report - CRM, CRM2, ESCR ***
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
Is it possible?
The word pass goes onto a new line. If I could separate it out into a column, or by a few spaces, that would be good.
The word pass is in an a tag in the HTML, and the following line of code finds it. Could I put a few spaces before it, or put it in another column, when I extract it?
yield div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
My email message code snippet which writes it out is:
msg = MIMEText("\n ClearCore Automated GUI Project Test Report \n " + "\n" +
"".join([' - '.join(seq) for seq in extract_status_from_report_htmltestrunner()]) + "\n\n" +
'\n'.join([elem
for seq in extract_testcases_from_report_htmltestrunner()
for elem in seq]) + "\n" +
"\n Report location = : \\\storage-1\Testing\Selenium_Test_Report_Results\ClearCore\Selenium VM \n" + "\n")
My code to extract the status from the report is:
def extract_status_from_report_htmltestrunner():
    filename = (
        r"E:\test_runners 2 edit project\selenium_regression_test\TestReport\ClearCore_Automated_GUI_Regression_Project_TestReport.html")
    html_report_part = open(filename, 'r')
    soup = BeautifulSoup(html_report_part, "html.parser")
    div_heading = soup.find('div', {'class': 'heading'})
    p_status = div_heading.find('strong', text='Status:').parent
    p_status.find(text=True, recursive=False)
    print p_status.text
    return p_status.text
Thanks, Riaz
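from bs4 import BeautifulSoup
# text holds the HTML source of the report
soup = BeautifulSoup(text, 'lxml')
trs = soup.find_all(class_='hiddenRow')  # one row per test case
for tr in trs:
    row1 = tr.find('td').get_text()           # test case name
    row2 = tr.find('a').get_text(strip=True)  # "pass" or "fail"
    # name left-aligned in 55 characters, status right-aligned in 5
    print('{:<55}{:>5}'.format(row1, row2))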
Output:
test_000001_login_valid_user                            pass
test_000002_select_a_project                            pass
test_000057_run_clean_and_match_process                 pass
test_000058_view_all_records_report_CRM_CRM2_ESCR       pass
test_000059_view_matches_report_CRM_CRM2_ESCR           pass
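The email body can be built from the same (name, status) pairs. A minimal sketch, assuming Python 3 and pairs of str (i.e. without the .encode('utf-8') calls from the question); format_rows and html_table are hypothetical helpers. Plain-text columns only line up if the mail client shows the text in a monospaced font, so the sketch also builds an HTML table with grid lines and sends it with the "html" subtype of email.mime.text.MIMEText:

```python
import html
from email.mime.text import MIMEText


def format_rows(pairs):
    """Plain-text columns: pad each name to the longest name, then the status."""
    width = max((len(name) for name, _ in pairs), default=0)
    return "\n".join(f"{name:<{width}}  {status}" for name, status in pairs)


def html_table(pairs):
    """HTML table with grid lines, for mail clients that render HTML."""
    rows = "\n".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(status)}</td></tr>"
        for name, status in pairs
    )
    return (
        '<table border="1" cellpadding="4" style="border-collapse: collapse">\n'
        "<tr><th>Test case</th><th>Result</th></tr>\n"
        f"{rows}\n</table>"
    )


# Example pairs, as yielded by extract_testcases_from_report_htmltestrunner()
# once the .encode('utf-8') calls are dropped.
pairs = [
    ("test_000001_login_valid_user", "pass"),
    ("test_000002_select_a_project", "pass"),
]
title = "ClearCore Automated GUI Project Test Report"
text_msg = MIMEText(title + "\n\n" + format_rows(pairs) + "\n")
html_msg = MIMEText(f"<h3>{title}</h3>\n" + html_table(pairs), "html")
```

format_rows pads to the longest test name, so unlike a fixed width of 55 it does not need adjusting when names get longer.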

HTML: open XSLT and pass a variable

I am looking for some assistance. I've been searching the web for over a week now, trying to expand my limited knowledge of searching an XML file to find a specific entry based on a user's input from an HTML form.
I have tried XPath, but my JavaScript knowledge is limited and I couldn't get this to work.
I have resorted to XSL to style my XML. It works really nicely when I hardcode what I'm looking for, but I'd very much like to make this dynamic based on my HTML form input. I'm really struggling with the code to get this working, and I've found few examples of how to set up the HTML side of things.
XSL
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:param name="skuid" />
<xsl:template match="/">
<xsl:apply-templates select="//sku[@id=$skuid]" />
</xsl:template>
<xsl:template match="sku">
<html>
<body>
<h2>Availability:</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Sku Code</th>
<th>Description</th>
<th>Due Date</th>
<th>Due Qty</th>
</tr>
<tr>
<td align="center"><xsl:value-of select="skucode"/></td>
<td align="center"><xsl:value-of select="description"/></td>
<td align="center"><xsl:value-of select="duedate"/></td>
<td align="center"><xsl:value-of select="dueqty"/></td>
</tr>
<tr bgcolor="#9acd32">
<th colspan="2" align="center">Ranged Current Cat</th>
<th colspan="2" align="center">Ranged Next Cat</th>
</tr>
<tr>
<td colspan="2" align="center"><xsl:value-of select="currcat"/></td>
<td colspan="2" align="center"><xsl:value-of select="nextcat"/></td>
</tr>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="employees2.xsl"?>
<availability>
<sku id="10011">
<skucode>10011</skucode>
<description>4 Gallon Loft Tank Kit</description>
<duedate>07/09/2016</duedate>
<dueqty>10.00</dueqty>
<currcat>Main Cat In Store</currcat>
<nextcat>Main Cat In Store</nextcat>
</sku>
<sku id="10018">
<skucode>10018</skucode>
<description>MATT EMULSION PINK 2/5L</description>
<duedate>09/09/2016</duedate>
<dueqty>100</dueqty>
<currcat>Not Ranged</currcat>
<nextcat>Not Ranged</nextcat>
</sku>
<sku id="12345">
<skucode>12345</skucode>
<description>DeWalt Drill</description>
<duedate>10/09/2016</duedate>
<dueqty>1000</dueqty>
<currcat>Main Cat In Store</currcat>
<nextcat>Main Cat In Store</nextcat>
</sku>
<sku id="98765">
<skucode>98765</skucode>
<description>Wheel Barrow</description>
<duedate>31/09/2016</duedate>
<dueqty>1</dueqty>
<currcat>Not Ranged</currcat>
<nextcat>Not Ranged</nextcat>
</sku>
</availability>
<!DOCTYPE html>
<html>
<body>
SKU: <input type="text" name="SKU" id="input" maxlength="5">
<br />
<input type="submit" value="Submit" onClick="loadXMLDoc()">
<br />
<br />
<div id="results"></div>
<script>
function loadXMLDoc(dname)
{
if (window.XMLHttpRequest)
{
xhttp=new XMLHttpRequest();
}
else
{
xhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xhttp.open("GET",dname,false);
try {xhttp.responseType="msxml-document"} catch(err) {} // Helping IE
xhttp.send("");
return xhttp;
}
var y = document.getElementById("input").value;
var x=loadXMLDoc("employees.xml");
var xml=x.responseXML;
path="/Availability/sku[@id=y]";
// code for IE
if (window.ActiveXObject || xhttp.responseType=="msxml-document")
{
xml.setProperty("SelectionLanguage","XPath");
nodes=xml.selectNodes(path);
for (i=0;i<nodes.length;i++)
{
document.write(nodes[i].childNodes[0].nodeValue);
document.write("<br>");
}
}
// code for Chrome, Firefox, Opera, etc.
else if (document.implementation && document.implementation.createDocument)
{
var nodes=xml.evaluate(path, xml, null, XPathResult.ANY_TYPE, null);
var result=nodes.iterateNext();
while (result)
{
document.write(result.childNodes[0].nodeValue);
document.write("<br>");
result=nodes.iterateNext();
}
}
</script>
</body>
</html>
kind regards
Paul
In your path expression
path="/Availability/sku[@id=y]";
y means "the string value of the child element of sku named y"; it does not mean "the value of the JavaScript variable named y".
I forget whether the browser DOM XPath API has a mechanism for substituting parameter values; if not, you can use string concatenation to create a path expression that contains the value of y as a string literal. But beware of injection attacks.
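A minimal Python sketch of both options, using the standard library's xml.etree.ElementTree instead of the browser API, so it only illustrates the idea; find_by_concatenation and find_by_comparison are hypothetical names. Note also that the question's document element is availability in lower case, while the path starts with /Availability; XPath names are case-sensitive, so that path would match nothing even with the value substituted. In the browser, the concatenated form would be "/availability/sku[@id='" + y + "']".

```python
import re
import xml.etree.ElementTree as ET

XML = """<availability>
  <sku id="10011"><description>4 Gallon Loft Tank Kit</description></sku>
  <sku id="12345"><description>DeWalt Drill</description></sku>
</availability>"""


def find_by_concatenation(root, user_input):
    # Build the path only from input that cannot close the quoted literal:
    # the form allows at most 5 characters (maxlength="5"), so require 1-5 digits.
    if not re.fullmatch(r"[0-9]{1,5}", user_input):
        raise ValueError("SKU must be 1-5 digits")
    return root.find(".//sku[@id='" + user_input + "']")


def find_by_comparison(root, user_input):
    # No path building at all: compare the attribute value in ordinary code.
    return next((s for s in root.iter("sku") if s.get("id") == user_input), None)


root = ET.fromstring(XML)
drill = find_by_comparison(root, "12345")
print(drill.find("description").text)  # DeWalt Drill
```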

BeautifulSoup: extract data from certain columns of a table (I am getting too much data out)

I am trying to extract some data out of my Selenium Test Report html file.
I am getting too much data out of the table of rows and columns.
I would like to extract every cell that has a div with the class value "testcase", together with the link below it that has the class value "popup_link", whose text says Pass or Fail.
E.g.
<td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
pass</a>
I would like the text "test_000001_login_valid_user" and the text "pass"
There are lots of test cases in my report so I would like to iterate over the rows and get the test case name out and the pass or fail text.
My HTML snippet is:
<table id='result_table'>
<colgroup>
<col align='left' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
</colgroup>
<tr id='header_row'>
<td>Test Group/Test case</td>
<td>Count</td>
<td>Pass</td>
<td>Fail</td>
<td>Error</td>
<td>View</td>
</tr>
<tr class='passClass'>
<td>Regression_TestCase.RegressionProjectEdit_TestCase.RegressionProject_TestCase_Project_Edit</td>
<td>75</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>Detail</td>
</tr>
<tr id='pt1.1' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
pass</a>
<div id='div_pt1.1' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.1').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.1: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.2' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000002_select_a_project</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.2')" >
pass</a>
<div id='div_pt1.2' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.2').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.2: *** test_login_valid_user ***
test login with a valid user - Passed
*** test_select_a_project ***
08_12_1612_08_03
Selenium_Regression_Edit_Project_Test
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.3' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000003_verify_Lademo_CRM_DataPreview_is_present</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.3')" >
pass</a>
<div id='div_pt1.3' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.3').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.3: *** test_login_valid_user ***
test login with a valid user - Passed
*** test_select_a_project ***
08_12_1612_08_03
Selenium_Regression_Edit_Project_Test
*** Test verify_Lademo_CRM_DataPreview_is_present ***
aSelenium_LADEMO_CRM_DONOTCHANGE
File
498
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
My code is:
from bs4 import BeautifulSoup
table = soup.select_one("#result_table")
for row in table.select("tr.hiddenRow"):
    print(" ".join([td.text for td in row.find_all("td")]))
How can I achieve this, please?
Thanks, Riaz
Check each row for both elements; if both exist, extract the text:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html holds the report source
for row in soup.select("#result_table tr"):
    div, a = row.select_one("div.testcase"), row.select_one("a.popup_link")
    if div and a:  # skips the header and summary rows
        print(div.text.strip(), a.text.strip())
which gives you (under Python 2, where print(a, b) prints a tuple):
(u'test_000001_login_valid_user', u'pass')
(u'test_000002_select_a_project', u'pass')
(u'test_000003_verify_Lademo_CRM_DataPreview_is_present', u'pass')
Of course, if they always go together, we can simplify to:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for div in soup.select("#result_table tr div.testcase"):
    # the status is the next a.popup_link after each test case div
    print(div.text.strip(), div.find_next("a", class_="popup_link").text.strip())
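If BeautifulSoup is not available, the same pairing can be done with the standard library's html.parser. This is a minimal sketch, not the answer's method: it remembers the text of each div.testcase and pairs it with the text of the next a.popup_link, like the find_next version above. ReportPairParser and extract_pairs are hypothetical names:

```python
from html.parser import HTMLParser


class ReportPairParser(HTMLParser):
    """Pair the text of each div.testcase with the text of the next a.popup_link."""

    def __init__(self):
        super().__init__()
        self.capture = None  # "name" or "status" while inside a target element
        self.buffer = []     # text collected for the element being captured
        self.name = None     # last test case name still waiting for a status
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "testcase" in classes:
            self.capture = "name"
        elif tag == "a" and "popup_link" in classes and self.name is not None:
            self.capture = "status"

    def handle_data(self, data):
        if self.capture:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if (self.capture, tag) not in (("name", "div"), ("status", "a")):
            return
        text = "".join(self.buffer).strip()  # drops the newline before "pass"
        self.buffer = []
        if self.capture == "name":
            self.name = text
        else:
            self.pairs.append((self.name, text))
            self.name = None
        self.capture = None


def extract_pairs(report_html):
    parser = ReportPairParser()
    parser.feed(report_html)
    parser.close()
    return parser.pairs
```

In Python 3, for name, status in extract_pairs(report_html): print(name, status) prints the name and the status on one line, without the tuple syntax.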

Nokogiri parsing with xpath returns empty string

I have the following HTML:
<div>
<table>
<tr>
<td>
<div class="w135">
<div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
<a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
<img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
</a>
</div>
</div>
</td>
</tr>
</table>
</div>
When I run the rake task, I get the error:
NoMethodError: undefined method `at_css' for ["id","ctl00_cphBody_ctl01_DataList1_ctl00_Thumbnail1_Layout17"]:Array
This is the code:
@request = HTTParty.get(url)
@html = Nokogiri::HTML(@request.body)
@html.css(".w135")[0].map do |item|
  url = item.at_css("div.playerDiv a")
  puts url.inspect
end
I'm really not sure what the issue is and have been trying to fix this for a while. The error occurs on this line: url = item.at_css("div.playerDiv a")
Any suggestions are appreciated!
Thanks
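The problem is the [0]: it returns a single Nokogiri::XML::Node, and calling map on a node iterates over its attributes as [name, value] pairs, which is why item is the Array shown in the error. Search the document for the links directly instead. I'd do it using something like: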
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div>
<table>
<tr>
<td>
<div class="w135">
<div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
<a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
<img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
</a>
</div>
</div>
</td>
</tr>
</table>
</div>
EOT
puts doc.search('.w135 div.playerDiv a').map(&:inspect)
Which outputs:
# >> #<Nokogiri::XML::Element:0x3ff0918b132c name="a" attributes=[#<Nokogiri::XML::Attr:0x3ff0918b1250 name="href" value="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html">, #<Nokogiri::XML::Attr:0x3ff0918b123c name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10">, #<Nokogiri::XML::Attr:0x3ff0918b1228 name="target" value="_parent">] children=[#<Nokogiri::XML::Text:0x3ff0918a5b6c "\n ">, #<Nokogiri::XML::Element:0x3ff0918a5360 name="img" attributes=[#<Nokogiri::XML::Attr:0x3ff0918a4d20 name="src" value="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg">, #<Nokogiri::XML::Attr:0x3ff0918a4cbc name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10">, #<Nokogiri::XML::Attr:0x3ff0918a4b90 name="border" value="0">, #<Nokogiri::XML::Attr:0x3ff0918a4a28 name="class" value="imageThumbnail">]>, #<Nokogiri::XML::Text:0x3ff091871920 "\n ">]>
If you're trying to access the "href" attribute, instead of using inspect, use:
puts doc.search('.w135 div.playerDiv a').map{ |n| n['href'] }
# >> /sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html

XSL If Statement used for when Link is clicked

I have an RSS feed being pulled through XSLT in an .xsl file. I have a "Show" link that, when clicked, displays a hidden DIV with an IFRAME whose source is the unique RSS item's full page.
The issue is that, even though the DIV is hidden, the browser still loads every IFRAME's source page when the page is first viewed, which bogs down the loading time considerably.
What I want is for the IFRAME to load its source only after the "Show" link is clicked. How can I do this with an XSLT if statement?
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:user="urn:my-extension-lib:date">
<xsl:template match="/">
<xsl:for-each select="rss/channel/item">
<div style="margin-bottom: 30px;">
<div style="margin: 5px;">
<div style="font-weight: bold;">
<a href="{link}" target="_blank" style="font-size: 10pt;">
<xsl:value-of select="title" />
</a>
</div>
<div>
<xsl:value-of select="user:GetFormattedDate(pubDate,'MMM d, yyyy hh:mm tt')" />
</div>
</div>
<div style="padding-left:30px">
Show
</div>
<div style="margin: 20px 20px 20px 40px;display:none" id="{guid}">
<iframe width="685" height="400" scrolling="yes" frameborder="yes" src="{link}"></iframe>
</div>
</div>
</xsl:for-each>
</xsl:template>
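An XSLT xsl:if cannot do this: the transformation has already finished by the time the page is shown, so it cannot react to a click, which has to be handled in JavaScript instead. What I think you need to do is initially load a 'blank' page for each IFRAME, for example an empty page called blank.htm. You may also want to give each IFRAME an id attribute, so you can easily access it from JavaScript to change its source: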
<iframe id="iframe{guid}" width="685" height="400" scrolling="yes" frameborder="yes" src="blank.htm"/>
Then you can write your JavaScript like this, to show the DIV and change the source of the IFRAME to the correct page:
function test(id, link)
{
document.getElementById(id).style.display = 'block';
document.getElementById("iframe" + id).src = link;
}
Here's an example of the whole stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>Test</title>
<script>
function test(id, link)
{
document.getElementById(id).style.display = 'block';
document.getElementById("iframe" + id).src = link;
}
</script>
</head>
<body>
<xsl:for-each select="rss/channel/item">
<div style="margin-bottom: 30px;">
<div style="margin: 5px;">
<div style="font-weight: bold;">
<a href="{link}" target="_blank" style="font-size: 10pt;">
<xsl:value-of select="title"/>
</a>
</div>
<div>
<xsl:value-of select="pubDate"/>
</div>
</div>
<div style="padding-left:30px">
Show
</div>
<div style="margin: 20px 20px 20px 40px;display:none" id="{guid}">
<iframe id="iframe{guid}" width="685" height="400" scrolling="yes" frameborder="yes" src="blank.htm"/>
</div>
</div>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>