Extract Value from HTML - html-entities

I have another question, based on a different HTML format. The problem is that my smart meter is extremely slow at serving these HTML pages (5 seconds per file). I have now found another way, using a different HTML file, which loads very quickly.
Could you please help me extract the two values for positive electric power (Pplus) and negative electric power (Pminus) from this HTML file?
The HTML is quite large, but it has this specific code around the two values, 21.0000 for Pplus and 5.0000 for Pminus.
...
...
</TD></TR></TABLE>
</DIV>
<!-- -->
<!-- ************************** -->
<!-- *** 2. row *************** -->
<!-- -->
<DIV ID="idButtonDiv" STYLE="top:143px; left:0px;" ALIGN="CENTER">
<TABLE CELLSPACING="0" CELLPADDING="0" BORDER="0"><TR><TD ID="idButtonTd">
21.000
</TD></TR></TABLE>
</DIV>
<!-- -->
....
....
<!-- -->
<DIV ID="idButtonDiv" style="top:208px; left:732px;" ALIGN="CENTER" >
<TABLE CELLSPACING="0" CELLPADDING="0" BORDER="0"><TR><TD ID="idButtonTd">
</TD></TR></TABLE>
</DIV>
<!-- -->
<!-- ************************** -->
<!-- *** 4. row *************** -->
<!-- -->
<DIV ID="idButtonDiv" STYLE="top:273px; left:0px;" ALIGN="CENTER">
<TABLE CELLSPACING="0" CELLPADDING="0" BORDER="0"><TR><TD ID="idButtonTd">
5.00000
</TD></TR></TABLE>
</DIV>
<!-- -->
Thanks a lot.
Norbert
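One possible approach, as a minimal sketch in Python using only the standard library: look for the "2. row" and "4. row" marker comments and take the first numeric idButtonTd cell that follows each one. The names extract_row_value and SAMPLE are hypothetical, and the sketch assumes that Pplus always sits after the "2. row" comment and Pminus after the "4. row" comment, as in the snippet above.

```python
import re

# Hypothetical sample that mirrors the structure shown in the question.
SAMPLE = """
<!-- ************************** -->
<!-- *** 2. row *************** -->
<!-- -->
<DIV ID="idButtonDiv" STYLE="top:143px; left:0px;" ALIGN="CENTER">
<TABLE CELLSPACING="0" CELLPADDING="0" BORDER="0"><TR><TD ID="idButtonTd">
21.000
</TD></TR></TABLE>
</DIV>
<!-- -->
<DIV ID="idButtonDiv" style="top:208px; left:732px;" ALIGN="CENTER" >
<TABLE CELLSPACING="0" CELLPADDING="0" BORDER="0"><TR><TD ID="idButtonTd">
</TD></TR></TABLE>
</DIV>
<!-- ************************** -->
<!-- *** 4. row *************** -->
<!-- -->
<DIV ID="idButtonDiv" STYLE="top:273px; left:0px;" ALIGN="CENTER">
<TABLE CELLSPACING="0" CELLPADDING="0" BORDER="0"><TR><TD ID="idButtonTd">
5.00000
</TD></TR></TABLE>
</DIV>
"""


def extract_row_value(html, row_number):
    """Return the first numeric idButtonTd cell after the '<n>. row' comment, or None."""
    pattern = re.compile(
        r"<!--\s*\*+\s*" + re.escape(str(row_number)) + r"\.\s*row\s*\*+\s*-->"  # row marker
        r'.*?<TD\s+ID="idButtonTd">\s*'   # skip ahead to a cell ...
        r"([-+]?\d+(?:\.\d+)?)",          # ... that starts with a number
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(html)
    return float(match.group(1)) if match else None


pplus = extract_row_value(SAMPLE, 2)
pminus = extract_row_value(SAMPLE, 4)
print(pplus, pminus)  # 21.0 5.0
```

If the layout of the page differs from the snippet (for example, other row numbers or several numeric cells per row), the pattern would need to be adapted.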

Related

XSLT: How to filter content from complex generated HTML pages?

There are some very good examples of how to use XSLT to filter and merge simple HTML pages.
I have a large number of individually saved HTML pages (generated with ASP), like the following example, that should be filtered and merged into one HTML file to generate a book from it.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="../../../../external.html?link=http://www.w3.org/1999/xhtml" >
<head id="Head1"><title>
2021_0623.aspx
</title>
</style></head>
<body>
<div align="center">
<div class="aspNetHidden">
</div>
<table width="95%" id="table1" cellspacing="0" cellpadding="0" border="0" >
<tr>
<td>
</td>
<td width="100%" bgcolor="black" style="padding: 10px;">
<div align="center">
</div>
</td>
</tr>
<tr>
<td>
</td>
<td bgcolor="black" width="100%" height="20px" style="padding-left: 20px; padding-right: 20px; padding-bottom: 10px;">
<div class="align-left">
</div>
</td>
</tr>
<tr>
<td align="right" valign="top" style="padding-right: 10px">
<a href="" /></a><div id="Menu1">
<ul class="level1">
<li>Recent Updates</li>
</ul>
</div><a id="Menu1_SkipLink"></a>
</td>
<td width="100%" valign="top" bgcolor="white" style="padding: 20px;">
<p class="page-title">Library</p>
<p class="page-title-2">Library Text</p>
<div class="nav">
<table class="nav">
<tr class="nav">
<td class="nav-title">Some unneeded navigation</td>
<td class="nav">
</td>
</tr>
</table>
</div>
<p class="copyright">Copyright © 2021</p>
<p class="about"><strong>ABOUT THE CONTENTS.</strong></p>
<p class="text-title">Title of text</p>
<p class="text-date">August 22, 2021</p>
<p>text of interest.</p>
<p>more text of interest.</p>
<p class="separator-left-33"> </p>
<p class="footnote"><a id="_ftn1" href="#_ftnref1" name="_ftn1">[1]</a> a footnote of interest</p>
<p class="footnote"><a id="_ftn2" href="#_ftnref2" name="_ftn1">[2]</a> one more footnote of interest</p>
<div class="nav">
<table class="nav">
</table>
</div>
</td>
</tr>
<tr>
<td>
</td>
<td width="100%" height="45" align="left" valign="top" style="padding-left: 20px; padding-top: 5px;" bgcolor="black">
</td>
</tr>
</table>
</form>
</div>
</body>
</html>
The result should contain all content beginning with the title
<p class="page-title">Library</p>
up to and including the footnotes.
Is this possible with XSLT, and if so, could someone show how to do it?
It would be nice to also filter out the unneeded navigation and maybe class="about", which is always the same.
But this can be done in several steps afterwards.
The expected output should look like this, or it can be a well-formed HTML page:
<p class="page-title">Library</p>
<p class="page-title-2">Library Text</p>
<p class="copyright">Copyright © 2021</p>
<p class="text-title">Title of text</p>
<p class="text-date">August 22, 2021</p>
<p>text of interest.</p>
<p>more text of interest.</p>
<p class="separator-left-33"> </p>
<p class="footnote"><a id="_ftn1" href="#_ftnref1" name="_ftn1">[1]</a> a footnote of interest</p>
<p class="footnote"><a id="_ftn2" href="#_ftnref2" name="_ftn1">[2]</a> one more footnote of interest</p>
xsltproc seems to have an --html option for processing HTML documents instead of XML ones. Assuming that option lets you parse your inputs as HTML without a namespace, the following XSLT 1 code is one way to do it:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:output method="html" indent="yes" version="5" doctype-system="about:legacy-doctype"/>
  <!-- identity transformation: copy everything by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- inside body, keep only the page title and the p elements that follow it -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:variable name="start-element" select="//p[@class = 'page-title']"/>
      <xsl:apply-templates select="$start-element | $start-element/following-sibling::p"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
If the HTML documents end up in that odd namespace your input carries, you would have to bind a prefix to that namespace in XSLT 1 and select and match element nodes with qualified names of the form prefix:local-name, e.g. xhtml:body or xhtml:p, where the namespace declaration would be xmlns:xhtml="../../../../external.html?link=http://www.w3.org/1999/xhtml".
So here is a basic solution as a Perl script that does the needed extraction:
#!/usr/bin/perl
use strict;
use warnings;

my $LCount = 0; # Line count
my $ICount = 0; # Line ignore count
my $DCount = 0; # Line done count
my $Line;       # current line
if (@ARGV == 0) { # No parameters -> print usage
    print "\n";
    print "extract.pl [input-file] [output-file]\n";
    print "\n";
    exit;
}
if (@ARGV < 2) { die "Too few parameters!\n"; }
if (@ARGV > 2) { die "Too many parameters!\n"; }
my $InputFile = $ARGV[0];
my $OutputFile = $ARGV[1];
###############################################################################
# Main program
###############################################################################
open(my $InFile, '<', $InputFile) or die "Error opening '$InputFile': $!\n";
open(my $OutFile, '>', $OutputFile) or die "Error opening '$OutputFile': $!\n";
while (defined($Line = <$InFile>)) {
    $LCount++;
    if ($Line =~ /^<p/) {
        if ($Line =~ /class="about"/) {
            # the "about" paragraph is always the same, so skip it
            $ICount++;
        } else {
            $DCount++;
            print $OutFile $Line;
        }
    } else {
        $ICount++;
    }
}
close($InFile) or die "Error closing '$InputFile': $!\n";
close($OutFile) or die "Error closing '$OutputFile': $!\n";
print "\n$LCount lines from $InputFile processed.\n";
print "$DCount lines extracted.\n";
print "$ICount lines ignored.\n\n";
With a few more lines, much more can be filtered out, and the HTML framework can optionally be added.
But it would still be interesting to know whether this can be done as simply with XSLT ...
In this special case, the basic filtering could be done in a shell with a simple grep:
grep "<p" 1.html > out.html
The Perl solution is preferred, because more options for behaviour and filtering can be implemented.
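For comparison, the same line-based filter can be sketched in Python. This is only an illustration of the idea behind the Perl script, not a replacement for it; the function name filter_paragraph_lines and the sample lines are hypothetical. Like the Perl version, it assumes that every paragraph of interest starts at the beginning of a line.

```python
def filter_paragraph_lines(lines):
    """Keep lines that start with '<p', except the class="about" paragraph.

    Returns the kept lines and the number of ignored lines.
    """
    kept = []
    ignored = 0
    for line in lines:
        if line.startswith("<p") and 'class="about"' not in line:
            kept.append(line)
        else:
            ignored += 1
    return kept, ignored


# Hypothetical sample taken from the page shown in the question.
sample_lines = [
    '<td class="nav-title">Some unneeded navigation</td>',
    '<p class="page-title">Library</p>',
    '<p class="about"><strong>ABOUT THE CONTENTS.</strong></p>',
    '<p>text of interest.</p>',
]
kept, ignored = filter_paragraph_lines(sample_lines)
print(len(kept), ignored)  # 2 2
```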

How do I write an XSLT script to import data from a spreadsheet into an HTML file?

I need to create an XSLT script to import content from a spreadsheet into an HTML file.
My spreadsheet has 3 columns I need to import.
I'll have the spreadsheet data exported to HTML in this format:
<table class="tableizer-table">
<thead>
<tr class="tableizer-firstrow">
<th>CARD ID</th>
<th>PROMPT</th>
<th>RESPONSE</th>
</tr>
</thead>
<tbody>
<tr>
<td>3</td>
<td>Acquisition</td>
<td>In classical conditioning, the process of taking advantage of reflexive
responses to turn a neutral stimulus into a conditioned stimulus.</td>
</tr>
</tbody>
</table>
That data needs to be added to this HTML page:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" designation="" enumeration="" data-uuid="61dac34c7de54289a58698d5bfc8e776">
<head>
<meta charset="utf-8"/>
<link type="text/css" rel="stylesheet" title="default" href="../../assets/css/main.css"/>
<title>usurpmcatfc0001</title>
</head>
<body>
<section property="ktp:document" typeof="ktp:Document" class="ktp-document">
<section class="ktp-document-meta">
<section property="ktp:metadata" class="ktp-meta"><span property="atom:content-item-name" class="ktp-meta" data-value="usurpmcatfc0001"></span></section>
<section property="ktp:tags" class="ktp-meta"><span property="ktp:topic" class="ktp-meta">MCAT</span><span property="ktp:topic" class="ktp-meta">Behavioral Sciences</span><span property="ktp:subsubtopic" class="ktp-meta">Sensation and Perception</span></section>
</section>
<section property="ktp:document-section" typeof="ktp:flashcards" class="ktp-document-section" data-title="Absolute Threshold">
<p>The minimum of stimulus energy needed to activate a sensory system.</p>
</section>
</section>
</body>
</html>
The data from column 1 needs to feed into the data-value of this section:
<section property="ktp:metadata" class="ktp-meta"><span property="atom:content-item-name" class="ktp-meta" data-value="[COLUMN 1 DATA]"></span></section>
The data from column 2 needs to feed into the data-title and the data from column 3 needs to feed in between the <p></p> tags of this section:
<section property="ktp:document-section" typeof="ktp:flashcards" class="ktp-document-section" data-title="Absolute Threshold">
<p>The minimum of stimulus energy needed to activate a sensory system.</p>
</section>
Any help generating this script would be most appreciated.
Thanks!
If the exported HTML is parseable as XML, you can read it simply with the doc function, e.g. <xsl:variable name="excel-table" select="doc('exported-table.xml')//table[@class = 'tableizer-table']"/>, and reading in a column is then as easy as e.g. $excel-table/tbody/tr/td[1].
So set up templates for the nodes you want to manipulate, e.g.
<xsl:template match="section[@property = 'ktp:metadata']/span/@data-value">
  <xsl:attribute name="{name()}" select="$excel-table/tbody/tr/td[1]"/>
</xsl:template>
The base processing will be done by the identity transformation, e.g. <xsl:mode on-no-match="shallow-copy"/> in XSLT 3.
The only complication seems to be that the main document is XHTML in the namespace xmlns="http://www.w3.org/1999/xhtml", so you need to set up xpath-default-namespace="http://www.w3.org/1999/xhtml". If the table export is in no namespace, however, the variable needs to use <xsl:variable name="excel-table" xpath-default-namespace="" select="doc('exported-table.xml')//table[@class = 'tableizer-table']"/>, and any selection inside that document needs to do the same, e.g. <xsl:attribute name="{name()}" xpath-default-namespace="" select="$excel-table/tbody/tr/td[1]"/>.
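To illustrate the same mapping outside XSLT, here is a minimal sketch in Python with the standard library's xml.etree.ElementTree. It is a swapped-in technique, not the XSLT solution itself. The names fill_card, TABLE and PAGE are hypothetical, the page is a shortened version of the one in the question, and the sketch assumes that both inputs are well-formed XML and that only the first table row is used.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, shortened inputs based on the question.
TABLE = """<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>CARD ID</th><th>PROMPT</th><th>RESPONSE</th></tr></thead>
<tbody><tr><td>3</td><td>Acquisition</td><td>In classical conditioning, the process of taking advantage of reflexive responses to turn a neutral stimulus into a conditioned stimulus.</td></tr></tbody>
</table>"""

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<section property="ktp:document" class="ktp-document">
<section property="ktp:metadata" class="ktp-meta"><span property="atom:content-item-name" class="ktp-meta" data-value="usurpmcatfc0001"></span></section>
<section property="ktp:document-section" class="ktp-document-section" data-title="Absolute Threshold">
<p>The minimum of stimulus energy needed to activate a sensory system.</p>
</section>
</section>
</body></html>"""


def fill_card(page_xml, table_xml):
    """Copy the three columns of the first table row into the XHTML page."""
    # The table export is in no namespace, so plain element names work here.
    first_row = ET.fromstring(table_xml).find("tbody/tr")
    card_id, prompt, response = [td.text for td in first_row]

    # The page is in the XHTML namespace, so element names need a prefix.
    ET.register_namespace("", XHTML_NS)  # serialize without an ns0: prefix
    ns = {"h": XHTML_NS}
    root = ET.fromstring(page_xml)

    span = root.find(".//h:section[@property='ktp:metadata']/h:span", ns)
    span.set("data-value", card_id)        # column 1

    section = root.find(".//h:section[@property='ktp:document-section']", ns)
    section.set("data-title", prompt)      # column 2
    section.find("h:p", ns).text = response  # column 3
    return ET.tostring(root, encoding="unicode")


print(fill_card(PAGE, TABLE))
```

The namespace handling mirrors the point made above: the table is read without a namespace, while every lookup in the page has to use the XHTML namespace.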

Insert grid template row (within grid) under condition

In a Facebook-esque fashion, I'm working on a post with comments. Each comment has an int that indicates the id of its parent post, so Comment 1 and Comment 2 both have Post 1 assigned as their parent.
What I'm working on is displaying them as a grid within a grid. Here is that part of the .zul file:
<grid id="postGrid" height="550px" model="@load(vm.pcdata.posts)" emptyMessage="No Posts.">
  <template name="model">
    <row>
      <window border="normal">
        <!-- .................. -->
        <!-- PARENT POST -->
        <!-- .................. -->
        <caption id="userPost" label="@load(each.user)"/>
        <textbox id="infoPost" readonly="true" value="@load(each.info)" multiline="true" rows="4" width="100%" mold="rounded"/>
        <separator bar="true"/>
        <hlayout>
          <div>
            <button label="Like" onClick="@command('addPLike', postid=each.postid)"/>
          </div>
          <div hflex="true">
            <textbox id="likeTB" disabled="true" width="40px" style="text-align:center" value="@load(each.plikes)"/>
          </div>
        </hlayout>
        <separator bar="false"/>
        <window border="normal">
          <!-- .................. -->
          <!-- THE SECOND GRID-->
          <!-- .................. -->
          <grid id="commentGrid" height="150px" model="@load(vm.pcdata.comments)" emptyMessage="No Comments.">
            <template name="model">
              <row>
                <window border="normal">
                  <caption id="userComment" label="@load(each.user)"/>
                  <textbox id="infoComment" readonly="true" value="@load(each.info)" multiline="true" rows="4" width="100%" mold="rounded"/>
                  <separator bar="true"/>
                  <hlayout>
                    <div>
                      <button label="Like" onClick="@command('addCLike', commentid=each.commentid)"/>
                    </div>
                    <div hflex="true">
                      <textbox id="likeTB" disabled="true" width="40px" style="text-align:center" value="@load(each.clikes)"/>
                    </div>
                  </hlayout>
                </window>
              </row>
            </template>
          </grid>
        </window>
      </window>
    </row>
  </template>
</grid>
In the second grid, I imagine there could be some sort of if condition, so that a comment is displayed only if the postid of the parent Post and the postsrc of the child Comment are the same. Is there any way to make this work?
You can use the shadow element <if>, e.g.
<if test="@load(vm.yourFlag)">
<grid id="commentGrid">
....
</if>
Please see http://books.zkoss.org/zk-mvvm-book/8.0/shadow_elements/flow_control.html
Do you mean that commentGrid is created but the inner window is hidden, so there is empty space inside commentGrid?
Since you specify emptyMessage on commentGrid, it should show "No Comments." when there are none. Or are there still comments, but all of them hidden? If so, you could consider hiding commentGrid together with the inner window.

How do I get the html that I should send in emails?

I have just started using Zurb Foundation (SASS) to create responsive emails. I followed this tutorial to create a test email. As seen in the tutorial, when viewing the test email in the browser it is responsive and looks beautiful.
The standard boiler plate I use for the test email:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<link rel="stylesheet" type="text/css" href="{{root}}css/app.css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width">
<title>{{subject}}</title>
<!-- <style> -->
</head>
<body>
<span class="preheader">{{description}}</span>
<table class="body">
<tr>
<td class="center" align="center" valign="top">
<center>
{{> body}}
</center>
</td>
</tr>
</table>
<!-- prevent Gmail on iOS font size manipulation -->
<div style="display:none; white-space:nowrap; font:15px courier; line-height:0;"> </div>
</body>
</html>
The body of the test email:
---
layout: index-layout
subject: My Email Templates
---
<container>
<row class="gray collapse">
<columns>
<center><img src="http://unsplash.it/800/200"></center>
</columns>
</row>
<row class="gray">
<columns>
<h2 class="text-center">Responsive columns below</h2>
</columns>
</row>
<row class="gray">
<columns small="12" large="4">
<p>Column 1</p>
</columns>
<columns small="12" large="4">
<p>Column 2</p>
</columns>
<columns small="12" large="4">
<p>Column 3</p>
</columns>
</row>
</container>
This is what the resulting source looks like taken from "view source" in Chrome:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<link rel="stylesheet" type="text/css" href="css/app.css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width">
<title>My Email Templates</title>
<!-- <style> -->
</head>
<body><script id="__bs_script__">//<![CDATA[
document.write("<script async src='/browser-sync/browser-sync-client.js?v=2.23.6'><\/script>".replace("HOST", location.hostname));
//]]></script>
<span class="preheader"></span>
<table class="body">
<tr>
<td class="center" align="center" valign="top">
<center data-parsed="">
<table align="center" class="container float-center"><tbody><tr><td>
<table class="row gray collapse"><tbody><tr>
<th class="small-12 large-12 columns first last"><table><tr><th>
<center data-parsed=""><img src="http://unsplash.it/800/200" align="center" class="float-center"></center>
</th>
<th class="expander"></th></tr></table></th>
</tr></tbody></table>
<table class="row gray"><tbody><tr>
<th class="small-12 large-12 columns first last"><table><tr><th>
<h2 class="text-center">Responsive columns below</h2>
</th>
<th class="expander"></th></tr></table></th>
</tr></tbody></table>
<table class="row gray"><tbody><tr>
<th class="small-12 large-4 columns first"><table><tr><th>
<p>Column 1</p>
</th></tr></table></th>
<th class="small-12 large-4 columns"><table><tr><th>
<p>Column 2</p>
</th></tr></table></th>
<th class="small-12 large-4 columns last"><table><tr><th>
<p>Column 3</p>
</th></tr></table></th>
</tr></tbody></table>
</td></tr></tbody></table>
</center>
</td>
</tr>
</table>
<!-- prevent Gmail on iOS font size manipulation -->
<div style="display:none; white-space:nowrap; font:15px courier; line-height:0;"> </div>
</body>
</html>
I sent the test email to myself using Gmail to check it. I did this by copying the source shown above and pasting it via "Insert HTML", using the HTML extension for Gmail in Chrome. The result was horrible: no responsiveness, and it looked ugly.
How am I supposed to make use of the test email that I have created? Is the source from "view source" in Chrome even what I should send in an email? Is it even possible to send the test email over Gmail, or do I have to use e.g. Mailchimp or SendGrid?
Putsmail will be your best bet for quickly testing a single template like this. You can just paste your compiled HTML in and add your Gmail address (or any others).
The SASS version of Zurb Foundation for Emails 2.0 is controlled from a terminal emulator. I am assuming you have already installed Zurb using one. What you are missing is a few commands.
To start, use the terminal to navigate to the directory where you installed Zurb and run npm start or foundation watch. This starts Zurb in a mode where you can see your edits before you run a command to inline the code.
To inline the code, type foundation build or npm run build. This gives you an email with the necessary CSS inlined where needed to produce the final email.
Zurb Commands
A few other Zurb commands you might find useful are:
npm install --global foundation-cli - (install the Foundation CLI)
foundation new --framework emails - (a new installation of zurb)
npm start - (start the application)
foundation watch - (start the application)
foundation build - (run an inliner for the appropriate css)
npm run build - (run an inliner for the appropriate css)
npm cache clean - (clear the cache)
npm update - (install latest updates)
If none of this makes sense, please visit one of the following for better tutorials to hopefully get you up to speed on how Zurb works. It's complicated, but it makes great emails.
More Information
https://foundation.zurb.com/emails/docs/sass-guide.html
https://foundation.zurb.com/emails/docs/css-guide.html
Good luck.

BeautifulSoup extract data from certain columns from a table I am getting too much data out

I am trying to extract some data out of my Selenium Test Report html file.
I am getting too much data out of the table of rows and columns.
The data I would like to extract comes from every cell that contains an element with the class value "testcase", together with the link with the class value "popup_link" in the cell next to it, whose text will say pass or fail.
E.g.
<td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
pass</a>
I would like the text "test_000001_login_valid_user" and the text "pass"
There are lots of test cases in my report so I would like to iterate over the rows and get the test case name out and the pass or fail text.
My HTML snippet is:
<table id='result_table'>
<colgroup>
<col align='left' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
</colgroup>
<tr id='header_row'>
<td>Test Group/Test case</td>
<td>Count</td>
<td>Pass</td>
<td>Fail</td>
<td>Error</td>
<td>View</td>
</tr>
<tr class='passClass'>
<td>Regression_TestCase.RegressionProjectEdit_TestCase.RegressionProject_TestCase_Project_Edit</td>
<td>75</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>Detail</td>
</tr>
<tr id='pt1.1' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
pass</a>
<div id='div_pt1.1' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.1').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.1: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.2' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000002_select_a_project</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.2')" >
pass</a>
<div id='div_pt1.2' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.2').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.2: *** test_login_valid_user ***
test login with a valid user - Passed
*** test_select_a_project ***
08_12_1612_08_03
Selenium_Regression_Edit_Project_Test
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.3' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000003_verify_Lademo_CRM_DataPreview_is_present</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.3')" >
pass</a>
<div id='div_pt1.3' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.3').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.3: *** test_login_valid_user ***
test login with a valid user - Passed
*** test_select_a_project ***
08_12_1612_08_03
Selenium_Regression_Edit_Project_Test
*** Test verify_Lademo_CRM_DataPreview_is_present ***
aSelenium_LADEMO_CRM_DONOTCHANGE
File
498
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
My code is:
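from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html holds the report source
table = soup.select_one("#result_table")
for row in table.select("tr.hiddenRow"):
    # joins the text of every cell, including the popup details
    print(" ".join([td.text for td in row.find_all("td")]))
How can I achieve this, please?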
Thanks, Riaz
Check each row for both elements; if both exist, extract the text:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for row in soup.select("#result_table tr"):
    # only rows that hold a test case name and a result link are of interest
    div, a = row.select_one("div.testcase"), row.select_one("a.popup_link")
    if div and a:
        print(div.text.strip(), a.text.strip())
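which gives you the following under Python 2 (Python 3 prints the two values separated by a space, without the tuple notation):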
(u'test_000001_login_valid_user', u'pass')
(u'test_000002_select_a_project', u'pass')
(u'test_000003_verify_Lademo_CRM_DataPreview_is_present', u'pass')
Of course, if they always go together, we can simplify to:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for div in soup.select("#result_table tr div.testcase"):
    # the result link is the next a.popup_link after the test case name
    print(div.text.strip(), div.find_next("a", class_="popup_link").text.strip())