screen scraping using coldfusion - coldfusion

I am trying to screen scrape another application using the below code in Coldfusion.
<cfhttp url="https://intra.att.com/itscmetrics/EM2/LTMR.cfm" method="get" username="uvwxyz" password="abcdef">
<cfhttpparam type="url" name="LTMX" value="Andre Fuetsch / Shelly K Lazzaro">
</cfhttp>
<cfset myDocument = cfhttp.fileContent>
<cfoutput>
#myDocument#
</cfoutput>
Now when I run my cfm page, iam able to access the desitination page, with the above code.
The destination page looks like below.
A part of the source code of this is as below.
<table border="1" width=99% style="border-collapse:collapse;">
<thead>
<td colspan="12" class="drpmainheader1_2">LTM Detail Report for Andre Fuetsch / Shelly K Lazzaro</td>
<tr align="center">
<th class="ptitles">Liaison Name</th>
<th class="ptitles">Application Acronym</th>
<th class="ptitles">MOTS ID</th>
<th class="ptitles">Priority</th>
<th class="ptitles">MC</th>
<th class="ptitles">DR Exercise</th>
<th class="ptitles">ARM/SRM Maintenance</th>
<th class="ptitles">ARM/SRM Creation</th>
<th class="ptitles">Backup & Recovery Certification</th>
<th class="ptitles">Interface Certification</th>
<th class="ptitles">AIA Compliance</th>
</tr>
</thead>
<tbody>
<tr>
<td class="drpdetailtablerowdetailleft">Lynette M Acosta</td>
<td class="drpdetailtablerowdetailleft">AABA</td>
<td class="drpdetailtablerowdetail">9710</td>
<td class="drpdetailtablerowdetail">5</td>
<td class="drpdetailtablerowdetail">NMC</td>
<td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td>
</tr>
</tbody>
<tbody>
<tr>
<td class="drpdetailtablerowdetailleft">Lynette M Acosta</td>
<td class="drpdetailtablerowdetailleft">ABS RECON+</td>
<td class="drpdetailtablerowdetail">13999</td>
<td class="drpdetailtablerowdetail">3</td>
<td class="drpdetailtablerowdetail">NMC</td>
<td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td>
</tr>
</tbody>
I am not good with regex in coldfusion, Can anyone please guide me or give me any starting points as to how to extract the data from the html table using Coldfusion? I do not have access to the DB. Hope this is clear.

Parsing HTML using regex? You'll have more options if you use the jsoup HTML Parser w/ColdFusion. Jsoup uses jQuery-like DOM selectors and can quickly convert the HTML table data into arrays.
http://jsoup.org/
Here are some related articles & sample code:
http://www.raymondcamden.com/index.cfm/2012/4/6/jsoup-adds-jQuerylike-parsing-in-Java
http://www.bennadel.com/blog/2358-Parsing-Traversing-And-Mutating-HTML-With-ColdFusion-And-jSoup.htm
http://pastebin.com/U6A86mSi

Related

Graphviz/ Table/ How to merge cells

I would like to draw a graph like this -
I have Graphviz code like this -
digraph G {
"test" [
label = <<table border="0" cellspacing="0">
<tr>
<td port="f0" border="1" bgcolor="darkorange">TEST</td>
<td port="f1" border="1" bgcolor="darkorange"></td>
</tr>
<tr>
<td port="f2" border="1" bgcolor="cyan">A</td>
<td>
<table border="0" cellspacing="0">
<tr><td port="f3" border="1" bgcolor="azure">A1</td></tr>
<tr><td port="f4" border="1" bgcolor="azure">A2</td></tr>
<tr><td port="f5" border="1" bgcolor="azure">A3</td></tr>
</table>
</td>
</tr>
<tr>
<td port="f5" border="1" bgcolor="gray">Else</td>
<td port="f6" border="1" bgcolor="gray"></td>
</tr>
</table>>
shape = "none"
];
}
But it gives the graph like this
Would you please suggest how can we tweak the code to achieve the objective - merging f0, f1 on top and f5,f6 at bottom?
You can use HTML <td>s with colspan and rowspan attributes in GraphViz. These allow one cell to span multiple columns and/or rows inside a table.
This also simplifies your digraph, as only one table is needed.
digraph G {
"test" [
label = <<table border="0" cellspacing="0">
<tr>
<td colspan="2" port="f0" border="1" bgcolor="darkorange">TEST</td>
</tr>
<tr>
<td rowspan="3" port="f5" border="1" bgcolor="blue">A</td>
<td port="f6" border="1" bgcolor="white">A1</td>
</tr>
<tr>
<td port="f6" border="1" bgcolor="white">A2</td>
</tr>
<tr>
<td port="f6" border="1" bgcolor="white">A3</td>
</tr>
<tr>
<td colspan="2" port="f0" border="1" bgcolor="grey">Else</td>
</tr>
</table>>
shape = "none"
];
}
This gives you the following basic output, which you can then customize for spacing, line colors, etc:
This one also works. what's is the difference?
digraph G {
"test" [
label = <<table border="0" cellspacing="0">
<tr><td colspan="2" port="f0" border="1" bgcolor="darkorange">TEST</td> </tr>
<tr><td rowspan="4" port="f5" border="1" bgcolor="blue">A</td></tr>
<tr><td port="f6" border="1" bgcolor="white">A1</td></tr>
<tr><td port="f6" border="1" bgcolor="white">A2</td></tr>
<tr><td port="f6" border="1" bgcolor="white">A3</td></tr>
<tr><td colspan="2" port="f0" border="1" bgcolor="grey">Else</td></tr>
</table>>
shape = "none"
];
}

How to transpose into the "same" column?

I have a table like
<table class="tg">
<tr>
<th class="tg-0lax">date</th>
<th class="tg-0lax">organic</th>
<th class="tg-0lax">referrer</th>
<th class="tg-0lax">direct</th>
</tr>
<tr>
<td class="tg-0lax">01.01.2019</td>
<td class="tg-0lax">12345</td>
<td class="tg-0lax">123</td>
<td class="tg-0lax">23</td>
</tr>
<tr>
<td class="tg-0lax">25.01.2019</td>
<td class="tg-0lax">23456</td>
<td class="tg-0lax">234</td>
<td class="tg-0lax">34</td>
</tr>
<tr>
<td class="tg-0lax">03.03.2019</td>
<td class="tg-0lax">34567</td>
<td class="tg-0lax">345</td>
<td class="tg-0lax">56</td>
</tr>
<tr>
<td class="tg-0lax">15.04.2019</td>
<td class="tg-0lax">45678</td>
<td class="tg-0lax">456</td>
<td class="tg-0lax">78</td>
</tr>
</table>
I want to get the data into this view, where all data are placed into the same three columns, not dependently of whether points in the first column are repeated:
<table class="tg">
<tr>
<th class="tg-0lax">type</th>
<th class="tg-0lax">source</th>
<th class="tg-0lax">date</th>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">12345</td>
<td class="tg-0lax">01.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">123</td>
<td class="tg-0lax">01.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">23</td>
<td class="tg-0lax">01.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">23456</td>
<td class="tg-0lax">25.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">234</td>
<td class="tg-0lax">25.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">34</td>
<td class="tg-0lax">25.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">34567</td>
<td class="tg-0lax">03.03.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">345</td>
<td class="tg-0lax">03.03.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">56</td>
<td class="tg-0lax">03.03.2019</td>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">45678</td>
<td class="tg-0lax">15.04.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">456</td>
<td class="tg-0lax">15.04.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">78</td>
<td class="tg-0lax">15.04.2019</td>
</tr>
</table>
The "normal" transposing is pretty close to what I want, but even not exactly this, and I miss the point, how to pivot the data.
Another one example:
Got an error:
What I'm doing wrong? The formula is:
=ARRAYFORMULA({"type","source","date";SPLIT(TRANSPOSE(SPLIT(CONCATENATE(IF(B2:D<>"","♠"&B1:D1&"♦"&B2:D&"♦"&A2:A, )),"♠")),"♦")})
The line breaks are from the formula away - I've deleted them. Could the error cause be that my Google Spreadsheets used in Germany - formula language issue?
=ARRAYFORMULA({"type", "source", "date";
SPLIT(TRANSPOSE(SPLIT(CONCATENATE(IF(B2:D<>"",
"♠"&B1:D1&"♦"&B2:D&"♦"&A2:A, )), "♠")), "♦")})

DAX - filtering out whole row from data table that meets certain condition(s)

I have a table like following:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
</style>
<table class="tg">
<tr>
<th class="tg-baqh">Store Name</th>
<th class="tg-baqh">Week Number</th>
<th class="tg-baqh">Sales</th>
<th class="tg-baqh">Sales-LY</th>
</tr>
<tr>
<td class="tg-baqh">Store A</td>
<td class="tg-baqh">1</td>
<td class="tg-baqh">20</td>
<td class="tg-baqh">15</td>
</tr>
<tr>
<td class="tg-baqh">Store A</td>
<td class="tg-baqh">2</td>
<td class="tg-baqh">25</td>
<td class="tg-baqh">20</td>
</tr>
<tr>
<td class="tg-baqh">Store A</td>
<td class="tg-baqh">3</td>
<td class="tg-baqh">30</td>
<td class="tg-baqh">25</td>
</tr>
<tr>
<td class="tg-baqh">Store B</td>
<td class="tg-baqh">1</td>
<td class="tg-baqh">15</td>
<td class="tg-baqh">10</td>
</tr>
<tr>
<td class="tg-baqh">Store B</td>
<td class="tg-baqh">2</td>
<td class="tg-baqh">15</td>
<td class="tg-baqh">15</td>
</tr>
<tr>
<td class="tg-baqh">Store B</td>
<td class="tg-baqh">3</td>
<td class="tg-baqh">20</td>
<td class="tg-baqh">15</td>
</tr>
<tr>
<td class="tg-baqh">Store C</td>
<td class="tg-baqh">1</td>
<td class="tg-baqh">30</td>
<td class="tg-baqh">25</td>
</tr>
<tr>
<td class="tg-baqh">Store C</td>
<td class="tg-baqh">2</td>
<td class="tg-baqh">0</td>
<td class="tg-baqh">20</td>
</tr>
<tr>
<td class="tg-baqh">Store C</td>
<td class="tg-baqh">3</td>
<td class="tg-baqh">25</td>
<td class="tg-baqh">20</td>
</tr>
</table>
I would like to return (lets' say) a pivot table with
Salex IDX = SUM(Sales)/SUM(Sales-LY) as a measure, ignoring the data points for "Week 2 for Store C".
So it's not a filter on just Week Number or Store, but its a filter on specific row(s) identified by multiple parameters.
Essentially, i would like to get 'like for like' results, excluding any weeks where Sales or Sales-LY columns are zero(or null)
Any ideas?
I would add a calculated column to concatenate Store and Week, something like this:
=[Store Name]&", Week: "&[Week Number]
Then you can use a filter or slicer to exclude the Store & Week combination you want.

Code after CFInclude seems to disappear or is not rendered

Having some issues with a ColdFusion application here. I'm trying to add in a <cfinclude template="header.cfm"/> and it renders correctly however the rest of the cf code seems to disappear, not sure if its not being rendered or just not showing up because of the cfinclude statement running. This is for a page header I'm trying to insert.
Is there a way to insert the cfincludes and have it stop so the rest of the page can process? Does my question make sense?
<table width="600" border="0" align="center" cellpadding="0" cellspacing="0">
<!-- fwtable fwsrc="header.png" fwbase="default.gif" fwstyle="Dreamweaver" fwdocid = "742308039" fwnested="1" -->
<tr>
<td><img name="grantpro" src="images/grantpro.gif" width="411" height="80" border="0" alt=""></td>
<td><img name="gpimage" src="images/gpimage.jpg" width="189" height="80" border="0" alt=""></td>
</tr>
<tr>
<td colspan="2" align="center">
<table width="599px" border="0" align="center" cellpadding="1" cellspacing="1">
<tr>
<td colspan="4"><div align="center"><font size="5"><strong>FDC Menu</strong></font></div></td>
</tr>
<td colspan="3"><strong>FDC Pending Proposals:</strong></td>
</tr>
<tr>
<td> </td>
<td colspan="2">By Applicant Name</td>
</tr>
<tr>
<td> </td>
<td colspan="2">By Grant Type</td>
</tr>
<tr>
<td> </td>
<td colspan="2"> </td>
</tr>
<tr>
<td colspan="3"><strong>FDC Funded Proposals:</strong></td>
</tr>
<tr>
<td> </td>
<td colspan="2"><strong><em>Current Year</em></strong></td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Applicant Name</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Grant Type</td>
</tr>
<tr>
<td> </td>
<td colspan="2"><em><strong>Prior Years</strong></em></td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Applicant Name </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Grant Type</td>
<cfinclude template="cssmenu/header.cfm"/>
</table>
<p align="center"><strong>Logout</strong></p> </td>
The following code shows where the problem is
<tr>
<td> </td>
<td> </td>
<td>By Grant Type</td>
<cfinclude template="cssmenu/header.cfm"/>
</table>
Solution 1:
This is the recommended solution
The <cfinclude> probably should be moved outside of the </table>
Solution 2:
cssmenu/header.cfm would need to finish the current table row and start an new one. This is not recommended. It is not modular at all.
</tr>
<tr>
<td colspan="3">
... Content goes here ...
</td>
</tr>
You are missing a </tr> before the <cfinclude>. Also it seems like an odd place to include a header, rather add another table row and td and include the header inside of the <td> not in between the table code as this is causing it to break.

help with regular expression pattern to extract some text from html in C#

I have this html block:
<tr>
<th colspan="2" valign="middle">some text</th>
</tr>
<tr>
<td class="row1">lalala<span>dadada</span></td>
<td class="row2"><input name="unwantedinput"></td>
</tr>
<th colspan="2" valign="middle">some text</th>
</tr>
<tr>
<td class="row1">nanana<span>bababa</span></td>
<td class="row2"><input name="unwantedinput"></td>
</tr>
<tr>
<th colspan="2" valign="middle">Some other text</th>
</tr>
<tr>
<td class="row1">(this text needs to be extracted)</td>
<td class="row2"><input name="myUniqueInput"></td>
</tr>
<tr>
<th colspan="2" valign="middle">some text</th>
</tr>
<tr>
<td class="row1">lalala<span>dadada</span></td>
<td class="row2"><input name="unwantedinput"></td>
</tr>
what I need is to extract only the data between the "(this text needs to be extracted)".. here is what I've done so far:
<th[^>]*>(.*?)<input[^>]*name="myUniqueInput"[^>]*>
the problem with this pattern. its matching the whole text from the beginning till the "myUniqueInput"..
any idea how to fix this?
thanks in advance..
/<td[^>]*>([^<]*)<[^>]*>\s*<td[^>]*>\s*<input[^>]*name="myUniqueInput"/
You can always match more/less depending if you know how the html will look. The idea is to skip td* before the input name. Then get everything between the previous td /td.
It's generally accepted that regular expressions aren't expressive enough to parse HTML correctly. Have you considered using a library to parse the HTML for you, and then extracting the data from there?