phpQuery: How to parse the tr's class and style clause

I have a script that parses an HTML page. One section parses the various tr elements. Suppose we have a tr structure like this:
<tr class="a test" style="display: none;">
<td class="numeric"></td>
<td class="date">1/1/16</td>
<td class="numeric">$2.65</td>
.... etc....
</tr>
<tr class="another tr">
...... etc......
The current code which is working fine is something like this:
$the_row_I_want = $parseSectionofCodeImlookingat->find("tr:eq($rowIwant)");
$entire_tr_in_html = $the_row_I_want->__toString();
What this does is return the html for the entire tr, from which I can parse out the class name and style sections. However I have been trying to find out how to use phpQuery syntax to parse the class name and style section out for me. Is there a way to do this using phpQuery (and how)?

OK, figured it out:
$the_row_I_want = $parseSectionofCodeImlookingat->find("tr:eq($rowIwant)");
$entire_tr_in_html = $the_row_I_want->__toString();
$the_class_attribute = $the_row_I_want->attr('class');
$the_style_attribute = $the_row_I_want->attr('style');

Related

RegEx Parsing for HTML attributes - one specific string

With Delphi Rio, I am using an HTML/DOM parser. I am traversing the various nodes, and the parser is returning attributes/tags. Normally these are not a problem, but for some attributes/tags, the string returned includes multiple attributes. I need to parse this string into some type of container, such as a stringlist. The attribute string the parser returns already has the '<' and '>' removed.
Some examples of attribute strings are:
data-partnumber="BB3312" class=""
class="cb10"
account_number = "11432" model = "pay_plan"
My end result that I want is a StringList, with one or more name=value pairs.
I have not used RegEx to any real degree, but I think that I want to use RegEx. Would this be a valid approach? For a RegEx pattern, I think the pattern I want is
\w\s?=\s?\"[^"]+"
To identify multiple matches within a string, I would use TRegex.Matches. Am I overlooking something here that will cause me issues later on?
*** ADDITIONAL INFO ***
Several people have suggested to use a decent parser. I am currently using the openSource HTML/DOM parser found here: https://github.com/sandbil/HTML-Parser
In light of that, I am posting more info... here is an HTML Snippet I am parsing. Look at the line I have added *** at the end. My parser is returning this as
Node.AttributeText= 'data-partnumber="B92024" data-model="pay_as_you_go" class="" '
Would a different HTML DOM parser return this as 3 different elements/attributes? If so, can someone recommend a parser?
<section class="cc02 cc02v0" data-trackas="cc02" data-ocomid="cc02">
<div class="cc02w1">
<div class="otable otable-scrolling">
<div class="otable-w1">
<table class="otable-w2">
<thead>
<tr>
<th>Product</th>
<th>Unit Price</th>
<th>Metric</th>
</tr>
</thead>
<tbody>
<tr>
<td class="cb152title"><div>MySQL Database for HeatWave-Standard-E3</div></td>
<td><div data-partnumber="B92024" data-model="pay_as_you_go" class="">$0.3536<span></span></div></td> *****
<td><div>Node per hour</div></td>
</tr>
<tr data-partnumber="B92426">
<td class="cb152title">MySQL Database—Storage</td>
<td><span data-model="pay_as_you_go" class="">$0.04<span></span></span></td>
<td>Gigabyte storage capacity per month</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</section>
The documentation for the parser you are using says TDomTreeNode has an AttributesText property that is a "string with all attributes", which you have shown examples of. But it also has an Attributes property that is "parsed attributes" provided as a TDictionary<string, string>. Have you tried looking into the values of that property yet? You should not need to use a RegEx at all, just enumerate the entries of the TDictionary instead, eg:
var
Attr: TPair<string, string>;
for Attr in Node.Attributes do begin
// use Attr.Key and Attr.Value as needed...
end;
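To illustrate the question of whether a DOM parser hands back the three attributes separately, here is a minimal sketch using Python's standard-library html.parser rather than the Delphi parser discussed above. The class name AttrCollector is made up for this example; the input is the line marked ***** in the question.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (tag, attributes) for every start tag that carries attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives already split into (name, value) pairs
        if attrs:
            self.found.append((tag, dict(attrs)))


collector = AttrCollector()
collector.feed(
    '<td><div data-partnumber="B92024" data-model="pay_as_you_go" class="">'
    '$0.3536<span></span></div></td>'
)
for tag, attributes in collector.found:
    print(tag, attributes)
```

The parser reports the div's attributes as three separate name/value pairs, which is the same shape the Attributes dictionary gives you in the Delphi parser.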
(As the OP asked about using a RegEx to parse attribute=value pairs, this answers the question directly, which other users may be looking for in the future.)
RegEx based answer
A RegEx is a powerful tool for this. From the data you have provided, you can extract the attribute name and value pairs using:
(\S+)\s*=\s*(\"?)([^"]*)(\2|\s|$)
This uses grouping and can be explained as follows:
The first result group is the attribute name (it matches non-whitespace characters)
The second result group is an enclosing " if present, otherwise an empty string
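The third result group is the value of the attribute
The fourth result group is whatever ends the value: the closing " (via the backreference \2), a whitespace character, or the end of the string
Because a RegEx can be applied repeatedly to the same subject, you can call MatchAgain to check whether there is another match and so read all of the attributes in a loop.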
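// Requires System.RegularExpressionsCore (TPerlRegEx) and System.Classes in the uses clause.
procedure ParseAttributes(const AInput: String; ATarget: TStringList);
var
  pRegEx: TPerlRegEx;
  LMatched: Boolean;
begin
  pRegEx:=TPerlRegEx.Create;
  try
    pRegEx.RegEx:='(\S+)\s*=\s*(\"?)([^"]*)(\2|\s|$)';
    pRegEx.Subject:=AInput;
    LMatched:=pRegEx.Match;
    while LMatched do
    begin
      // Groups[1] is the attribute name, Groups[3] is the value without quotes
      ATarget.Add(pRegEx.Groups[1]+'='+'"'+pRegEx.Groups[3]+'"');
      LMatched:=pRegEx.MatchAgain;
    end;
  finally
    pRegEx.Free;
  end;
end;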
Disclaimer: I haven't tried compiling that code, but hopefully it's enough to get you started!
Practical Point: With respect to the actual problem you posed with your DOM parser - this is a task that there are existing solutions for so a practical answer to solving the problem may well be to use a DOM parser that works! If a RegEx is something you need for whatever reason this one should do the job.
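For readers who want to try the pattern quickly, here is a small sketch of the same RegEx in Python's re module (not Delphi). The function name parse_attributes is made up for this example, and the inputs are the sample strings from the question. One caveat: for unquoted values the third group runs on to the next double quote or the end of the string, so this is only a reasonable fit for quoted values like those shown.

```python
import re

# Same pattern as above: name, optional opening quote, value, terminator
ATTR_PATTERN = re.compile(r'(\S+)\s*=\s*("?)([^"]*)(\2|\s|$)')


def parse_attributes(text):
    """Return a list of (name, value) pairs found in an attribute string."""
    return [(m.group(1), m.group(3)) for m in ATTR_PATTERN.finditer(text)]


samples = [
    'data-partnumber="BB3312" class=""',
    'class="cb10"',
    'account_number = "11432" model = "pay_plan"',
]
for sample in samples:
    print(parse_attributes(sample))
```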

How can I build a complex e-mail - template plus content saved in a variable - using cfscript instead of cfmail?

This is expanding a bit on a question I asked earlier. Server is CF2016. I'm saving a table of data using savecontent:
savecontent variable = 'mailBody' {
writeOutput('
<table width="99%" style="border-collapse:collapse;width:99%;">
<tr>
<td style="background-color:##09AFFF;color:##FFFFFF;width:30%;padding-left:3px;padding-top:5px;padding-bottom:5px;font-size:12px;font-weight:700;border-bottom:1px solid ##5B5B5B;text-align:left;">Name</td>
<td style="background-color:##09AFFF;color:##FFFFFF;width:15%;padding-top:5px;padding-bottom:5px;font-size:12px;font-weight:700;border-bottom:1px solid ##5B5B5B;text-align:center;">Class</td>
<td style="background-color:##09AFFF;color:##FFFFFF;width:30%;padding-top:5px;padding-bottom:5px;font-size:12px;text-align:left;font-weight:700;border-bottom:1px solid ##5B5B5B;">City,State,ZIP</td>
<td style="background-color:##09AFFF;color:##FFFFFF;width:15%;padding-right:5px;padding-top:5px;padding-bottom:5px;font-size:12px;text-align:left;font-weight:700;border-bottom:1px solid ##5B5B5B;">Amount</td>
</tr>
');
for ( qryPeople in queryPeople ){
writeOutput('
<tr>
<td style="font-size:12px;padding-left:3px;padding-top:3px;padding-bottom:4px;background-color:#thisBgColor#;border-bottom:1px solid ##5B5B5B;">#qryPeople.p_first# #qryPeople.p_last#</td>
<td style="font-size:12px;padding-left:3px;padding-top:3px;padding-bottom:4px;background-color:#thisBgColor#;border-bottom:1px solid ##5B5B5B;text-align:center;">#YEAR(qryPeople.p_graduation)#</td>
<td style="font-size:12px;padding-left:3px;padding-top:3px;padding-bottom:4px;background-color:#thisBgColor#;border-bottom:1px solid ##5B5B5B;">#qryPeople.p_city# #qryPeople.p_state#</td>
<td style="font-size:12px;padding-top:3px;padding-bottom:4px;padding-right:5px;background-color:#thisBgColor#;border-bottom:1px solid ##5B5B5B;">#NumberFormat(qryValue.p_value,'99,999')#</td>
</tr>
');
};
writeOutput('
<tr>
<td colspan="5" style="font-size:11px;padding-left:5px;padding-top:5px;padding-right:5px;padding-bottom:7px;background-color:##09AFFF;color:##FFFFFF;font-style:italic;border-bottom:1px solid ##5B5B5B;">footer text</td>
</tr>
</table>
');
};//end savecontent
Works fine through here - I can output the variable mailBody and I see a styled table suitable for HTML email.
We have stock email templates (.htm files) that are stored centrally. I'm trying to inject this content into one of these templates to be sent.
mailerService = new mail();
mailTemplate = fileRead(application.paths.physicalroot & '\email\project1\templates\people.htm');
mailerService.setTo("me@domain.com");
mailerService.setFrom("support@domain.com");
mailerService.setSubject("People Report");
mailerService.setType("html");
mailerService.send(body=mailTemplate);
In the .htm template file I have
<cfoutput>#mailBody#</cfoutput>
And it's giving me exactly that - #mailBody#. In less complex e-mails I have no problem using something like
<cfoutput>Welcome #qryPeople.p_first# #qryPeople.p_last#</cfoutput>
Or accessing other variables set on the cfscript template that drives the e-mail. But I can't figure out why my savecontent variable isn't working as expected.
SOLUTION - previously trying a savecontent include did not work, but that may have been on ACF 2010. This works on ACF2016.
mailerService = new mail();
savecontent variable="mailTemplate" {
include variables.templatePath & '\email\project1\templates\people.htm';
};
mailerService.setTo("me@domain.com");
People.htm is included and the other savecontent (mailbody) is rendered in the e-mail. Now to figure it out using the newer cfmail() script...
If you only have one "block" to be evaluated, I'd just replace it using a string function:
mailTemplate = fileRead(application.paths.physicalroot & '\email\project1\templates\people.htm');
mailTemplate = replaceNoCase(mailTemplate, "##mailBody##", mailBody, "one");
// continue with mailerService.* methods
Another option is to use include with a saveContent:
This may require that you rename your template from a *.htm to be *.cfm file.
// create mailBody first using your current saveContent
savecontent variable="finalBody" {
include "#application.paths.physicalroot#\email\project1\templates\people.cfm";
}
The variable finalBody should now contain the content from the mailBody variable.
If you can have CF markup in the templates you should be able to get the results you want with this:
<cfsavecontent variable="mailBody">
<cfinclude template="#application.paths.physicalroot#\email\project1\templates\people.htm">
</cfsavecontent>

Flask - SQLAlchemy italicizing text imported from db

I have an application that pulls data from fields in a MySQL database. In some of these fields, some of the text needs to be italicized. For example, in species.html I have a table cell "species.Notes" that I am pulling out of the MySQL database. It will render something like this on the page when the application is rendered:
Example:
Tall three tip sagebrush is associated with gray horsebrush (Tetradymia canescens).
The text in parentheses (i.e. the species name) needs to be italicized, and the rest needs to remain normal text. I have tried storing the information in the MySQL database with html tags around Tetradymia canescens
<i></i>
<html><i></i></htm>
but that did not work, and I have not been able to find other suggestions online.
For reference (if needed?), here is how the information is being pulled into the html page where it will be displayed.
species.html
<tr>
<th scope="row">Additional Species Information</th>
<td colspan="3">{{ species.Notes }}</td>
</tr>
<tr>
You can use Jinja filters to introduce the italics tag like this:
<td colspan="3">
{{ species.Notes|replace('(', '<i>(')|replace(')', ')</i>')|safe }}
</td>
Ideally you'd preprocess the text with a regular expression and pass it to the template with the tags already in place; then you would need only the safe filter.
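A minimal sketch of that preprocessing step, assuming it runs in the Flask view before the value is handed to the template. The function name italicize_parenthesized is made up for this example. It escapes the text first so that anything else stored in the database is not treated as markup, then wraps each parenthesized run in italics, parentheses included, as the filter chain above does.

```python
import html
import re


def italicize_parenthesized(text):
    """Escape the text, then wrap each '(...)' run in <i>...</i>."""
    escaped = html.escape(text)
    return re.sub(r"\(([^()]*)\)", r"<i>(\1)</i>", escaped)


notes = (
    "Tall three tip sagebrush is associated with "
    "gray horsebrush (Tetradymia canescens)."
)
print(italicize_parenthesized(notes))
```

The result still has to be marked as safe when rendered, for example with the safe filter in the template, otherwise Jinja will escape the i tags again.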

XSS remediation - Improper Neutralization of Script-Related HTML Tags

I'm trying to fix some XSS errors with my code. #getEmailRecord is the line that contains the problem. How do I fix a piece of code like this? The error: Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS). Veracode cleansing solution: coldfusion.runtime.CFPage.HTMLEditFormat
<tr>
<td> </td>
<td class="left"><b>To: </b></td>
<td class="left">#getEmailRecord.EMAIL_TO#</td></tr>
<tr><td colspan="4"> </td></tr>
Thanks! This is my first time doing something like this so any help is much appreciated.
Veracode cleansing solution: coldfusion.runtime.CFPage.HTMLEditFormat
The recommended solution tells you what to do: wrap any variable that contains user-supplied data in #HTMLEditFormat()# wherever you output it.
<td class="left">#HTMLEditFormat(getEmailRecord.EMAIL_TO)#</td></tr>
HTMLEditFormat
Description
Replaces special characters in a string with their HTML-escaped equivalents.
And if you are on ColdFusion 10 or newer, you have even more options: the EncodeFor functions.
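To make the effect of this kind of escaping concrete, here is a small sketch in Python rather than ColdFusion. It uses the standard-library html.escape, which is comparable in spirit to HTMLEditFormat but is not the same function; the helper name render_to_cell and the sample input are made up for this example.

```python
import html


def render_to_cell(email_to):
    """Build the table cell with the user-supplied value HTML-escaped."""
    return '<td class="left">' + html.escape(email_to) + "</td>"


# A value containing markup is rendered as text instead of being executed
print(render_to_cell("<script>alert(1)</script>"))
print(render_to_cell("someone@example.com"))
```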

How to make a plone view that inserts other smaller views of content items?

I think this should be simple. I have a folderish TTW dexterity content item (a drop box) that contains folderish TTW dexterity items (proposals). Each proposal contains TTW dexterity reviews that have fields I want to summarize.
I can easily make a view that generates a table as indicated below for any proposal with simple modifications to the folderlisting view:
[review1 link] [criterion_1 value] [criterion-2 value]...
[review2 link] [criterion_1 value] [criterion-2 value]...
.
.
I can also generate a working table view for a drop box by modifying the folderlisting view:
[proposal1 link] [column I would like to insert the above table in for this proposal]
[proposal2 link] [column I would like to insert the above table in for this proposal]
.
.
My problem is I cannot figure out how to insert the first table into the cells in the second column of the second table. I've tried two things:
Within the view template for the dropbox listing, I tried duplicating the repeat macro of the listingmacro, giving it and all its variables new names to have it iterate on each proposal. This easily accesses all of the Dublin core schemata for each review, but I cannot get access to the dexterity fields. Everything I have tried (things that work when generating the first table) yield LocationError and AttributeError warnings. Somehow when I go down one level I lose some of the information necessary for the view template to find everything. Any suggestions?
I've also tried accessing the listing macro for the proposal, with calls like <metal use-macro="item/first_table_template_name/listing"/>. Is this even partially the right approach? It gives no errors, but also does not insert anything into my page.
Thanks.
This solution is loosely based on the examples provided by kuel: https://github.com/plone/Products.CMFPlone/blob/854be6e30d1905a7bb0f20c66fbc1ba1f628eb1b/Products/CMFPlone/skins/plone_content/folder_full_view.pt and https://github.com/plone/Products.CMFPlone/blob/b94584e2b1231c44aa34dc2beb1ed9b0c9b9e5da/Products/CMFPlone/skins/plone_content/folder_full_view_item.pt. --Thank you.
The way I found easiest to create and debug this was:
Create a minimalist template from the Plone standard template folder_listing.pt which makes just the table of summarized review data for a single proposal. The template is just for a table, with no header info or any other slots. This is a stripped version, but there is nothing above the first statement. The key statements that allowed access to the fields were of the form:
python: item.getObject().restrictedTraverse('criterion_1')
The table template:
<table class="review_summary listing">
<tbody><tr class="column_labels"><th>Review</th><th>Scholarly Merit</th><th>Benefits to Student</th><th>Clarity</th><th>Sum</th></tr>
<metal:listingmacro define-macro="listing">
<tal:foldercontents define="contentFilter contentFilter|request/contentFilter|nothing;
contentFilter python:contentFilter and dict(contentFilter) or {};
I kept all the standard definitions from the original template.
I have just removed them for brevity.
plone_view context/@@plone;">
The following tal:sum is where I did some math on my data. If you are
not manipulating the data this would not be needed. Note that I am only
looking at the first character of the choice field.
<tal:sum define="c1_list python:[int(temp.getObject().restrictedTraverse('criterion_1')[0])
for temp in batch if temp.portal_type=='ug_small_grants_review'];
c1_length python: test(len(c1_list)<1,-1,len(c1_list));
c2_list python:[int(temp.getObject().restrictedTraverse('criterion_2')[0])
for temp in batch if temp.portal_type=='ug_small_grants_review'];
c2_length python: test(len(c2_list)<1,-1,len(c2_list));
c1_avg python: round(float(sum(c1_list))/c1_length,2);
c2_avg python: round(float(sum(c2_list))/c2_length,2);
avg_sum python: c1_avg+c2_avg;
">
<tal:listing condition="batch">
<dl metal:define-slot="entries">
<tal:entry tal:repeat="item batch" metal:define-macro="entries">
<tal:block tal:define="item_url item/getURL|item/absolute_url;
item_id item/getId|item/id;
Again, this is the standard define from the folder_listing.pt
but I've left out most of it to save space here.
item_samedate python: (item_end - item_start < 1) if item_type == 'Event' else False;">
<metal:block define-slot="entry"
The following condition is key if you can have things
other than reviews within a proposal. Make sure the
item_type is proper for your review/item.
tal:condition="python: item_type=='ug_small_grants_review'">
<tr class="review_entry"><td class="entry_info">
<dt metal:define-macro="listitem"
tal:attributes="class python:test(item_type == 'Event', 'vevent', '')">
I kept all the standard stuff from folder_listing.pt here.
</dt>
<dd tal:condition="item_description">
</dd>
</td>
The following tal:comp block is used to calculate values
across the rows because we do not know the index of the
item the way the batch is iterated.
<tal:comp define = "crit_1 python: item.getObject().restrictedTraverse('criterion_1')[0];
crit_2 python: item.getObject().restrictedTraverse('criterion_2')[0];
">
<td tal:content="structure crit_1"># here</td>
<td tal:content="structure crit_2"># here</td>
<td tal:content="structure python: int(crit_1)+int(crit_2)"># here</td>
</tal:comp>
</tr>
</metal:block>
</tal:block>
</tal:entry>
</dl>
<tr>
<th>Average</th>
<td tal:content="structure c1_avg"># here</td>
<td tal:content="structure c2_avg"># here</td>
<td tal:content="structure avg_sum"># here</td>
</tr>
</tal:listing>
</tal:sum>
<metal:empty metal:define-slot="no_items_in_listing">
<p class="discreet"
tal:condition="not: folderContents"
i18n:translate="description_no_items_in_folder">
There are currently no items in this folder.
</p>
</metal:empty>
</tal:foldercontents>
</metal:listingmacro>
</tbody></table>
Create another listing template that calls this one to fill the appropriate table cell. Again, I used a modification of the folder_listing.pt. Basically within the repeat block I put the following statement in the second column of the table:
This belongs right after the </dd> tag ending the normal item listing.
</td> <td class="review_summary">
<div tal:replace="structure python:item.getObject().ug_small_grant_review_summary_table()" />
</td>
Note that "ug_small_grant_review_summary_table" is the name I gave to the template shown in more detail above.
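The arithmetic in the tal:sum block can be sketched in plain Python, outside of Plone. This is only an illustration: the review dicts and choice strings such as '3 - Good' are hypothetical stand-ins for the catalog brains and the choice field values, and the function names are made up. As in the template, only the first character of each choice is used, and an empty list falls back to a length of -1 so the division never fails.

```python
def criterion_scores(reviews, field):
    """Take the leading digit of each review's choice value as its score."""
    return [int(review[field][0]) for review in reviews]


def average(scores):
    """Mirror test(len(x) < 1, -1, len(x)) from the template, then round."""
    length = len(scores) if scores else -1
    return round(float(sum(scores)) / length, 2)


# Hypothetical review data for one proposal
reviews = [
    {"criterion_1": "3 - Good", "criterion_2": "4 - Very good"},
    {"criterion_1": "4 - Very good", "criterion_2": "5 - Excellent"},
]

c1_avg = average(criterion_scores(reviews, "criterion_1"))
c2_avg = average(criterion_scores(reviews, "criterion_2"))
avg_sum = c1_avg + c2_avg
print(c1_avg, c2_avg, avg_sum)
```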