ColdFusion - Simple HTML Parsing - regex

We currently have some articles that get posted onto our site. They can appear with the following types of HTML:
<p>this is an article<br>
<img src="someimage">
</p>
<p>this is an article<br>
<img src="someimage">
</p>
<p>this is an article<br>
<img src="someimage">
</p>
<p>this is an article<br>
<img src="someimage">
</p>
or
<p><img src="someimage">
this is an article<br>
</p>
<p>this is an article<br>
<img src="someimage">
</p>
<p><img src="someimage">
this is an article<br>
</p>
Some other HTML tags may sometimes appear inside these paragraphs. I can't get my head around how to scrape the page using ColdFusion to achieve this.
Essentially, what I need to do is grab the first paragraph's text and image and be able to arrange them.
Is this possible using ColdFusion 8? Would anyone be able to point me in the right direction on how to learn this?

100% definitely possible!
Now, don't be put off by what I'm going to suggest, it's actually very easy to get going with this.
Download a library called jSoup. Its sole purpose is scraping content from the DOM of a web page:
http://jsoup.org/
You would then use this Java class by doing something like:
<!--- Get the page. --->
<cfhttp method="get" url="http://example.com/" resolveurl="true" useragent="#cgi.http_user_agent#" result="myPage" timeout="10" charset="utf-8">
<cfhttpparam type="header" name="Accept-Encoding" value="*" />
<cfhttpparam type="header" name="TE" value="deflate;q=0" />
</cfhttp>
<!--- Load up jSoup and parse the document with it. --->
<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset document = jsoup.parse(myPage.filecontent) />
<!--- Search the parsed document for the contents of the TITLE tag. --->
<cfset title = document.select("title").first() />
<!--- Let's see what we got. --->
<cfdump var="#title#" />
This example is pretty simple but it can show you just how easy it is to work with. Scraping images and whatever else would be fairly easy if you check out the docs on jSoup.
There are some good examples on this page, where you can use CSS style selectors:
http://jsoup.org/cookbook/extracting-data/selector-syntax
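For the case in the question (the first paragraph's text and image), a minimal sketch could look like the following. It assumes the jSoup JAR is on ColdFusion's classpath and that myPage is the cfhttp result from the example above; the variable names articleText and imageSrc are just illustrative.

```cfml
<!--- Assumption: jSoup is on the classpath and myPage comes from the cfhttp call above. --->
<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset document = jsoup.parse(myPage.filecontent) />

<!--- select() returns a collection; check isEmpty() before asking for the first match. --->
<cfset paragraphs = document.select("p") />
<cfif NOT paragraphs.isEmpty()>
    <cfset firstParagraph = paragraphs.first() />
    <!--- text() gives the paragraph's text with the tags stripped out. --->
    <cfset articleText = firstParagraph.text() />
    <!--- Look for an image inside that same paragraph, wherever it sits. --->
    <cfset images = firstParagraph.select("img") />
    <cfset imageSrc = "" />
    <cfif NOT images.isEmpty()>
        <cfset imageSrc = images.first().attr("src") />
    </cfif>
    <!--- Re-arrange the two pieces however the layout needs; here the image goes first. --->
    <cfoutput>
        <cfif len(imageSrc)><img src="#htmlEditFormat(imageSrc)#" /></cfif>
        <p>#htmlEditFormat(articleText)#</p>
    </cfoutput>
</cfif>
```

Because the text and the image are read separately, it should not matter whether the img tag comes before or after the text in the original markup.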
Try to avoid using Regex for this task - believe me, I've tried and it's an absolute can of worms!
Hope this helps.
Mikey.

Related

How to display image in email using coldfusion?

I've pasted an image into CKEditor and stored it in the DB. After retrieving the data, I send it in an email. The text content is displayed in the email, but the image is not. I'm not sure how to display the image in the email without cfmailparam. I've stored the content in cfsavecontent and put the variable into the cfmail tag. Example code is given below.
<cfsavecontent variable="MailBody">
<cfoutput>
Test comment with Image<br /> <br /> <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJYAAAA5CAYAAADUS9LZAAAgAElEQVR4nO2cZ3wUV5qvu429O7t7d2Y8YefO7O7d3Uswygnl0N2SWjnngGWwwUOyMZiMQUKARJDIOdlgvAM2BmyQkEAJRVAAJCFEEpiclEBSd3V/ee6H6i61JPDYHsC+gHsUyRAAAAAElFTkSuQmCC" /><br /> <br /> test signature<br /> thanks,<br /> tester
</cfoutput>
</cfsavecontent>
Note: The img src data above is only a sample value.
<cfmail from="tester@mail.com" subject="TestEmail" to="dev@email.com" server="SMTP">
#MailBody#
</cfmail>
Now the email is sent without the image. It's working fine in CF2010 and not working in CF2016.
How can I display that image in the mail? Please guide me; I'm new to CF.
Thanks in Advance!
Instead of the img tag, use the cfimage tag with the isBase64="yes" attribute, like this:
<cfsavecontent variable="MailBody">
Test comment with Image<br /><cfimage action = "writeToBrowser" source ="data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..." isBase64= "yes">
<br /> <br /> <br /> test signature<br /> thanks,<br /> tester
</cfsavecontent>
<cfmail from="tester@mail.com" subject="TestEmail" to="dev@email.com" server="SMTP">
<cfmailpart type="text/html">
#MailBody#
</cfmailpart>
</cfmail>

ColdFusion - CFDOCUMENT Title in URL

I am creating a PDF document using the ColdFusion cfdocument tag. It works fine; however, instead of showing the document name in the browser title, it shows the .cfc file that I call to create the PDF.
Here is how I'm calling it.
<cfdocument format="pdf" marginbottom=".5" margintop=".25" marginright=".5" marginleft=".5" saveAsName="#filename#.pdf">
<cfdocumentitem type="footer">
<p style="font-size:11px; text-align:right; font-style:italic;">Page #cfdocument.currentpagenumber# of #cfdocument.totalpagecount#</p>
</cfdocumentitem>
<html>
<head><title>#filename#.pdf</title></head>
<body><img src="file:///#application.tempFolder#\#thisFilename#" /></body>
</html>
</cfdocument>
What the heck am I missing? Why does it still show the filename.cfc file that I'm calling in the browser title instead of the filename I give to the PDF???
Figured it out. Had to create the document using CFDOCUMENT, then add a "Title" attribute to it using the CFPDF tag. Then output it to the browser.
<!--- Create the PDF --->
<cfdocument format="pdf" marginbottom=".5" margintop=".25" marginright=".5" marginleft=".5" filename="#application.tempFolder#\#thisSaveAsFilename#" overwrite="yes">
<cfdocumentitem type="footer">
<p style="font-size:11px; text-align:right; font-style:italic;">Page #cfdocument.currentpagenumber# of #cfdocument.totalpagecount#</p>
</cfdocumentitem>
<html>
<head><title>#thisSaveAsFilename#</title></head>
<body><img src="file:///#application.tempFolder#\#thisFilename#" /></body>
</html>
</cfdocument>
<!--- Use CFPDF to add attributes to it --->
<cfset thisInfo = StructNew()>
<cfset thisInfo.Title = "pdf title goes here...">
<cfpdf action="setinfo" info="#thisInfo#" source="#application.tempFolder#\#thisSaveAsFilename#" />
<!--- Send it to the browser --->
<cfcontent file="#application.tempFolder#\#thisSaveAsFilename#" type="application/pdf" />

Creating a Word document in Coldfusion - how to have pagenumbering?

I am creating a Word format .doc using the following code, then cfheader and cfcontent to serve. All is good but I need to be able to place dynamic information in the header (or footer), or automatic pagenumbering would be a second best option.
How should I modify the code?
<cfsavecontent variable="myDocument">
<html xmlns:w="urn:schemas-microsoft-com:office:word">
<!--- Head tag instructs Word to start up a certain way, specifically in
print view. --->
<head>
<xml>
<w:WordDocument>
<w:View>Print</w:View>
<w:SpellingState>Clean</w:SpellingState>
<w:GrammarState>Clean</w:GrammarState>
<w:Compatibility>
<w:BreakWrappedTables/>
<w:SnapToGridInCell/>
<w:WrapTextWithPunct/>
<w:UseAsianBreakRules/>
</w:Compatibility>
<w:DoNotOptimizeForBrowser/>
</w:WordDocument>
</xml>
</head>
<body>
Regular HTML document goes here
<!--- Create a page break microsoft style (took hours to find this)
--->
<br clear="all"
style="page-break-before:always;mso-break-type:page-break" />
Next page goes here
</body>
</html>
</cfsavecontent>
Please have a look at this: Header & Footer
I have successfully created custom header and footer with only one html file using this article. (Word 2003)
Hope this helps!
It doesn't seem easy to add page numbers using WordprocessingML:
http://openxmldeveloper.org/archive/2006/08/03/443.aspx
If you can serve PDF instead of DOC, here's a solution for page numbering.
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7c21.html
See example 2:
<cfdocument format="pdf">
<cfdocumentitem type="header" evalatprint="true">
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<tr><td align="right"><cfoutput>#cfdocument.currentsectionpagenumber# of
#cfdocument.totalsectionpagecount#</cfoutput></td></tr>
</table>
</cfdocumentitem>
<cfdocumentitem type="footer" evalatprint="true">
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<tr><td align="center"><cfoutput>#cfdocument.currentpagenumber# of
#cfdocument.totalpagecount#</cfoutput></td></tr>
</table>
</cfdocumentitem>
...
</cfdocument>

cfdiv working in FF and Safari but does not show up in IE

Within a form I have a button that launches a cfwindow, then presents a search screen for the user to make a selection. Once selection is made, the cfwindow closes and the selected content shows in the main page by being bound to a cfdiv. This all works fine in FF but the cfdiv doesn't show at all in IE. In IE, the cfwindow works, the select works, but then no bound page.
I have tried setting bindonload and that made no difference (and I need it to be true if there is content that is pulled in via a query when it loads). All I have been able to find so far regarding this issue is setting bindonload to false and putting the cfdiv outside of the form but that's not possible in my current design.
*4/21 update
This works as expected in FF 3.6.3 and Safari 4, but does not work in multiple IE versions. In IE, the cfwindow works, the select works, but when the window closes and it tries to load the page into the div it just spins.
This is the main page, test.cfm:
<cfajaximport tags="cfwindow, cfform, cfdiv, cftextarea, cfinput-datefield">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
</head>
<body>
<cfset i = 1>
<cfform>
<table>
<cfloop from="1" to="4" index="n">
<tr>
<td class="left" style="white-space:nowrap;">
<cfoutput>#n#</cfoutput>. <cfinput type="button" value="Select #n#" name="x#n#Select#i#" onClick="ColdFusion.Window.create('x#n#Select#i#', 'Exercise Lookup', 'xSelect/xSelect2.cfm?xNameVar=x#n#S#i#&window=x#n#Select#i#&workout=workout#i#', {x:100,y:100,height:500,width:720,modal:true,closable:true,draggable:true,resizable:true,center:true,initshow:true,minheight:200,minwidth:200 })" />
<cfdiv bind="url:xSelect/x2.cfm" ID="x#n#S#i#" tagName="span" bindonload="false" />
<cfinput type="hidden" ID="x#n#s#i#" name="x#n#s#i#" value="#n#" />
</td>
</tr>
</cfloop>
</table>
</cfform>
</body>
</html>
This is the cfwindow, xSelect2.cfm:
<cfparam name="form.xSelected" default="0">
<cfoutput>
1<br />
2<br />
3<br />
4<br />
</cfoutput>
This is the page bound to the cfdiv, x2.cfm:
<cfajaximport tags="cfwindow, cfform, cfdiv, cftextarea, cfinput-datefield">
<cfparam name="url.xName" default="">
<cfparam name="url.xNameVar" default="">
<cfparam name="url.xID" default=0>
<form>
<cfoutput>
<input type="text" id="xName" name="xName" value="#url.xName#" size="27" disabled="true" />
<input type="hidden" id="xNameVar" name="xNameVar" value="#url.xNameVar#" />
<input type="hidden" id="#url.xNameVar#xID" name="#url.xNameVar#xID" value="#url.xID#" />
</cfoutput>
</form>
I am significantly stuck so any help is greatly appreciated.
AND OF COURSE IF ANYONE HAS A BETTER IDEA OF HOW TO ACHIEVE THE SAME FUNCTIONALITY PLEASE SHARE!
Thanks!
The answer was very, very simple. I got the clue from Mathijs' Weblog on whitehorsez.com (thank you!!).
Evidently IE doesn't like nested forms so all I needed to do in the end was remove the form tags from x2.cfm above. It makes that page incorrect, but when read into the cfdiv it works and posts all the correct values to the form. I finished one other rough solution using getElementById which eliminated the extra page but the problem with it was that you had to save before you could change the value if there were multiple options. Here is the new and simple x2.cfm:
<cfoutput>
<input type="text" name="xName" value="#url.xName#" size="27" disabled="true" />
<input type="hidden" name="xNameVar" value="#url.xNameVar#" />
<input type="hidden" name="#url.xNameVar#xID" value="#url.xID#" />
</cfoutput>

ColdFusion: get the name of a file before uploading

How can I get the filename of a file before I call <cffile action="upload">? I can get the filename of the temp file, but not the actual filename. In PHP land I can use the $_FILES superglobal to get what I want, but as far as I can tell no such thing exists in ColdFusion.
I can get the filename client-side but would really want to do this server side.
Thanks
Yes this is possible. You can use this function to grab the client file name before using the cffile tag:
<cffunction name="getClientFileName" returntype="string" output="false" hint="">
<cfargument name="fieldName" required="true" type="string" hint="Name of the Form field" />
<cfset var tmpPartsArray = Form.getPartsArray() />
<cfif IsDefined("tmpPartsArray")>
<cfloop array="#tmpPartsArray#" index="local.tmpPart">
<cfif local.tmpPart.isFile() AND local.tmpPart.getName() EQ arguments.fieldName>
<cfreturn local.tmpPart.getFileName() />
</cfif>
</cfloop>
</cfif>
<cfreturn "" />
</cffunction>
More info here: http://www.stillnetstudios.com/get-filename-before-calling-cffile/
I'm using Railo and found the original filenames with:
GetPageContext().formScope().getUploadResource('your_file_input_form_name').getName();
Maybe this works on an Adobe server as well? It's quite handy if you want to rename your uploaded file somehow and don't want it to get moved through two temp dirs (see Renaming Files As They Are Uploaded (how CFFILE actually works)).
I don't know of a way to find out before calling cffile, but there may be a workaround.
When you call <cffile action="upload"> you can specify a result using result="variable". So, call the upload with the destination as a temp file. Your result variable is a struct which contains the member clientFile, which is the name of the file on the client's computer.
Now, you can use <cffile action="move"> to do whatever it is you need to do with the original filename.
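A minimal sketch of that upload-then-move workaround follows. The form field name fileUp, the uploads folder, and the UUID-based final name are assumptions for illustration; the result keys (clientFile, clientFileExt, serverDirectory, serverFile) are the ones cffile populates after an upload.

```cfml
<!--- Upload into the server's temp directory first. "fileUp" is an assumed form field name. --->
<cffile action="upload"
        filefield="fileUp"
        destination="#getTempDirectory()#"
        nameconflict="makeunique"
        result="uploadResult" />

<!--- clientFile is the name the file had on the user's machine. --->
<cfset originalName = uploadResult.clientFile />

<!--- Hypothetical target: an "uploads" folder next to this template, with a generated name. --->
<cfset finalPath = expandPath("./uploads/") & createUUID() & "." & uploadResult.clientFileExt />

<!--- Move the temp copy to its final location. --->
<cffile action="move"
        source="#uploadResult.serverDirectory#/#uploadResult.serverFile#"
        destination="#finalPath#" />
```

At this point originalName can be stored in the database while the file on disk keeps the generated name.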
Wow, I found a great and easy solution with a little JavaScript!
This way you get the temp filename for the cffile upload and the actual file.jpg name for the database:
<html>
<head>
<script type="text/javascript">
function PassFileName()
{
document.getElementById("fileName").value=document.getElementById("fileUp").value;
}
</script>
</head>
<body>
<form name="form1" method="post" enctype="multipart/form-data" >
File: <input type="file" name="fileUp" id="fileUp" size="20" onchange="PassFileName()" /> <br />
Title: <input type="text" name="Title" id="Title"><br />
<input type="hidden" id="fileName" size="20" name="fileName" />
<input type="submit" name="submit">
</form>
</body>
</html>
If you have the name attribute defined on the input control, the file name will be in the FORM scope. For example:
<cfif not structIsEmpty(form)>
<cfdump var="#form#">
<cfelse>
<html>
<head>
<title>Title</title>
</head>
<body>
<form method="POST" action="<cfoutput>#cgi.SCRIPT_NAME#</cfoutput>">
<input type="file" name="fileIn" />
<input type="Submit" name="formSubmit">
</form>
</body>
</html>
</cfif>
Another option might be to have client-side code populate a hidden form field with the filename, which you would then have server-side.
Ben Doom's answer is generally how I would approach it, though.
Here's how we do it. Basically, there is a file field, and a string field. JavaScript grabs the filename from the browser before the form is submitted. Obviously, you need to verify that the filename on the other end is actually present (it'll be blank if the user has JavaScript disabled, for example) and you'll need to parse the string to handle platform differences (/users/bob/file.jpg versus C:\Documents and Settings\bob\file.jpg)
<script>
function WriteClientFileName(){
$('ClientFileName').value = $('ClientFile').value;
}
</script>
<form enctype="multipart/form-data" onsubmit="WriteClientFileName();">
<input type="File" name="ClientFile" id="ClientFile">
<input type="hidden" name="ClientFileName" id="ClientFileName" value="">
<input type="submit">
</form>
Incidentally, this technique is cross-language. It'll work equally well in RoR, PHP, JSP, etc.
Edit: If a user is "wielding a fierce FireBug" what's the issue? Even if they don't have Firebug, they can still rename the file on their end and change the input. Plus, you're validating your inputs, right?
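On the server side, the checks described above (a blank value when JavaScript is disabled, and Unix versus Windows paths) could be handled roughly like this. It is only a sketch: form.ClientFileName is the hidden field from the form above, and the UUID fallback is a hypothetical choice, not part of the original answer.

```cfml
<!--- Default the hidden field in case JavaScript was disabled and nothing was sent. --->
<cfparam name="form.ClientFileName" default="" />

<!--- Treat both "/" and "\" as list delimiters so either path style reduces to the bare file name. --->
<cfset clientFileName = listLast(trim(form.ClientFileName), "/\") />

<cfif NOT len(clientFileName)>
    <!--- Hypothetical fallback: no usable name arrived, so generate one. --->
    <cfset clientFileName = createUUID() />
</cfif>
```

The value is still user-supplied input, so it should be validated before it is used in a file path or a query.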
There is no way to know the file name for uploaded files before saving to the server in ColdFusion, Railo or OpenBD. I typically generate 'my' new filename using the createUUID() function in advance of saving the file.