scale PDF to single page - coldfusion

Is there a simple way to scale the pdf created by cfdocument or cfpdf to a single page using CF8? My output (a calendar) could possibly extend to page 2 depending on the number of events. I'd rather scale the calendar to fit it in one page. I assume I can create the pdf with cfdocument. Use cfpdf to check the page numbers and loop while totalPages > 1 create the PDF with a lesser scale.
psudo code:
pdfScale = 100
cfdocument scale = "#pdfScale#"
cfpdf action = "getinfo" name = "mypdf"
cfloop while mypdf.totalPages > 1
pdfScale = pdfScale -5
cfdocument scale = "#pdfScale#"
cfpdf action = "getinfo" name = "mypdf"
/cfloop
Am I on the right track or am I missing something to make this easier? Thanks.

Your theory seems sound to me - you should try it and find out. Since that's a rather boring answer, I've also converted your pseudocode to real code:
<cfset pdfScale = 100 />
<cfset pdfObj = "" />
<cfdocument format="pdf" name="pdfObj" scale="#pdfScale#">document contents</cfdocument>
<cfpdf action="getInfo" source="#pdfObj#" name="pdfInfo" />
<cfloop condition = "pdfInfo.TotalPages gt 1">
<cfset pdfScale -= 5 />
<cfdocument format="pdf" name="pdfObj" scale="#pdfScale#">document contents</cfdocument>
<cfpdf action="getInfo" source="#pdfObj#" name="pdfInfo" />
</cfloop>
Depending on your setup, you might also want to abstract the creation of the PDF into a function, so that you don't have to re-write all of the content twice on the page. Or you could use an include. Heck, if there is any sort of complex processing going on to render the HTML for the PDF (which I assume there is, since you're making a calendar), then you might even want to pre-render the content and re-use it, like so:
<cfsavecontent variable="docContents">document contents go here</cfsavecontent>
<cfset pdfScale = 100 />
<cfset pdfObj = "" />
<cfdocument format="pdf" name="pdfObj" scale="#pdfScale#"><cfoutput>#docContents#</cfoutput></cfdocument>
<cfpdf action="getInfo" source="#pdfObj#" name="pdfInfo" />
<cfloop condition = "pdfInfo.TotalPages gt 1">
<cfset pdfScale -= 5 />
<cfdocument format="pdf" name="pdfObj" scale="#pdfScale#"><cfoutput>#docContents#</cfoutput></cfdocument>
<cfpdf action="getInfo" source="pdfObj" name="pdfInfo" />
</cfloop>

Related

Using Jsoup (the Java html parser) with Handlebarsjs

So, I'm using jsoup and when I display the content returned I Get:
{{#ifcond="" pagetitle="" this.name}}
I'm doing this like local.htmlObj.Body().Html()
When I need it to be return like:
{{#ifCond PAGETITLE this.NAME}}
Here what I'm doing
<cfset paths = [] />
<cfset paths[1] = expandPath("/javaloader/lib/jsoup-1.7.1.jar") />
<cfset loader = createObject("component", "javaloader.JavaLoader").init( paths ) />
<cfset obj = loader.create( "org.jsoup.Jsoup" ) />
<cfset local.htmlObj = local.jsoupObj.parse( local.template ) />
<cfloop array="#local.htmlObj.select('.sidebar_left')#" index="element">
<cfif element.attr('section') EQ "test">
<cfset element.append('HTML HERE') />
</cfif>
</cfloop>
local.template is my template that is made up of a ton of different handlebar files That im pulling for different places. I'm constructing one handlebar file that gets returned.
The problem is that JSoup is trying to parse invalid HTML before it lets you access it. A slight easier to understand example of this behavior can be seen if you fetch the following HTML (seen in this question):
<p>
<table>[...]</table>
</p>
It will return:
<p></p>
<table>[...]</table>
In your case the Handelbars code is seen as attributes which in valid html always have a value (think checked="checked"). As far as I can tell there is no way to disable this behavior. It's really the wrong tool for the job you are trying to do. A cleaner approach would be to just fetch the document as a stream and save it to a string.

How to extract slide notes from a PowerPoint file with ColdFusion

I have a .PPT (PowerPoint, transferrable to ODP or PPTX) file with speaker notes on every slide. I want to extract the entire presentation into something dynamic so I can create a speaker cheat sheet for running on a phone or table while I talk (thumbnail of the slide with speaker notes). I do this just often enough to HATE doing it by hand.
This is almost easy enough with <cfpresentation format="html" showNotes="yes"> which splits the PPT up into HTML pages and creates an image for every slide. cfpresentation, however, does not transfer the speaker notes, they are lost in translation.
I have also tried <cfdocument> which has no options for preserving slide notes once it converts to PDF.
Is there a way to get the notes out of the PowerPoint file from within ColdFusion?
The simplest solution:
Convert the PowerPoint presentation to OpenOffice ODP format. That's a ZIP file. CFML can unzip it and inside there's a content.xml file which contains the slides and the notes, so CFML can extract the notes from that format.
Given the CFDOCUMENT functionality, perhaps ColdFusion can even convert the PPT to ODP for you?
There's no way to do this directly in CF. You can do this by dropping to the underlying Java. I stand corrected. Using the showNotes attribute on the <cfpresentation> tag, should add the notes to the HTML.
As an alternative, or if that doesn't work for some reason, you should be able to use Apache POI to do this, although you may need to use a more recent version of poi than shipped with your version of coldfusion, which may require some additional work.
public static LinkedList<String> getNotes(String filePath) {
LinkedList<String> results = new LinkedList<String>();
// read the powerpoint
FileInputStream fis = new FileInputStream(filePath);
SlideShow slideShow = new SlideShow(is);
fis.close();
// get the slides
Slide[] slides = ppt.getSlides();
// loop over the slides
for (Slide slide : slides) {
// get the notes for this slide.
Notes notes = slide.getNotesSheet();
// get the "text runs" that are part of this slide.
TextRun[] textRuns = notes.getTextRuns();
// build a string with the text from all the runs in the slide.
StringBuilder sb = new StringBuilder();
for (TextRun textRun : textRuns) {
sb.append(textRun.getRawText());
}
// add the resulting string to the results.
results.add(sb.toString());
}
return results;
}
Carrying over complex formatting may be a challenge (bulleted lists, bold, italics, links, colors, etc.), as you'll have to dig much deeper into TextRuns, and the related API's and figure how to generate HTML.
CFPRESENTATION (at least as of version 9) does have a showNotes attribute, but you'd still have to parse the output. Depending on the markup of the output, jQuery would make short work of grabbing what you want.
Felt bad that my above answer didn't work out so I dug a little bit. It's a little dated, but it works. PPTUtils, which is based on the apache library that #Antony suggested. I updated this one function to do what you want. You may have to tweak it a bit to do exactly what you want, but I like the fact that this utility returns the data to you in data format rather than in HTML which you'd have to parse.
And just in case, here is the POI API reference I used to find the "getNotes()" function.
<cffunction name="extractText" access="public" returntype="array" output="true" hint="i extract text from a PPT by means of an array of structs containing an array element for each slide in the PowerPoint">
<cfargument name="pathToPPT" required="true" hint="the full path to the powerpoint to convert" />
<cfset var hslf = instance.loader.create("org.apache.poi.hslf.HSLFSlideShow").init(arguments.pathToPPT) />
<cfset var slideshow = instance.loader.create("org.apache.poi.hslf.usermodel.SlideShow").init(hslf) />
<cfset var slides = slideshow.getSlides() />
<cfset var notes = slideshow.getNotes() />
<cfset var retArr = arrayNew(1) />
<cfset var slide = structNew() />
<cfset var i = "" />
<cfset var j = "" />
<cfset var k = "" />
<cfset var thisSlide = "" />
<cfset var thisSlideText = "" />
<cfset var thisSlideRichText = "" />
<cfset var rawText = "" />
<cfset var slideText = "" />
<cfloop from="1" to="#arrayLen(slides)#" index="i">
<cfset slide.slideText = structNew() />
<cfif arrayLen(notes)>
<cfset slide.notes = notes[i].getTextRuns()[1].getRawText() />
<cfelse>
<cfset slide.notes = "" />
</cfif>
<cfset thisSlide = slides[i] />
<cfset slide.slideTitle = thisSlide.getTitle() />
<cfset thisSlideText = thisSlide.getTextRuns() />
<cfset slideText = "" />
<cfloop from="1" to="#arrayLen(thisSlideText)#" index="j">
<cfset thisSlideRichText = thisSlideText[j].getRichTextRuns() />
<cfloop from="1" to="#arrayLen(thisSlideRichText)#" index="k">
<cfset rawText = thisSlideRichText[k].getText() />
<cfset slideText = slideText & rawText />
</cfloop>
</cfloop>
<cfset slide.slideText = duplicate(slideText) />
<cfset arrayAppend(retArr, duplicate(slide)) />
</cfloop>
<cfreturn retArr />
</cffunction>

Searching a folder (recursively) for duplicate photos using Coldfusion?

After moving and backing up my photo collection a few times I have several duplicate photos, with different filenames in various folders scattered across my PC. So I thought I would write a quick CF (9) page to find the duplicates (and can then add code later to allow me to delete them).
I have a couple of queries:-
At the moment I am just using file size to match the image file, but I presume matching EXIF data or matching hash of image file binary would be more reliable?
The code I lashed together sort of works, but how could this be done to search outside web root?
Is there a better way?
p
<cfdirectory
name="myfiles"
directory="C:\ColdFusion9\wwwroot\images\photos"
filter="*.jpg"
recurse="true"
sort="size DESC"
type="file" >
<cfset matchingCount=0>
<cfset duplicatesFound=0>
<table border=1>
<cfloop query="myFiles" endrow="#myfiles.recordcount#-1">
<cfif myfiles.size is myfiles.size[currentrow + 1]>
<!---this file is the same size as the next row--->
<cfset matchingCount = matchingCount + 1>
<cfset duplicatesFound=1>
<cfelse>
<!--- the next file is a different size --->
<!--- if there have been matches, display them now --->
<cfif matchingCount gt 0>
<cfset sRow=#currentrow#-#matchingCount#>
<cfoutput><tr>
<cfloop index="i" from="#sRow#" to="#currentrow#">
<cfset imgURL=#replace(directory[i], "C:\ColdFusion9\wwwroot\", "http://localhost:8500/")#>
<td><img height=200 width=200 src="#imgURL#\#name[i]#"></td>
</cfloop></tr><tr>
<cfloop index="i" from="#sRow#" to="#currentrow#">
<td width=200>#name[i]#<br>#directory[i]#</td>
</cfloop>
</tr>
</cfoutput>
<cfset matchingCount = 0>
</cfif>
</cfif>
</cfloop>
</table>
<cfif duplicatesFound is 0><cfoutput>No duplicate jpgs found</cfoutput></cfif>
This is pretty fun task, so I've decided to give it a try.
First, some testing results on my laptop with 4GB RAM, 2x2.26Ghz CPU and SSD: 1,143 images, total 263.8MB.
ACF9: 8 duplicates, took ~2.3 s
Railo 3.3: 8 duplicates, took ~2.0 s (yay!)
I've used great tip from this SO answer to pick the best hashing option.
So, here is what I did:
<cfsetting enablecfoutputonly="true" />
<cfset ticks = getTickCount() />
<!--- this is great set of utils from Apache --->
<cfset digestUtils = CreateObject("java","org.apache.commons.codec.digest.DigestUtils") />
<!--- cache containers --->
<cfset checksums = {} />
<cfset duplicates = {} />
<cfdirectory
action="list"
name="images"
directory="/home/trovich/images/"
filter="*.png|*.jpg|*.jpeg|*.gif"
recurse="true" />
<cfloop query="images">
<!--- change delimiter to \ if you're on windoze --->
<cfset ipath = images.directory & "/" & images.name />
<cffile action="readbinary" file="#ipath#" variable="binimage" />
<!---
This is slow as hell with any encoding!
<cfset checksum = BinaryEncode(binimage, "Base64") />
--->
<cfset checksum = digestUtils.md5hex(binimage) />
<cfif StructKeyExists(checksums, checksum)>
<!--- init cache using original on 1st position when duplicate found --->
<cfif NOT StructKeyExists(duplicates, checksum)>
<cfset duplicates[checksum] = [] />
<cfset ArrayAppend(duplicates[checksum], checksums[checksum]) />
</cfif>
<!--- append current duplicate --->
<cfset ArrayAppend(duplicates[checksum], ipath) />
<cfelse>
<!--- save originals only into the cache --->
<cfset checksums[checksum] = ipath />
</cfif>
</cfloop>
<cfset time = NumberFormat((getTickcount()-ticks)/1000, "._") />
<!--- render duplicates without resizing (see options of cfimage for this) --->
<cfoutput>
<h1>Found #StructCount(duplicates)# duplicates, took ~#time# s</h1>
<cfloop collection="#duplicates#" item="checksum">
<p>
<!--- display all found paths of duplicate --->
<cfloop array="#duplicates[checksum]#" index="path">
#HTMLEditFormat(path)#<br/>
</cfloop>
<!--- render only last duplicate, they are the same image any way --->
<cfimage action="writeToBrowser" source="#path#" />
</p>
</cfloop>
</cfoutput>
Obviously, you can easily use duplicates array to review the results and/or run some cleanup job.
Have fun!
I would recommend split up the checking code into a function which only accepts a filename.
Then use a global struct for checking for duplicates, the key would be "size" or "size_hash" and the value could be an array which will contain all filenames that matches this key.
Run the function on all jpeg files in all different directories, after that scan the struct and report all entries that have more than one file in it's array.
If you want to show an image outside your webroot you can serve it via < cfcontent file="#filename#" type="image/jpeg">

How can I strip this URL of everything before "http://"?

I'm doing some web scraping with ColdFusion and mostly everything is working fine. The only other issues I'm getting is that some URL's come through with text behind them that is now causing errors.
Not sure what's causing it, but it's probably my regex. Anyhow, there's a distinct pattern where text appears before the "http://". I'd like to simply remove everything before it.
Any chance you could help?
Take this string for example:
"I'M OBSESSED WITH MY BEAUTIFUL FRIEND" src="http://scs.viceland.com/feed/images/uk_970014338_300.jpg
I'd much appreciate your help as regex isn't something I've managed to make time for - hopefully I will some day!
Thanks.
EDIT:
Hi,
I thought it might be helpful to post my entire function, since it could be my initial REGEX that is causing the issue. Basically, the funcion takes one argument. In this case, it's the contents of a HTML file (via CFHTTP).
In some cases, every URL looks and works fine. If I try digg.com for example it works...but it dies on something like youtube.com. I guess this would be down to their specific HTML formatting. Either way, all I ever need is the value of the SRC attribute on image tags.
Here's what I have so far:
<cffunction name="extractImages" returntype="array" output="false" access="public" displayname="extractImages">
<cfargument name="fileContent" type="string" />
<cfset var local = {} />
<cfset local.images = [] />
<cfset local.imagePaths = [] />
<cfset local.temp = [] />
<cfset local.images = reMatchNoCase("<img([^>]*[^/]?)>", arguments.fileContent) />
<cfloop array="#local.images#" index="local.i">
<cfset local.temp = reMatchNoCase("(""|')(.*)(gif|jpg|jpeg|png)", local.i) />
<cfset local.path = local.temp />
<cfif not arrayIsEmpty(local.path)>
<cfset local.path = trim(replace(local.temp[1],"""","","all")) />
<cfset arrayAppend(local.imagePaths, local.path) />
</cfif>
<cfif isValid("url", local.path)>
<cftry>
<cfif fileExists(local.path)>
<cfset arrayAppend(local.imagePaths, local.path) />
</cfif>
<cfcatch type="any">
<cfset application.messagesObject.addMessage("error","We were not able to obtain all available images on this page.") />
</cfcatch>
</cftry>
</cfif>
</cfloop>
<cfset local.imagePaths = application.udfObject.removeArrayDuplicates(local.imagePaths) />
<cfreturn local.imagePaths />
</cffunction>
This function WORKS. However, on some sites, not so. It looks a bit over the top but much of it is just certain safeguards to make sure I get valid image paths.
Hope you can help.
Many thanks again.
Michael
Take a look at ReFind() or REFindNoCase() - http://cfquickdocs.com/cf9/#refindnocase
Here is a regex that will work.
<cfset string = 'IM OBSESSED WITH MY BEAUTIFUL FRIEND" src="http://scs.viceland.com/feed/images/uk_970014338_300.jpg' />
<cfdump var="#refindNoCase('https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?',string, 1, true)#">
You will see a structure returned with a POS and LEN keys. Use the first element in the POS array to see where the match starts, and the first element in the LEN array to see how long it is. You can then use these values in the Mid() function to grab just that matching URL.
I'm not familiar with ColdFusion, but it seems to me that you just need a regex that looks for http://, then any number of characters, then the end of the string.

Breadcrumbs in Fusebox 4/5

I'm wondering if anyone has come up with a clean way to generate a breadcrumbs trail in Fusebox. Specifically, is there a way of keeping track of "where you are" and having that somehow generate the breadcrumbs for you? So, for example, if you're executing
/index.cfm?fuseaction=Widgets.ViewWidget&widget=1
and the circuit structure is something like /foo/bar/widgets/ then somehow the system automatically creates an array like:
[
{ title: 'Foo', url: '#self#?fuseaction=Foo.Main' },
{ title: 'Bar', url: '#self#?fuseaction=Bar.Main' },
{ title: 'Widgets', url: '#self#?fuseaction=Widgets.Main' },
{ title: 'Awesome Widget', url: '' }
]
Which can then be rendered as
Foo > Bar > Widgets > Awesome Widget
Right now it seems the only way to really do this is to create the structure for each fuseaction in a fuse of some kind (either the display fuse or a fuse dedicated to creating the crumbtrail).
I'm working with Fusebox for a long time, but still can't understand this part:
circuit structure is something like /foo/bar/widgets/
Any way, once my idea was to use the custom lexicon called "parent" (or anything) for each controller fuseaction, where you put the name of previous level fuseaction.
But as I remember, this method was applicable only when using XML-style circuits, where you can always get any fuseaction info from the global container -- so I didn't make it because of intensive use of no-XMl style.
EDIT: example with lexicon
This will work only with Fusebox 5 traditional.
Let's say we have created following lexicon definition /lexicon/bc/parent.cfm:
<cfscript>
if (fb_.verbInfo.executionMode is "start") {
// validate fb_.verbInfo.attributes contents
if (not structKeyExists(fb_.verbInfo.attributes,"value")) {
fb_throw("fusebox.badGrammar.requiredAttributeMissing",
"Required attribute is missing",
"The attribute 'value' is required, for a 'parent' verb in fuseaction #fb_.verbInfo.circuit#.#fb_.verbInfo.fuseaction#.");
}
// compile start tag CFML code
circuit = fb_.verbInfo.action.getCircuit().getName();
fa = fb_.verbInfo.action.getCircuit().getFuseactions();
fa[#fb_.verbInfo.fuseaction#].parent = circuit & "." & fb_.verbInfo.attributes.value;
} else {
// compile end tag CFML code
}
</cfscript>
Basically this is copy-pasted standard lexicon tag specially for lexicon parent.
Assuming we're using Fusebox 5 skeleton example, contoller can look like:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE circuit>
<circuit access="public" xmlns:bc="bc/">
<postfuseaction>
<do action="layout.mainLayout" />
</postfuseaction>
<fuseaction name="welcome" bc:parent="">
<do action="time.getTime" />
<do action="display.sayHello" />
</fuseaction>
<fuseaction name="widgets" bc:parent="app.welcome">
<do action="display.showWidgets" />
</fuseaction>
<fuseaction name="widget" bc:parent="app.widgets">
<do action="display.showWidget" />
</fuseaction>
</circuit>
It shows how the lexicon used for each fuseaction. Please note that if you wont define the attribute bc:parent it wont appear in custom attributes structure later.
It is possible to use only fuseaction name as parent, if you have all chain within same circuit, it can be easier to use later.
Finally, some quick code to build the stuff. Please see comments, they should be helpful enough.
<!--- path data container with current fuseaction saved --->
<cfset arrBreadcrumbs = [] />
<cfset ArrayAppend(arrBreadcrumbs, attributes.fuseaction) />
<!--- pull the current circuit fuseactions --->
<cfset fuseactions = myFusebox.getApplication().circuits[ListFirst(attributes.fuseaction,'.')].getFuseactions() />
<!--- OR <cfset fuseactions = application.fusebox.circuits[ListFirst(attributes.fuseaction,'.')].getFuseactions()> --->
<!--- pull the current fuseaction custom attributes --->
<cfset fa = ListLast(attributes.fuseaction,'.') />
<cfset customAttributes = fuseactions[fa].getCustomAttributes('bc') />
<!--- save the parent fuseaction name if present -- KEY CHECK IS RECOMMENDED --->
<cfif StructKeyExists(customAttributes, "parent")>
<cfset ArrayPrepend(arrBreadcrumbs, customAttributes.parent) />
</cfif>
<!--- let's say we know that parent is there... --->
<!--- pull the found fuseaction custom attributes --->
<cfset fa = ListLast(customAttributes.parent,'.') />
<cfset customAttributes = fuseactions[fa].getCustomAttributes('bc') />
<!--- save the parent fuseaction name if present --->
<cfif StructKeyExists(customAttributes, "parent")>
<cfset ArrayPrepend(arrBreadcrumbs, customAttributes.parent) />
</cfif>
<!--- render the collected path --->
<cfoutput>
<cfloop index="crumb" from="1" to="#ArrayLen(arrBreadcrumbs)#">
<!--- to have a nice labels you can use another lexicon --->
#arrBreadcrumbs[crumb]# <cfif crumb LT ArrayLen(arrBreadcrumbs)>></cfif>
</cfloop>
</cfoutput>
So the output should look like this: app.welcome > app.widgets > app.widget
Here's something I use...
act_breadcrum.cfm
=============================
<cfscript>
if (NOT structKeyExists(session, 'clickstream'))
{
session.clickstream = arrayNew(1);
}
</cfscript>
<cflock name="addNewPage" type="exclusive" timeout="10">
<cfscript>
if ((arrayIsEmpty(session.clickstream))
OR (compare(myFusebox.originalCircuit, session.clickstream[arrayLen(session.clickstream)].Circuit))
OR (compare(myFusebox.originalFuseaction, session.clickstream[arrayLen(session.clickstream)].Fuseaction))
)
{
if (arrayLen(session.clickstream) EQ 10)
{
temp = arrayDeleteAt(session.clickstream, 1);
}
temp = arrayAppend(session.clickstream, structNew());
session.clickstream[arrayLen(session.clickstream)].Fuseaction = myFusebox.originalFuseaction;
session.clickstream[arrayLen(session.clickstream)].Circuit = myFusebox.originalCircuit;
}
</cfscript>
</cflock>
dsp_Breadcrum.cfm
==========================
<cfoutput>
<center>
<b><u>Last Clicked</u></b><BR>
<cfloop from="#arrayLen(session.clickstream)#" to="1" index="i" step="-1">
<cfset Opaque=i*.2>
<a href="#Myself##session.clickstream[i].Circuit#.#session.clickstream[i].Fuseaction#" style=opacity:#Opaque#>
#session.clickstream[i].Circuit#
</a><BR>
</cfloop>
</center>
</cfoutput>