How to extract slide notes from a PowerPoint file with ColdFusion - coldfusion

I have a .PPT (PowerPoint, transferrable to ODP or PPTX) file with speaker notes on every slide. I want to extract the entire presentation into something dynamic so I can create a speaker cheat sheet for running on a phone or table while I talk (thumbnail of the slide with speaker notes). I do this just often enough to HATE doing it by hand.
This is almost easy enough with <cfpresentation format="html" showNotes="yes"> which splits the PPT up into HTML pages and creates an image for every slide. cfpresentation, however, does not transfer the speaker notes, they are lost in translation.
I have also tried <cfdocument> which has no options for preserving slide notes once it converts to PDF.
Is there a way to get the notes out of the PowerPoint file from within ColdFusion?

The simplest solution:
Convert the PowerPoint presentation to OpenOffice ODP format. That's a ZIP file. CFML can unzip it and inside there's a content.xml file which contains the slides and the notes, so CFML can extract the notes from that format.
Given the CFDOCUMENT functionality, perhaps ColdFusion can even convert the PPT to ODP for you?

There's no way to do this directly in CF. You can do this by dropping to the underlying Java. I stand corrected. Using the showNotes attribute on the <cfpresentation> tag, should add the notes to the HTML.
As an alternative, or if that doesn't work for some reason, you should be able to use Apache POI to do this, although you may need to use a more recent version of poi than shipped with your version of coldfusion, which may require some additional work.
public static LinkedList<String> getNotes(String filePath) {
LinkedList<String> results = new LinkedList<String>();
// read the powerpoint
FileInputStream fis = new FileInputStream(filePath);
SlideShow slideShow = new SlideShow(is);
fis.close();
// get the slides
Slide[] slides = ppt.getSlides();
// loop over the slides
for (Slide slide : slides) {
// get the notes for this slide.
Notes notes = slide.getNotesSheet();
// get the "text runs" that are part of this slide.
TextRun[] textRuns = notes.getTextRuns();
// build a string with the text from all the runs in the slide.
StringBuilder sb = new StringBuilder();
for (TextRun textRun : textRuns) {
sb.append(textRun.getRawText());
}
// add the resulting string to the results.
results.add(sb.toString());
}
return results;
}
Carrying over complex formatting may be a challenge (bulleted lists, bold, italics, links, colors, etc.), as you'll have to dig much deeper into TextRuns, and the related API's and figure how to generate HTML.

CFPRESENTATION (at least as of version 9) does have a showNotes attribute, but you'd still have to parse the output. Depending on the markup of the output, jQuery would make short work of grabbing what you want.

Felt bad that my above answer didn't work out so I dug a little bit. It's a little dated, but it works. PPTUtils, which is based on the apache library that #Antony suggested. I updated this one function to do what you want. You may have to tweak it a bit to do exactly what you want, but I like the fact that this utility returns the data to you in data format rather than in HTML which you'd have to parse.
And just in case, here is the POI API reference I used to find the "getNotes()" function.
<cffunction name="extractText" access="public" returntype="array" output="true" hint="i extract text from a PPT by means of an array of structs containing an array element for each slide in the PowerPoint">
<cfargument name="pathToPPT" required="true" hint="the full path to the powerpoint to convert" />
<cfset var hslf = instance.loader.create("org.apache.poi.hslf.HSLFSlideShow").init(arguments.pathToPPT) />
<cfset var slideshow = instance.loader.create("org.apache.poi.hslf.usermodel.SlideShow").init(hslf) />
<cfset var slides = slideshow.getSlides() />
<cfset var notes = slideshow.getNotes() />
<cfset var retArr = arrayNew(1) />
<cfset var slide = structNew() />
<cfset var i = "" />
<cfset var j = "" />
<cfset var k = "" />
<cfset var thisSlide = "" />
<cfset var thisSlideText = "" />
<cfset var thisSlideRichText = "" />
<cfset var rawText = "" />
<cfset var slideText = "" />
<cfloop from="1" to="#arrayLen(slides)#" index="i">
<cfset slide.slideText = structNew() />
<cfif arrayLen(notes)>
<cfset slide.notes = notes[i].getTextRuns()[1].getRawText() />
<cfelse>
<cfset slide.notes = "" />
</cfif>
<cfset thisSlide = slides[i] />
<cfset slide.slideTitle = thisSlide.getTitle() />
<cfset thisSlideText = thisSlide.getTextRuns() />
<cfset slideText = "" />
<cfloop from="1" to="#arrayLen(thisSlideText)#" index="j">
<cfset thisSlideRichText = thisSlideText[j].getRichTextRuns() />
<cfloop from="1" to="#arrayLen(thisSlideRichText)#" index="k">
<cfset rawText = thisSlideRichText[k].getText() />
<cfset slideText = slideText & rawText />
</cfloop>
</cfloop>
<cfset slide.slideText = duplicate(slideText) />
<cfset arrayAppend(retArr, duplicate(slide)) />
</cfloop>
<cfreturn retArr />
</cffunction>

Related

ColdFusion cffeed/cfoutput

I'm currently using a combination of cffeed and cfoutput to generate an XLM/RSS feed, but am getting some curious output, which manifest differently with different browser settings(I think).
The ColdFusion code that produces the XML is
<cfset RssDetails= StructNew()>
<cfset RssDetails.version = "rss_2.0">
<cfset RssDetails.title = #someTitle#>
<cfset RssDetails.link = "someLink#">
<cfset RssDetails.description = #someDetails#>
<cfset RssDetails.pubDate = now()>
<cfset RssDetails.item = ArrayNew(1)>
<cfloop query="queryResults">
<cfset RssDetails.item[currentRow] = structNew()>
<cfset RssDetails.item[currentRow].title = #someResultTitle#>
<cfset RssDetails.item[currentRow].description = structNew()>
<cfset RssDetails.item[currentRow].description.value = #someResultData#>
<cfset RssDetails.item[currentRow].link = "someResultLink#">
</cfloop>
<cffeed action="create" name="#RssDetails#" overwrite="true" xmlVar="someXML">
<cfoutput>#someXML#</cfoutput>
The basic output looks fine in a browser window, but if I then 'View Source' then there's several lines of 'whitespace' that are before and after the main body of XML. The format of the 'whitespace' when observed in 'View Source' is:
As mentioned above, the erroneous/additional output seems to vary with browser settings, although I've not worked out which ones yet, but ultimately, I'd like to remove the whitespace from the CF-generated XML, rather than rely on browser settings.
I've tried a couple of additional options in the cffeed command, but can't seem to hit a successful outcome...grateful for any thoughts or questions,
Phil

How can I strip this URL of everything before "http://"?

I'm doing some web scraping with ColdFusion and mostly everything is working fine. The only other issues I'm getting is that some URL's come through with text behind them that is now causing errors.
Not sure what's causing it, but it's probably my regex. Anyhow, there's a distinct pattern where text appears before the "http://". I'd like to simply remove everything before it.
Any chance you could help?
Take this string for example:
"I'M OBSESSED WITH MY BEAUTIFUL FRIEND" src="http://scs.viceland.com/feed/images/uk_970014338_300.jpg
I'd much appreciate your help as regex isn't something I've managed to make time for - hopefully I will some day!
Thanks.
EDIT:
Hi,
I thought it might be helpful to post my entire function, since it could be my initial REGEX that is causing the issue. Basically, the funcion takes one argument. In this case, it's the contents of a HTML file (via CFHTTP).
In some cases, every URL looks and works fine. If I try digg.com for example it works...but it dies on something like youtube.com. I guess this would be down to their specific HTML formatting. Either way, all I ever need is the value of the SRC attribute on image tags.
Here's what I have so far:
<cffunction name="extractImages" returntype="array" output="false" access="public" displayname="extractImages">
<cfargument name="fileContent" type="string" />
<cfset var local = {} />
<cfset local.images = [] />
<cfset local.imagePaths = [] />
<cfset local.temp = [] />
<cfset local.images = reMatchNoCase("<img([^>]*[^/]?)>", arguments.fileContent) />
<cfloop array="#local.images#" index="local.i">
<cfset local.temp = reMatchNoCase("(""|')(.*)(gif|jpg|jpeg|png)", local.i) />
<cfset local.path = local.temp />
<cfif not arrayIsEmpty(local.path)>
<cfset local.path = trim(replace(local.temp[1],"""","","all")) />
<cfset arrayAppend(local.imagePaths, local.path) />
</cfif>
<cfif isValid("url", local.path)>
<cftry>
<cfif fileExists(local.path)>
<cfset arrayAppend(local.imagePaths, local.path) />
</cfif>
<cfcatch type="any">
<cfset application.messagesObject.addMessage("error","We were not able to obtain all available images on this page.") />
</cfcatch>
</cftry>
</cfif>
</cfloop>
<cfset local.imagePaths = application.udfObject.removeArrayDuplicates(local.imagePaths) />
<cfreturn local.imagePaths />
</cffunction>
This function WORKS. However, on some sites, not so. It looks a bit over the top but much of it is just certain safeguards to make sure I get valid image paths.
Hope you can help.
Many thanks again.
Michael
Take a look at ReFind() or REFindNoCase() - http://cfquickdocs.com/cf9/#refindnocase
Here is a regex that will work.
<cfset string = 'IM OBSESSED WITH MY BEAUTIFUL FRIEND" src="http://scs.viceland.com/feed/images/uk_970014338_300.jpg' />
<cfdump var="#refindNoCase('https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?',string, 1, true)#">
You will see a structure returned with a POS and LEN keys. Use the first element in the POS array to see where the match starts, and the first element in the LEN array to see how long it is. You can then use these values in the Mid() function to grab just that matching URL.
I'm not familiar with ColdFusion, but it seems to me that you just need a regex that looks for http://, then any number of characters, then the end of the string.

Populate object with multidimensional menu

I'm wondering if there is an effective way to put a menu into an array or any other data type.
With php I would do something like this:
$menu[1] = "home";
$menu[2] = "news";
$menu[3]["item"] = "products";
$menu[3]["subMenu"][1] = "jackets";
$menu[3]["subMenu"][2] = "T-shirts";
$menu[4] = "contact";
However I have no clue how one would do this in coldfusion.
I want to grab this data from the DB and push it into an object, this will allow me to generate the html from the array.
To take Ciaran's answer a step farther, you can do it completely with object literals in CF 9:
<cfset menu = ["home",
"news",
{"item"="products",
"subMenu"= ["jackets",
"T-shirts"]},
"contact"]>
<cfdump var="#menu#" /> <!--- Output --->
It's actually very similar. This presumes ColdFusion 8 (or higher) for array ([]) and struct ({}) literals:
<cfset menu = [] /> <!--- Create initial array --->
<cfset menu[1] = "home" />
<cfset menu[2] = "news" />
<cfset menu[3] = {} /> <!--- Create structure --->
<cfset menu[3]["item"] = "products" /> <!--- Address structure by key --->
<cfset menu[3]["subMenu"] = [] />
<cfset menu[3]["subMenu"][1] = "jackets" />
<cfset menu[3]["subMenu"][2] = "T-shirts" />
<cfset menu[4] = "contact" />
<cfdump var="#menu#" /> <!--- Output --->
Hope that helps!

Breadcrumbs in Fusebox 4/5

I'm wondering if anyone has come up with a clean way to generate a breadcrumbs trail in Fusebox. Specifically, is there a way of keeping track of "where you are" and having that somehow generate the breadcrumbs for you? So, for example, if you're executing
/index.cfm?fuseaction=Widgets.ViewWidget&widget=1
and the circuit structure is something like /foo/bar/widgets/ then somehow the system automatically creates an array like:
[
{ title: 'Foo', url: '#self#?fuseaction=Foo.Main' },
{ title: 'Bar', url: '#self#?fuseaction=Bar.Main' },
{ title: 'Widgets', url: '#self#?fuseaction=Widgets.Main' },
{ title: 'Awesome Widget', url: '' }
]
Which can then be rendered as
Foo > Bar > Widgets > Awesome Widget
Right now it seems the only way to really do this is to create the structure for each fuseaction in a fuse of some kind (either the display fuse or a fuse dedicated to creating the crumbtrail).
I'm working with Fusebox for a long time, but still can't understand this part:
circuit structure is something like /foo/bar/widgets/
Any way, once my idea was to use the custom lexicon called "parent" (or anything) for each controller fuseaction, where you put the name of previous level fuseaction.
But as I remember, this method was applicable only when using XML-style circuits, where you can always get any fuseaction info from the global container -- so I didn't make it because of intensive use of no-XMl style.
EDIT: example with lexicon
This will work only with Fusebox 5 traditional.
Let's say we have created following lexicon definition /lexicon/bc/parent.cfm:
<cfscript>
if (fb_.verbInfo.executionMode is "start") {
// validate fb_.verbInfo.attributes contents
if (not structKeyExists(fb_.verbInfo.attributes,"value")) {
fb_throw("fusebox.badGrammar.requiredAttributeMissing",
"Required attribute is missing",
"The attribute 'value' is required, for a 'parent' verb in fuseaction #fb_.verbInfo.circuit#.#fb_.verbInfo.fuseaction#.");
}
// compile start tag CFML code
circuit = fb_.verbInfo.action.getCircuit().getName();
fa = fb_.verbInfo.action.getCircuit().getFuseactions();
fa[#fb_.verbInfo.fuseaction#].parent = circuit & "." & fb_.verbInfo.attributes.value;
} else {
// compile end tag CFML code
}
</cfscript>
Basically this is copy-pasted standard lexicon tag specially for lexicon parent.
Assuming we're using Fusebox 5 skeleton example, contoller can look like:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE circuit>
<circuit access="public" xmlns:bc="bc/">
<postfuseaction>
<do action="layout.mainLayout" />
</postfuseaction>
<fuseaction name="welcome" bc:parent="">
<do action="time.getTime" />
<do action="display.sayHello" />
</fuseaction>
<fuseaction name="widgets" bc:parent="app.welcome">
<do action="display.showWidgets" />
</fuseaction>
<fuseaction name="widget" bc:parent="app.widgets">
<do action="display.showWidget" />
</fuseaction>
</circuit>
It shows how the lexicon used for each fuseaction. Please note that if you wont define the attribute bc:parent it wont appear in custom attributes structure later.
It is possible to use only fuseaction name as parent, if you have all chain within same circuit, it can be easier to use later.
Finally, some quick code to build the stuff. Please see comments, they should be helpful enough.
<!--- path data container with current fuseaction saved --->
<cfset arrBreadcrumbs = [] />
<cfset ArrayAppend(arrBreadcrumbs, attributes.fuseaction) />
<!--- pull the current circuit fuseactions --->
<cfset fuseactions = myFusebox.getApplication().circuits[ListFirst(attributes.fuseaction,'.')].getFuseactions() />
<!--- OR <cfset fuseactions = application.fusebox.circuits[ListFirst(attributes.fuseaction,'.')].getFuseactions()> --->
<!--- pull the current fuseaction custom attributes --->
<cfset fa = ListLast(attributes.fuseaction,'.') />
<cfset customAttributes = fuseactions[fa].getCustomAttributes('bc') />
<!--- save the parent fuseaction name if present -- KEY CHECK IS RECOMMENDED --->
<cfif StructKeyExists(customAttributes, "parent")>
<cfset ArrayPrepend(arrBreadcrumbs, customAttributes.parent) />
</cfif>
<!--- let's say we know that parent is there... --->
<!--- pull the found fuseaction custom attributes --->
<cfset fa = ListLast(customAttributes.parent,'.') />
<cfset customAttributes = fuseactions[fa].getCustomAttributes('bc') />
<!--- save the parent fuseaction name if present --->
<cfif StructKeyExists(customAttributes, "parent")>
<cfset ArrayPrepend(arrBreadcrumbs, customAttributes.parent) />
</cfif>
<!--- render the collected path --->
<cfoutput>
<cfloop index="crumb" from="1" to="#ArrayLen(arrBreadcrumbs)#">
<!--- to have a nice labels you can use another lexicon --->
#arrBreadcrumbs[crumb]# <cfif crumb LT ArrayLen(arrBreadcrumbs)>></cfif>
</cfloop>
</cfoutput>
So the output should look like this: app.welcome > app.widgets > app.widget
Here's something I use...
act_breadcrum.cfm
=============================
<cfscript>
if (NOT structKeyExists(session, 'clickstream'))
{
session.clickstream = arrayNew(1);
}
</cfscript>
<cflock name="addNewPage" type="exclusive" timeout="10">
<cfscript>
if ((arrayIsEmpty(session.clickstream))
OR (compare(myFusebox.originalCircuit, session.clickstream[arrayLen(session.clickstream)].Circuit))
OR (compare(myFusebox.originalFuseaction, session.clickstream[arrayLen(session.clickstream)].Fuseaction))
)
{
if (arrayLen(session.clickstream) EQ 10)
{
temp = arrayDeleteAt(session.clickstream, 1);
}
temp = arrayAppend(session.clickstream, structNew());
session.clickstream[arrayLen(session.clickstream)].Fuseaction = myFusebox.originalFuseaction;
session.clickstream[arrayLen(session.clickstream)].Circuit = myFusebox.originalCircuit;
}
</cfscript>
</cflock>
dsp_Breadcrum.cfm
==========================
<cfoutput>
<center>
<b><u>Last Clicked</u></b><BR>
<cfloop from="#arrayLen(session.clickstream)#" to="1" index="i" step="-1">
<cfset Opaque=i*.2>
<a href="#Myself##session.clickstream[i].Circuit#.#session.clickstream[i].Fuseaction#" style=opacity:#Opaque#>
#session.clickstream[i].Circuit#
</a><BR>
</cfloop>
</center>
</cfoutput>

Resolving variables inside a Coldfusion string

My client has a database table of email bodies that get sent at certain times to customers. The text for the emails contains ColdFusion expressions like Dear #firstName# and so on. These emails are HTML - they also contain all sorts of HTML mark-up. What I'd like to do is read that text from the database into a string and then have ColdFusion Evaluate() that string to resolve the variables. When I do that, Evaluate() throws an exception because it doesn't like the HTML markup in there (I also tried filtering the string through HTMLEditFormat() as an intermediate step for grins but it didn't like the entities in there).
My predecessor solved this problem by writing the email text out to a file and then cfincluding that. It works. It's seems really hacky though. Is there a more elegant way to handle this using something like Evaluate that I'm not seeing?
What other languages often do that seems to work very well is just have some kind of token within your template that can be easily replaced by a regular expression. So you might have a template like:
Dear {{name}}, Thanks for trying {{product_name}}. Etc...
And then you can simply:
<cfset str = ReplaceNoCase(str, "{{name}}", name, "ALL") />
And when you want to get fancier you could just write a method to wrap this:
<cffunction name="fillInTemplate" access="public" returntype="string" output="false">
<cfargument name="map" type="struct" required="true" />
<cfargument name="template" type="string" required="true" />
<cfset var str = arguments.template />
<cfset var k = "" />
<cfloop list="#StructKeyList(arguments.map)#" index="k">
<cfset str = ReplaceNoCase(str, "{{#k#}}", arguments.map[k], "ALL") />
</cfloop>
<cfreturn str />
</cffunction>
And use it like so:
<cfset map = { name : "John", product : "SpecialWidget" } />
<cfset filledInTemplate = fillInTemplate(map, someTemplate) />
Not sure you need rereplace, you could brute force it with a simple replace if you don't have too many fields to merge
How about something like this (not tested)
<cfset var BaseTemplate = "... lots of html with embedded tokens">
<cfloop (on whatever)>
<cfset LoopTemplate = replace(BaseTemplate, "#firstName#", myvarforFirstName, "All">
<cfset LoopTemplate = replace(LoopTemplate, "#lastName#", myvarforLastName, "All">
<cfset LoopTemplate = replace(LoopTemplate, "#address#", myvarforAddress, "All">
</cfloop>
Just treat the html block as a simple string.
CF 7+: You may use regular expression, REReplace()?
CF 9: use Virtual File System
If the variable is in a structure from, something like a form post, then you can use "StructFind". It does exactly as you request. I ran into this issue when processing a form with dynamic inputs.
Ex.
StructFind(FORM, 'WhatYouNeed')