I'm using <cfhttp> to pull in content from another site (coldfusion) and resolveurl="true" so all the links work. The problem I'm having is resolveurl is making the anchor links (href="#search") absolute links as well breaking them. My question is is there a way to make resolveurl="true" bypass anchor links somehow?
For starters, let's use the tutorial code from Adobe.com posted in the comments. You'll want to do something similar.
<cfhttp url="https://www.adobe.com"
method="get" result="httpResp" timeout="120">
<cfhttpparam type="header" name="Content-Type" value="application/json" />
</cfhttp>
<cfscript>
// Find all the URLs in a web page retrieved via cfhttp
// The search is case sensitive
result = REMatch("https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?", httpResp.Filecontent);
</cfscript>
<!-- Now, Loop through those URLs--->
<cfoutput>
<cfloop array="#result#" item="item" index="index">
<cfif LEFT(item, 1) is "##">
<!---Your logic if it's just an anchor--->
<cfelse>
<!---Your logic if it's a full link--->
</cfif>
<br/>
</cfloop>
</cfoutput>
If it tries to return a full URL before the anchor as you say, (I've been getting inconsistent results with resolveurl="true") hit it with this to only grab the bit you want.
<cfoutput>
<cfloop array="#result#" item="item" index="index">
#ListLast(item, "##")#
</cfloop>
</cfoutput>
What this code does is grab all the URLs, and parse them for anchors.
You'll have to decide what to do next inside your loop. Maybe preserve the values and add them to a new array, so you can save it somewhere with the links fixed?
It's impossible to assume in a situation like this.
There does not appear to be a way to prevent CF from resolving the hashes. In our usage of it the current result is actually beneficial since when we present content from another site we usually want the user to be sent there.
Here is a way to replace link href values with just anchor if one is present using regular expressions. I'm sure there are combinations of issues that could occur here if really malformed html.
<cfsavecontent variable="testcontent">
<strong>test</strong>
go to google
go to section
</cfsavecontent>
<cfset domain = replace("current.domain", ".", "\.", "all") />
<cfset match = "(href\s*=\s*(""|'))\s*(http://#domain#[^##'""]+)(##[^##'""]+)\s*(""|')" />
<cfset result = reReplaceNoCase(testcontent, match, "\1\4\6", "all") />
<cfoutput><pre>#encodeForHTML(result)#</pre></cfoutput>
Output
<strong>test</strong>
go to google
<a href="#section>go to section</a>
Another option if you are displaying the content in a normal page with js/jquery available is to run through each link on display and update it to just be the anchor. This will be less likely error with malformed html. Let me know if you have any interest in that approach.
Related
I am using CF10. I have a Select:
<cfselect name="company" id="company" query="qcompany" display="placename" value="placeid" queryposition="below">
<option value="0">--Select--
</cfselect>
I have another cfselect that is bound to the first:
<cfselect name="People" bind="cfc:schedule.GetPeopleArray({company})" ></cfselect>
I cannot get the second cfselect to display any results. To test whether or not I am receiving data from my component (which I will display at the bottom), I bound a text box:
<cfinput name="test" bind="cfc:schedule.GetPeopleArray({company})" bindonload="false"/>
This text box is displaying the results of the call to my component every time, but the cfselect never displays anything.
What could I possibly be doing wrong?
I have tried returning arrays and queries from my component. No help. I have tried adding display and value attributes to the second cfselect. No help.
Here is my component:
<cfcomponent output="false">
<cffunction name="GetPeopleArray" access="remote" returnType="array" output="false">
<cfargument name="company" type="string" >
<!--- Define variables --->
<cfset var data="">
<cfset var result=ArrayNew(2)>
<cfset var i=0>
<cfquery name="qEmployee" datasource="texas" >
SELECT 0 as personid,'Select Person' as fullname,0 as sortorder
UNION
SELECT p.personid ,concat(firstname,' ',lastname) as fullname,3 as sortorder
FROM person p
INNER JOIN placeperson pp
ON p.personid=pp.personid
where personstatus='ACT'
and pp.placeid=#arguments.company#
order by sortorder,fullname
</cfquery>
<!--- Convert results to array --->
<cfloop index="i" from="1" to="#qEmployee.RecordCount#">
<cfset result[i][1]=qEmployee.personid[i]>
<cfset result[i][2]=qEmployee.fullname[i]>
</cfloop>
<!--- And return it --->
<cfreturn result>
</cffunction>
</cfcomponent>
Ultimately, you may want use jQuery anyway, but FWIW your existing code worked fine with CF10. (The only change was removing the JOIN for simplicity) So either you are using different code, or there is something else going on we are unaware of ..
Truthfully the Ajax functionality does have some "quirks". However, you should not have any problem with a simple case like this. Aside from adding a text field, what other debugging or troubleshooting steps did you perform? What I usually recommend is:
Test the CFC independently first. Access it directly in your browser with a variety of sample values:
http://localhost/path/to/schedule.cfc?method=GetPeopleArray&company=someValue
I did this with the original code and discovered an error occurred when the company value is not numeric, like an empty string. (I suspect that might have been the problem) You can prevent that error by substituting an invalid ID like 0 instead. Note, be sure to use cfqueryparam to prevent sql injection.
AND pp.placeid = <cfqueryparam value="#val(arguments.company)#"
cfsqltype="cf_sql_integer">
Enable the CF AJAX debugger in the CF Administrator. Then append ?cfdebug to your test script so you can view the console and check for problems/errors.
http://localhost/path/to/yourTestForm.cfm?cfdebug
Again, I did this after tweaking the query. But there were no errors. Your existing cfform code worked perfectly.
Usually those two steps are enough to pinpoint any problems. If not, make sure your Application.cfc file (if you are using one) is not interfering with the Ajax request. That is a common gotcha. Test the code in a separate directory that is outside any Application files.
EDIT: Also, you may as well set bindonload="false" for the select list too. Since you do not want to call the function when the page first loads. Only when the user selects something from the list.
So, I'm using jsoup and when I display the content returned I Get:
{{#ifcond="" pagetitle="" this.name}}
I'm doing this like local.htmlObj.Body().Html()
When I need it to be return like:
{{#ifCond PAGETITLE this.NAME}}
Here what I'm doing
<cfset paths = [] />
<cfset paths[1] = expandPath("/javaloader/lib/jsoup-1.7.1.jar") />
<cfset loader = createObject("component", "javaloader.JavaLoader").init( paths ) />
<cfset obj = loader.create( "org.jsoup.Jsoup" ) />
<cfset local.htmlObj = local.jsoupObj.parse( local.template ) />
<cfloop array="#local.htmlObj.select('.sidebar_left')#" index="element">
<cfif element.attr('section') EQ "test">
<cfset element.append('HTML HERE') />
</cfif>
</cfloop>
local.template is my template that is made up of a ton of different handlebar files That im pulling for different places. I'm constructing one handlebar file that gets returned.
The problem is that JSoup is trying to parse invalid HTML before it lets you access it. A slight easier to understand example of this behavior can be seen if you fetch the following HTML (seen in this question):
<p>
<table>[...]</table>
</p>
It will return:
<p></p>
<table>[...]</table>
In your case the Handelbars code is seen as attributes which in valid html always have a value (think checked="checked"). As far as I can tell there is no way to disable this behavior. It's really the wrong tool for the job you are trying to do. A cleaner approach would be to just fetch the document as a stream and save it to a string.
I need to replace the text inside all href values. I think a regular expression is the way to do it, but I'm no regex pro. Any thoughts on how I'd do the following using ColdFusion?
so it is changed to:
Thanks!
Here's an update to the question: I have this code and need the pattern below:
<cfset matches = ReMatch('<a[^>]*href="http[^"]*"[^>]*>(.+?)</a>', arguments.htmlCode) /> <cfdump var="#matches#">
<cfset links = arrayNew(1)>
<cfloop index="a" array="#matches#">
<cfset arrayAppend(links, rereplace(a, 'need regex'," {clickurl}","all"))>
</cfloop>
<cfdump var="#links#">
Here's how to do it with jSoup HTML parser:
<cfset jsoup = createObject('java','org.jsoup.Jsoup') />
<cfset Dom = jsoup.parse( InputHtml ) />
<cfset Dom.select('a[href]').attr('href','{replaced}') />
<cfset NewHtml = Dom.html() />
(On CF9 and earlier, this requires placing the jsoup's jar in CF's lib directory, or using JavaLoader.)
Using a HTML parser is usually better than using regex, not least because it's easier to maintain and understand.
Here's an imperfect way of doing it with a regex:
<cfset NewHtml = InputHtml.replaceAll
( '(?<=<a.{0,99}?\shref\s{0,99}?=\s{0,99}?)(?:"[^"]+|''[^'']+)(["'])'
, '$1{replaced}$1'
)/>
Which hopefully demonstrates why using a tool such as jsoup is definitely the way to go...
(btw, the above is using the Java regex engine (via string.replaceAll), so it can use the lookbehind functionality, which doesn't exist in CF's built-in regex (rereplace/rematch/etc))
Update, based on the new code sample you've provided...
Here is an example of how to use jsoup for what you're doing - it might still need some updates (depending on what {clickurl} is eventually going to be doing), but it currently functions the same as your sample code is attempting:
<cfset jsoup = createObject('java','org.jsoup.Jsoup') />
<cfset links = jsoup.parse( Arguments.HtmlCode )
<!--- select all links beginning http and change their href --->
.select('a[href^=http]').attr('href',' {clickurl}')
<!--- get HTML for all links, then split into array. --->
.outerHtml().split('(?<=</a>)(?!$)')
/>
<cfdump var=#links# />
That middle bit is all a single cfset, but I split it up and added comments for clarity. (You could of course do this with multiple variables and 3+ cfsets if you preferred that.)
Again, it's not a regex, because what you're doing involves parsing HTML, and regex is not designed for parsing tag-based syntax, so isn't very good at it - there are too many quirks and variations with HTML and describing them in a single regex gets very complicated very quickly.
So the announcement functionality on our site displays a "read more" link by default using the following code (in part):
<cfif announcement.recordCount gt 0>
<cfloop query="announcement">
<cfoutput>
<td colspan="2"><span class="left">#teaser_text# Read more »
</cfoutput>
</cfloop>
(Note, there is a cfquery statement prior to that, which I excluded for brevity in the code)
What I'm trying to do here is get the "Read More" link to show after the #teaser_text# only if no link is contained within #teaser_text#, so that I can manually add links in if needed and remove the automatically generated link.
Any thoughts on a cfif statement that would do this?
Thanks.
EDIT: To clarify, I want to remove "Read more" if ANY link is found within teaser_text.
To only show the read more link if no hyperlink is found within teaser_text, this check is likely to be good enough:
<cfif NOT refindNoCase('<a\s[^>]*?\bhref\s*=',teaser_text) >
Read more »
</cfif>
If you want to check for URLs, not for hyperlinks, you need to get more fancy.
You also need to remember that this is treating teaser_text as text (not as HTML), so commenting out a link will not prevent it from being found (if that matters, you need to investigate HTML DOM parsers; and there aren't any for CF so you'd need to look at the Java ones).
This should work:
<cfif findnocase('http://', teaser_text) eq 0>
Read more »
</cfif>
If you are placing the links in manually just change the first parameter of the findnocase() function [i.e. htp/https] or use a regex to figure out if it is a url [via: refindnocse() ]
-sean
Something like this should do what you want
<cfif announcement.recordCount gt 0>
<cfloop query="announcement">
<cfif findnocase("href",anouncement.teaser_text) >
#anouncement.teaser_text#
<cfelse>
<a href="/announcements/?id=#announcement.id#" > #anouncement.teaser_text# Read more </a>
</cfif>
</cfloop>
</cfif>
I'm doing some web scraping with ColdFusion and mostly everything is working fine. The only other issues I'm getting is that some URL's come through with text behind them that is now causing errors.
Not sure what's causing it, but it's probably my regex. Anyhow, there's a distinct pattern where text appears before the "http://". I'd like to simply remove everything before it.
Any chance you could help?
Take this string for example:
"I'M OBSESSED WITH MY BEAUTIFUL FRIEND" src="http://scs.viceland.com/feed/images/uk_970014338_300.jpg
I'd much appreciate your help as regex isn't something I've managed to make time for - hopefully I will some day!
Thanks.
EDIT:
Hi,
I thought it might be helpful to post my entire function, since it could be my initial REGEX that is causing the issue. Basically, the funcion takes one argument. In this case, it's the contents of a HTML file (via CFHTTP).
In some cases, every URL looks and works fine. If I try digg.com for example it works...but it dies on something like youtube.com. I guess this would be down to their specific HTML formatting. Either way, all I ever need is the value of the SRC attribute on image tags.
Here's what I have so far:
<cffunction name="extractImages" returntype="array" output="false" access="public" displayname="extractImages">
<cfargument name="fileContent" type="string" />
<cfset var local = {} />
<cfset local.images = [] />
<cfset local.imagePaths = [] />
<cfset local.temp = [] />
<cfset local.images = reMatchNoCase("<img([^>]*[^/]?)>", arguments.fileContent) />
<cfloop array="#local.images#" index="local.i">
<cfset local.temp = reMatchNoCase("(""|')(.*)(gif|jpg|jpeg|png)", local.i) />
<cfset local.path = local.temp />
<cfif not arrayIsEmpty(local.path)>
<cfset local.path = trim(replace(local.temp[1],"""","","all")) />
<cfset arrayAppend(local.imagePaths, local.path) />
</cfif>
<cfif isValid("url", local.path)>
<cftry>
<cfif fileExists(local.path)>
<cfset arrayAppend(local.imagePaths, local.path) />
</cfif>
<cfcatch type="any">
<cfset application.messagesObject.addMessage("error","We were not able to obtain all available images on this page.") />
</cfcatch>
</cftry>
</cfif>
</cfloop>
<cfset local.imagePaths = application.udfObject.removeArrayDuplicates(local.imagePaths) />
<cfreturn local.imagePaths />
</cffunction>
This function WORKS. However, on some sites, not so. It looks a bit over the top but much of it is just certain safeguards to make sure I get valid image paths.
Hope you can help.
Many thanks again.
Michael
Take a look at ReFind() or REFindNoCase() - http://cfquickdocs.com/cf9/#refindnocase
Here is a regex that will work.
<cfset string = 'IM OBSESSED WITH MY BEAUTIFUL FRIEND" src="http://scs.viceland.com/feed/images/uk_970014338_300.jpg' />
<cfdump var="#refindNoCase('https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?',string, 1, true)#">
You will see a structure returned with a POS and LEN keys. Use the first element in the POS array to see where the match starts, and the first element in the LEN array to see how long it is. You can then use these values in the Mid() function to grab just that matching URL.
I'm not familiar with ColdFusion, but it seems to me that you just need a regex that looks for http://, then any number of characters, then the end of the string.