Error using cfhttp to retrieve page contents from bitly url - coldfusion

I am using cfhttp (Lucee Server) to scrape page contents from a url in the following manner:
<cfhttp url="#libs.originalAdPage#" method="GET" />
I then place this content in a div on my page.
This code has been working for a long time.
I need to report on the URLs that have been scraped for their content, and that information is entered into a form on another website that is not under my control. I decided to convert the URLs to shortened bitly URLs, so I built a step into the page that creates a bitly link and returns it to replace the existing URL.
If I use the page with a shortened URL from LinkedIn, the page is scraped and displayed correctly in the div.
<cfhttp url="http://bit.ly/1NPhPgc" method="GET" />
But if I make an identical cfhttp call to an Indeed.com page shortened to a bitly URL, I get a connection failure error.
<cfhttp url="http://bit.ly/1RQvlim" method="GET" />
[Image: cfdump of connection failure]
If I open this URL directly in the browser the page is displayed correctly.
Any ideas would be greatly appreciated.
Thanks,
Michael

I don't have access to a Lucee server to test with, however cfhttp on a ColdFusion server works fine for me for both of those bitly URLs. cfhttp follows the redirect and the FileContent contains the indeed.com page as would be expected.
Have you checked what happens with the Bitly Indeed URL if you prevent cfhttp from automatically following redirects, so that you can debug and follow them manually? For example:
<cftry>
<cfhttp url="http://bit.ly/1RQvlim" method="GET" redirect="no" />
<cfdump var="#cfhttp.responseHeader#" />
<cfhttp url="#cfhttp.responseHeader.Location#" method="GET" />
<cfdump var="#cfhttp#" label="cfhttp2" />
<cfcatch>
<cfdump var="#cfcatch#" label="cfcatch" />
</cfcatch>
</cftry>
Indeed.com do pay attention to crawlers and user agents - just see their robots.txt for evidence of this.
Do you have access to a different server to test with in case there is something specific to Lucee's cfhttp implementation or to your IP address (eg blacklisted due to all the scraping)?
Have you tried tweaking the cfhttp useragent and/or any other headers as per How to emulate a real http request via cfhttp?
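To make that last suggestion concrete, here is a minimal sketch of a cfhttp call that sends a browser-like user agent and a couple of extra request headers. The user-agent string, the header values, the timeout, and the result variable name adPage are all assumptions chosen for illustration; nothing here is known to be what Indeed.com actually requires.

```cfml
<!--- Sketch only: user-agent string and header values are illustrative assumptions --->
<cfhttp url="http://bit.ly/1RQvlim" method="GET" result="adPage"
        useragent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        timeout="30">
    <!--- Extra headers that a browser would normally send --->
    <cfhttpparam type="header" name="Accept" value="text/html,application/xhtml+xml">
    <cfhttpparam type="header" name="Accept-Language" value="en-US,en;q=0.8">
</cfhttp>
<!--- Inspect the status line and headers to see whether the response changed --->
<cfdump var="#adPage.statusCode#" label="status">
<cfdump var="#adPage.responseHeader#" label="response headers">
```

If the status changes once the headers are added, that would suggest the failure is related to how the request identifies itself rather than to the bitly redirect.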

Related

CFHTTP POST, result is image, how to save

I found a web-site that clears exif data from an image. The source can either be an uploaded picture or a URL. I thought, perhaps, I could use this with CFHTTP to do this automatically for pictures I post to my web-site. I know I can probably run my images manually through this site before I upload them to my site. Call this an exercise if you want.
Here is the code I am using, which basically matches the form source on this very simple web-site (link)
<cfhttp method="POST" url="https://www.verexif.com/en/quitar.php" result="result" >
<cfhttpparam name="foto_url" type="formfield" value="{myimageurl}">
</cfhttp>
When I CFDUMP the result, I get the following:
When I try to use DeserializeJSON(result.Filecontent), it gives me a ColdFusion error:
When I url-encode my original URL in the CFHTTP tag, the result.filecontent contains the source code of the original web-site.
As can be seen in the first image above, there is a file called 'foto_no_exif.jpg' included in the output. This is the file I need to download. How can I do this ?
In your current dump, you already have the modified image, but you need to access it as binary data. You can force the file content of the response to be treated as binary data by adding the getasbinary attribute to your cfhttp tag.
Working example:
<cfset imageURL = "https://raw.githubusercontent.com/ianare/exif-samples/master/jpg/long_description.jpg">
<!--- Form fields are sent in the request body, so use POST as in the question --->
<cfhttp method="post" getasbinary="yes" charset="utf-8" url="https://www.verexif.com/en/quitar.php" result="result">
<cfhttpparam name="foto_url" type="formfield" value="#imageURL#">
</cfhttp>
<!--- Stream the returned binary data to the browser as a JPEG --->
<cfcontent variable="#result.Filecontent#" type="image/jpeg" reset="true" />
Run it on TryCF.com
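If the goal is to save the cleaned image on the server rather than stream it to the browser, something along these lines may work. It is a sketch that assumes result comes from a cfhttp call made with getasbinary="yes"; the target file name and location are hypothetical choices.

```cfml
<!--- Assumes `result` was produced by a cfhttp call with getasbinary="yes" --->
<!--- The target path is a hypothetical example; adjust it to your own directory --->
<cfset targetPath = expandPath("./foto_no_exif.jpg")>
<cfif isBinary(result.fileContent)>
    <!--- Write the raw bytes to disk unchanged --->
    <cfset fileWrite(targetPath, result.fileContent)>
<cfelse>
    <!--- A non-binary body usually means an error page came back instead of an image --->
    <cfdump var="#result#" label="unexpected response">
</cfif>
```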

connecting to smarter stats through cfhttp

I am trying to connect to the SmarterStats website, bypassing the login window, and load the statistics in a Fancybox page.
So far this is my code, but it does not seem to be working:
<cfhttp method="post" url="https://stats.ezhostingserver.com/" resolveurl="true" redirect="true">
<cfhttpparam type="FORMFIELD" name="ctl00$MPH$txtUserName" value="test.ca">
<cfhttpparam type="FORMFIELD" name="ctl00$MPH$txtPassword" value="mypwd!">
<cfhttpparam type="FORMFIELD" name="ctl00$MPH$txtSiteId" value="12343">
</cfhttp>
<cfif cfhttp.statuscode EQ '200 OK'>
<cfhttp result="results" url="https://stats.ezhostingserver.com/default.aspx"/>
<cfoutput>
#results.filecontent#
</cfoutput>
</cfif>
The problem is that every time I load the page
http://domain.in/index.cfm
it redirects to
http://domain.in/stats/Login.aspx
I am using the stats service that Hostek provides for a domain.
The reason your code is behaving this way is because the initial URL you have in your cfhttp tag is returning an HTTP 302 redirect. Then because you have the redirect attribute of the cfhttp tag set to true it is actually performing the redirect. Look at the documentation for that attribute:
redirect - If the response header includes a Location field AND ColdFusion receives a 300-series (redirection) status code, specifies whether to redirect execution to the URL specified in the field:
yes: redirects execution to the specified page.
no: stops execution and returns the response information in the cfhttp variable, or throws an error if the throwOnError attribute is True.
The cfhttp.responseHeader.Location variable contains the redirection path. ColdFusion follows a maximum of four redirects on a request. If there are more, ColdFusion functions as if redirect = "no".
Note: The cflocation tag generates an HTTP 302 response with the url attribute as the Location header value.
So instead of using that initial URL for your cfhttp request, try using the URL it redirects to, and set the redirect attribute to false. Be aware that with redirect set to false the tag stops at any redirect status code and, as the documentation quoted above notes, throws an error if throwOnError is true, so you will need to handle that case.
Example:
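<cfhttp method="post"
url="https://stats.ezhostingserver.com/Login.aspx"
resolveurl="true"
redirect="false">
<cfhttpparam type="FORMFIELD" name="ctl00$MPH$txtUserName" value="test.ca">
<cfhttpparam type="FORMFIELD" name="ctl00$MPH$txtPassword" value="mypwd!">
<cfhttpparam type="FORMFIELD" name="ctl00$MPH$txtSiteId" value="12343">
</cfhttp>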
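One way to handle the redirect case after such a request is to inspect the status code yourself. The sketch below assumes the request above has already run and stored its response in the default cfhttp variable; the output messages are purely illustrative.

```cfml
<!--- Assumes the preceding cfhttp call used redirect="false" and no result attribute --->
<cfif left(cfhttp.statusCode, 1) EQ "3" AND structKeyExists(cfhttp.responseHeader, "Location")>
    <!--- The server answered with a redirect; show where it wanted to send us --->
    <cfoutput>Redirected to: #encodeForHTML(cfhttp.responseHeader.Location)#</cfoutput>
<cfelse>
    <!--- No redirect: the body of the requested page is available in fileContent --->
    <cfoutput>Status: #encodeForHTML(cfhttp.statusCode)#</cfoutput>
</cfif>
```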

Coldfusion cflocation strange behavior

I am storing a google review URL in my database as:
https://www.google.com/search?CFID=ac59cfdf-bbad-4017-9759-e88054f3f242&CFTOKEN=0&q=njcomputerrepair%2Bbrick%2Bnj&oq=njcomp&aqs=chrome.1.69i60j69i59j69i60j69i57j0l2.2762j0j9&sourceid=chrome&ie=UTF-8#lrd=0x89c18348735c2907:0x59aa614832a36b22,3,
And then in my application I set that URL to a variable and I redirect the user to that URL using cflocation.
<cfquery name="geturl" datasource="#datasource#">
select (residential_ReviewURL) as redirectURL
from subscribers
</cfquery>
<!--- Redirect to main html redirect page --->
<cfoutput>
<cflocation url="#getURL.redirectURL#">
</cfoutput>
However, the URL gets changed at some point. I think ColdFusion doesn't like some of the characters in the URL and either percent-encodes them or removes them. As a result, when the user lands on the Google page, it doesn't process as it should.
Here is how the URL looks after the redirect:
https://www.google.com/search?CFID=ac59cfdf-bbad-4017-9759-e88054f3f242&CFTOKEN=0&CFID=ac59cfdf-bbad-4017-9759-e88054f3f242&CFTOKEN=0&q=njcomputerrepair%2Bbrick%2Bnj&oq=njcomp&aqs=chrome.1.69i60j69i59j69i60j69i57j0l2.2762j0j9&sourceid=chrome&ie=UTF-8#lrd%3D0x89c18348735c2907%3A0x59aa614832a36b22%2C3%2C
How can I stop ColdFusion from changing the URL and keep it exactly as it is stored in the database?
UPDATE
So I found that URLdecode will preserve the string. Here is what I have.
#urlDecode(getURL.redirectURL)#
The output is as follows
https://www.google.com/search?CFID=ac59cfdf-bbad-4017-9759-e88054f3f242&CFTOKEN=0&q=njcomputerrepair+brick+nj&oq=njcomp&aqs=chrome.1.69i60j69i59j69i60j69i57j0l2.2762j0j9&sourceid=chrome&ie=UTF-8#lrd=0x89c18348735c2907:0x59aa614832a36b22,3,
Why is it adding CFID and CFTOKEN to the URL though? I have it turned off in my Application.CFM:
<cfapplication name="yaya"
clientmanagement="no"
sessionmanagement="no"
setclientcookies="no"
setdomaincookies="no"
sessiontimeout="#CreateTimeSpan(0,2,0,0)#"
applicationtimeout="#CreateTimeSpan(1,0,0,0)#"
>
To help others coming here:
cflocation has an addToken attribute, which needs to be set to no if we do not want CFID and CFTOKEN appended to the generated URL.
Adobe CFML reference: https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-j-l/cflocation.html
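Applied to the redirect from the question, that would look roughly like the following. It reuses the geturl query from the question and is a sketch rather than tested code.

```cfml
<!--- addtoken="no" stops ColdFusion from appending CFID and CFTOKEN to the URL --->
<cflocation url="#getURL.redirectURL#" addtoken="no">
```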

grabbing JSON data using coldfusion

I have a URL which, when opened in the browser, displays JSON data. Since I am new to ColdFusion, I am wondering what would be a good way to grab that data. Later on I will be storing the individual JSON values in a MySQL database, but first I need to figure out step 1, which is grabbing the data.
Please advise.
Thanks
You'll want to do a cfhttp request to load the external content.
Then you can use deserializeJSON to convert the JSON object into the appropriate cfml struct.
See the example Adobe gives in the deserializeJSON documentation.
Here is a quick example:
<!--- Set the URL address. --->
<cfset urlAddress="http://ip.jsontest.com/">
<!--- Generate http request from cf --->
<cfhttp url="#urlAddress#" method="GET" resolveurl="Yes" throwOnError="Yes"/>
<!--- handle the response from the server --->
<cfoutput>
This is just a string:<br />
#CFHTTP.FileContent#<br />
</cfoutput>
<cfset cfData=DeserializeJSON(CFHTTP.FileContent)>
This is object:<br />
<cfdump var="#cfData#">
Now you can do something like this:<br />
<cfoutput>#cfData.ip#</cfoutput>
Execute this source here http://cflive.net/
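Because DeserializeJSON throws an error when the response is not valid JSON (for example, when an HTML error page comes back), it may be worth checking the body first. The following is a minimal sketch of that check; the result variable name apiResponse and the timeout value are assumptions.

```cfml
<!--- Same test endpoint as above; `apiResponse` is an illustrative variable name --->
<cfhttp url="http://ip.jsontest.com/" method="GET" result="apiResponse" timeout="15" />
<cfif isJSON(apiResponse.fileContent)>
    <!--- Safe to parse: convert the JSON text into a CFML struct --->
    <cfset cfData = deserializeJSON(apiResponse.fileContent)>
    <cfdump var="#cfData#">
<cfelse>
    <!--- Not JSON: report the status instead of letting deserializeJSON throw --->
    <cfoutput>Response was not valid JSON (status: #encodeForHTML(apiResponse.statusCode)#)</cfoutput>
</cfif>
```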

CFWebsocket Cross Domain?

Does anyone know if the new websockets feature in CF10 can be used cross domain and cross server? And does anyone know or have some sample code to do this?
I have a simple live help chat working on my app but I want to apply it to other sites and have one central admin chat area where the support agents will interact with users cross domain.
As far as I know, it cannot. You can, however, use a <cfhttp> to call a file on the other site that will publish the message. Here is how I accomplished this.
Create a file called socketPublisher.cfm and save it in a directory that does not require a login to access.
socketPublisher.cfm
<cfparam name="Request.Attributes.msgType" default="newJob">
<cfparam name="Request.Attributes.channel" default="notify">
<cfparam name="Request.Attributes.Type" default="">
<cfoutput>
<cfswitch expression="#Request.Attributes.Type#">
<cfcase value="yourType">
<cfscript>
WSPublish('chat',{message: '', msgType: '#Request.Attributes.msgType#'});
</cfscript>
</cfcase>
<cfdefaultcase>
<cfscript>
WSPublish('#Request.Attributes.channel#',{message: '', msgType: '#Request.Attributes.msgType#'});
</cfscript>
</cfdefaultcase>
</cfswitch>
</cfoutput>
Then, in your action page on the other site, you will need to make your HTTP request to that file.
actionPage.cfm
<cfhttp method="Post" url="#socketURL#/_scripts/socketPublisher.cfm">
<cfhttpparam type="URL" name="msgType" value="pendingFiles">
</cfhttp>
That should do it.
There is also a known issue with CF10's WSPublish: it changes the CGI scope, which causes an error when you try to redirect from an action page. I am using this as a workaround for that issue until I can find a better solution.