CFhttp to Scrape Image - coldfusion

A contractor has provided us with survey data for a set of stores. The data contains the store numbers, thumbnail images and large images. The data is accessed through the contractor's secured website. In order to build a report for the data, I am trying to scrape the store numbers and images from the site instead of manually downloading each image.
I have not used CFhttp for secured sites, but have had a little success so far with:
<cfhttp
method="post"
url="http://www.website.com/impart/client_login.php"
throwonerror="yes"
redirect="yes"
resolveUrl="yes">
<cfhttpparam name="user" value="myUsername" type="formfield">
<cfhttpparam name="pass" value="myPassword" type="formfield">
<cfhttpparam name="submit" value="Login" type="formfield">
</cfhttp>
How do I proceed from getting past the authentication to the page that contains the image to download?

I think that CFHTTP may not be the best choice for this. I am good at BASH, so I would tend towards scripting it with curl, but maybe one of the products listed on this page would be easier: http://www.timedicer.co.uk/web-scraping

What does the dump of cfhttp scope look like? Specifically, what is the status code?
If you get a status code of 200, you'll need to maintain the session as you grab each image. See the following:
http://www.bennadel.com/blog/725-Maintaining-Sessions-Across-Multiple-ColdFusion-CFHttp-Requests.htm
http://www.bennadel.com/projects/cfhttp-session.htm
See this question for saving images via CFHTTP:
Convert an image from CFHTTP filecontent to binary data with Coldfusion
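The session-maintenance approach in the links above comes down to capturing the cookies that the login response sets and sending them back on every later request. Here is a minimal sketch of that bookkeeping in Python, chosen only because it runs standalone. The cookie names and values are made up, and the function names are hypothetical. In CFML the collected values would be passed along with `cfhttpparam type="cookie"`.

```python
from http.cookies import SimpleCookie


def cookies_from_response(set_cookie_headers):
    """Collect name/value pairs from the Set-Cookie headers of a login response."""
    jar = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar


def cookie_header(jar):
    """Build the Cookie header value to send with each follow-up request."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Hypothetical headers, shaped like what a PHP login page might return.
login_headers = [
    "PHPSESSID=abc123; path=/; HttpOnly",
    "remember=1; Max-Age=3600",
]
jar = cookies_from_response(login_headers)
print(cookie_header(jar))
```

Each image request would then carry the same cookie values, so the server treats it as part of the logged-in session.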

Related

CFHTTP POST, result is image, how to save

I found a web-site that clears exif data from an image. The source can either be an uploaded picture or a URL. I thought, perhaps, I could use this with CFHTTP to do this automatically for pictures I post to my web-site. I know I can probably run my images manually through this site before I upload them to my site. Call this an exercise if you want.
Here is the code I am using, which basically matches the form source on this very simple web-site (link)
<cfhttp method="POST" url="https://www.verexif.com/en/quitar.php" result="result" >
<cfhttpparam name="foto_url" type="formfield" value="{myimageurl}">
</cfhttp>
When I CFDUMP the result, I get the following:
When I try to use DeserializeJSON(result.Filecontent), it gives me a ColdFusion error:
When I url-encode my original URL in the CFHTTP tag, the result.filecontent contains the source code of the original web-site.
As can be seen in the first image above, there is a file called 'foto_no_exif.jpg' included in the output. This is the file I need to download. How can I do this ?
In your current dump, you already have the modified image, but you need to access it as binary data. You can force the file content of the response to be treated as binary data by adding the getasbinary attribute to your cfhttp tag.
Working example:
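<!--- URL of the source image to strip EXIF data from --->
<cfset imageURL = 'https://raw.githubusercontent.com/ianare/exif-samples/master/jpg/long_description.jpg' />
<!--- POST the form field, as the site's own form does, and keep the response as binary --->
<cfhttp method="post" getasbinary="yes" charset="utf-8" url="https://www.verexif.com/en/quitar.php" result="result">
<cfhttpparam name="foto_url" type="formfield" value="#imageURL#">
</cfhttp>
<!--- Stream the returned bytes to the browser as a JPEG --->
<cfcontent variable="#result.Filecontent#" type="image/jpeg" reset="true" />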
Run it on TryCF.com
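The reason getasbinary matters is that image bytes must never pass through a text decode. Here is a small Python sketch of the equivalent check-and-save step, assuming the response body is already in hand as bytes. The function name and the sample bytes are hypothetical stand-ins, not output from the real site.

```python
import os
import tempfile

JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these bytes


def save_jpeg(body: bytes, path: str) -> int:
    """Write a response body to disk in binary mode after a sanity check."""
    if not body.startswith(JPEG_MAGIC):
        raise ValueError("response body does not look like a JPEG")
    with open(path, "wb") as handle:  # "wb" avoids any charset translation
        return handle.write(body)


# Stand-in for the real response body: JPEG start marker, filler, end marker.
fake_body = JPEG_MAGIC + b"\xe0" + b"\x00" * 16 + b"\xff\xd9"
target = os.path.join(tempfile.mkdtemp(), "foto_no_exif.jpg")
written = save_jpeg(fake_body, target)
print(written, "bytes written to", target)
```

If the check fails, the body is most likely an HTML page (for example, the site's form) rather than the image.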

Error 400: Bad request while fetching json data from instagram api via coldfusion

I'm hesitant to ask this question because I can't create a jsfiddle for it, but I hope someone will be able to help.
I'm trying to create a CFC in ColdFusion for an Instagram login. That part is done. Using Postman (the Google app) with my credentials, I can see the user's data as JSON, but when I convert the request to ColdFusion it returns an error. I tried changing the data type, the headers and many other lines, but I keep getting the same error.
My code (replaced ids with xxx for security)
<cftry>
<cfhttp url="https://api.instagram.com/oauth/access_token" method="post" resolveurl="yes">
<cfhttpparam type="header" name="Content-Type" value="application/x-www-form-urlencoded" />
<cfhttpparam type="formfield" name="client_id" value="14faxxxxxdc5440f86x6cdd8xxxxf78" />
<cfhttpparam type="formField" name="client_secret" value="40xa78220cfb" />
<cfhttpparam type="formField" name="grant_type" value="authorization_code" />
<cfhttpparam type="formField" name="redirect_uri" value="#URLEncodedFormat('http://example.com/demo/instagramAPI/success.cfm')#" />
<cfhttpparam type="formField" name="code" value="#url.code#" />
</cfhttp>
<cfdump var="#cfhttp#"><cfabort>
<cfcatch type="any">
<cfdump var="#cfcatch#">
</cfcatch>
</cftry>
I'm following the code from this answer. For more info, check this. You can see that I get data when using the same login details, but when doing the same via a CFC I get an error.
The error I get after running the URL:
I've read a lot of articles and blogs, but still haven't been able to resolve the error. Can anyone help me understand what I'm doing wrong? If you have any other suggestions, please do let me know.
If any additional information is required, just let me know.
I finally found the answer to my question after several days. Thanks, Miguel-F and Mark A Kruger; however, Mark, your link did not help because it covered a different issue.
What I did was update my SSL certificate. I had tried before, but I was missing details such as the Organization Unit. I then followed the steps in this Link, provided by Miguel, tested again, and got a code-expiration error.
After that I refreshed with ?reinit=1, because I had changed my CFC but forgotten to reinitialize after updating the certificate, and then I got the result :)
So the final answer is: update your SSL certificate with proper authorization, and the CFC can fetch data from Instagram.
Links that were useful to me: One, CFC demo. From the CFC demo you can download the Instagram CFC, which is useful even if you don't have to update the SSL certificate.
If anyone has an issue with the Instagram CFC, let me know. I spent days on this and can help :)

Error using cfhttp to retrieve page contents from bitly url

I am using cfhttp (Lucee Server) to scrape page contents from a url in the following manner:
<cfhttp url="#libs.originalAdPage#" method="GET" />
I then place this content in a div on my page.
This code has been working for a long time.
I need to report on the URLs that have been scraped for their content, and that information is placed into another website form that is not under my control. I decided to convert the URLs to shortened bitly URLs. I built a step into the page that creates a bitly link and returns that URL to replace the existing one.
If I use the page with a shortened URL from LinkedIn, the page is scraped and displayed correctly in the div.
<cfhttp url="http://bit.ly/1NPhPgc" method="GET" />
But if I make an identical cfhttp call to an Indeed.com page shortened to a bitly URL, I get a connection failure error.
<cfhttp url="http://bit.ly/1RQvlim" method="GET" />
[Image: cfdump of connection failure]
If I open this URL directly in the browser the page is displayed correctly.
Any ideas would be greatly appreciated.
Thanks,
Michael
I don't have access to a Lucee server to test with, however cfhttp on a ColdFusion server works fine for me for both of those bitly URLs. cfhttp follows the redirect and the FileContent contains the indeed.com page as would be expected.
Have you checked what happens with the Bitly Indeed URL if you stop cfhttp from following redirects automatically, so that you can debug and follow the redirects manually? For example:
<cftry>
<cfhttp url="http://bit.ly/1RQvlim" method="GET" redirect="no" />
<cfdump var="#cfhttp.responseHeader#" />
<cfhttp url="#cfhttp.responseHeader.Location#" method="GET" />
<cfdump var="#cfhttp#" label="cfhttp2" />
<cfcatch>
<cfdump var="#cfcatch#" label="cfcatch" />
</cfcatch>
</cftry>
Indeed.com do pay attention to crawlers and user agents - just see their robots.txt for evidence of this.
Do you have access to a different server to test with in case there is something specific to Lucee's cfhttp implementation or to your IP address (eg blacklisted due to all the scraping)?
Have you tried tweaking the cfhttp useragent and/or any other headers as per How to emulate a real http request via cfhttp?
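The two debugging ideas above (following the redirect by hand and sending a browser-like user agent) can be sketched outside ColdFusion as well. This Python sketch only builds the follow-up request and does not send it. The User-Agent string, the function names and the sample Location value are assumptions for illustration, not what bit.ly actually returns.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Assumed browser-like value; any real browser's User-Agent string would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0"


def next_hop(current_url, response_headers):
    """Return the absolute redirect target, or None if there is no Location."""
    location = response_headers.get("Location")
    if location is None:
        return None
    return urljoin(current_url, location)  # Location may be relative


def browser_like_request(url):
    """Build (but do not send) a GET request carrying a browser User-Agent."""
    return Request(url, headers={"User-Agent": BROWSER_UA}, method="GET")


# Hypothetical first-hop response headers from the shortener.
hop = next_hop("http://bit.ly/1RQvlim", {"Location": "/redirected/page"})
request = browser_like_request(hop)
print(request.full_url, request.get_header("User-agent"))
```

Resolving the Location value against the current URL matters because a server may legally return a relative redirect target.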

grabbing JSON data using coldfusion

I have a URL which, when run in the browser, displays JSON data. Since I am new to ColdFusion, I am wondering what would be a good way to grab the data from the web browser. Later on I will be storing the individual JSON data in a MySQL database, but first I need to figure out step 1, which is grabbing the data.
Please advise.
Thanks
You'll want to do a cfhttp request to load the external content.
Then you can use deserializeJSON to convert the JSON object into the appropriate cfml struct.
See the example Adobe gives in the deserializeJSON documentation.
Here is a quick example:
<!--- Set the URL address. --->
<cfset urlAddress="http://ip.jsontest.com/">
<!--- Generate http request from cf --->
<cfhttp url="#urlAddress#" method="GET" resolveurl="Yes" throwOnError="Yes"/>
<!--- handle the response from the server --->
<cfoutput>
This is just a string:<br />
#CFHTTP.FileContent#<br />
</cfoutput>
<cfset cfData=DeserializeJSON(CFHTTP.FileContent)>
This is object:<br />
<cfdump var="#cfData#">
Now you can do something like this:<br />
<cfoutput>#cfData.ip#</cfoutput>
Execute this source here http://cflive.net/
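For comparison, the same fetch-then-deserialize flow can be sketched in Python. The payload below is a made-up sample shaped like the ip.jsontest.com response, so the snippet runs without a network call. The final step only hints at how the parsed values might be prepared for a later database insert.

```python
import json

# Sample payload shaped like the response from http://ip.jsontest.com/
# (the address itself is made up).
raw = '{"ip": "203.0.113.7"}'

data = json.loads(raw)  # JSON string -> dict, comparable to DeserializeJSON
print(data["ip"])

# Flatten to (column, value) pairs, ready for a parameterized INSERT later.
rows = list(data.items())
print(rows)
```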

How to create a new CouchDB User without Futon or Curl?

I'm searching for a way to create a new CouchDB user without using Futon or Curl... just a straight http request.
One way I found (http://stackoverflow.com/questions/3456256/error-creating-user-in-couchdb-1-0) puts a JSON doc to "http://localhost:5984/_users/org.couchdb.user:username" to create a user.
I have attempted the following:
<cfhttp url="http://127.0.0.1/_users/org.couchdb.user:xyz_company" port="5984" method="PUT" username="#variables.couch_username#" password="#variables.couch_password#">
<cfhttpparam type="header" name="Content-Type" value="application/json">
<cfhttpparam type='body' name='org.couchdb.user:xyz_company' value='{"roles":[],"name":"xyz_company","salt":"3B33BF09-26B9-D60A-8F469D01286E9590","id":"org.couchdb.user:xyz_company","password_sha":"096EA41A5A81EA1507F2C6F7EDC364C0B82694AC","type":"user"}'>
</cfhttp>
I keep receiving the following back from Couch:
cfhttp.statuscode = 405 Method Not Allowed
cfhttp.filecontent = Method Not Allowed; The requested method PUT is not allowed for the URL /_users/org.couchdb.user:xyz_company
Any thoughts or suggestions?
UPDATE:
I edited my code based on Marcello's suggestions. I still receive the same 405 Method Not Allowed error. Here is the code now:
<cfhttp url="http://127.0.0.1/_users/org.couchdb.user:xyz_company" port="5984" method="PUT" username="#variables.couch_username#" password="#variables.couch_password#">
<cfhttpparam type="header" name="Content-Type" value="application/json;charset=UTF-8">
<cfhttpparam type='body' value='{"roles":[],"name":"xyz_company","salt":"3B33BF09-26B9-D60A-8F469D01286E9590","_id":"org.couchdb.user:xyz_company","password_sha":"096EA41A5A81EA1507F2C6F7EDC364C0B82694AC","type":"user"}'>
</cfhttp>
Any more suggestions? Thank you!
curl is a straight http request. There are other ways to create such requests: you can craft them with your browser; you can use a different program (e.g. wget); or even write your own (e.g. in Python or in JavaScript with V8 or Rhino).
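As one illustration of the "write your own" route, here is a rough Python sketch that builds the same kind of PUT using only the standard library. It assumes CouchDB listens on 127.0.0.1:5984 as in the question, that the admin authenticates with HTTP Basic auth, and that the credentials shown are placeholders. The function name is hypothetical, and the password fields from the question's document are left out for brevity.

```python
import base64
import json
from urllib.request import Request


def build_create_user_request(host, port, name, admin_user, admin_password):
    """Build (but do not send) the PUT that stores a user doc in _users."""
    doc_id = f"org.couchdb.user:{name}"
    body = json.dumps({"_id": doc_id, "name": name, "type": "user", "roles": []})
    credentials = f"{admin_user}:{admin_password}".encode("utf-8")
    return Request(
        f"http://{host}:{port}/_users/{doc_id}",  # port is part of the URL
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        },
    )


# Placeholder credentials; password fields are left out of this sketch.
request = build_create_user_request("127.0.0.1", 5984, "xyz_company", "admin", "secret")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running CouchDB.
```

Printing the method and full URL before sending is a cheap way to confirm that the request targets the CouchDB port and not another server on the same host.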