I found a web-site that clears exif data from an image. The source can either be an uploaded picture or a URL. I thought, perhaps, I could use this with CFHTTP to do this automatically for pictures I post to my web-site. I know I can probably run my images manually through this site before I upload them to my site. Call this an exercise if you want.
Here is the code I am using, which basically matches the form source on this very simple web-site (link)
<cfhttp method="POST" url="https://www.verexif.com/en/quitar.php" result="result" >
<cfhttpparam name="foto_url" type="formfield" value="{myimageurl}">
</cfhttp>
When I CFDUMP the result, I get the following:
When I try to use DeserializeJSON(result.Filecontent), it gives me a ColdFusion error:
When I url-encode my original URL in the CFHTTP tag, the result.filecontent contains the source code of the original web-site.
As can be seen in the first image above, there is a file called 'foto_no_exif.jpg' included in the output. This is the file I need to download. How can I do this?
In your current dump, you have the modified image, but you need to access it as binary data. You can force the file content of the response to be treated as binary data by adding the getasbinary attribute to your cfhttp tag.
Working example:
<cfset imageURL ='https://raw.githubusercontent.com/ianare/exif-samples/master/jpg/long_description.jpg'/>
<cfhttp method="post" getasbinary="yes" charset="utf-8" url="https://www.verexif.com/en/quitar.php" result="result">
<cfhttpparam name="foto_url" type="formfield" value="#imageURL#">
</cfhttp>
<cfcontent variable="#result.Filecontent#" type="image/jpeg" reset="true" />
Run it on TryCF.com
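If the goal is to keep the cleaned image on the server rather than stream it to the browser, the binary result can be written to disk instead. The following is a minimal sketch, not a tested implementation: it assumes the request is sent as a POST (as in the question's form), that the target folder is writable, and that the output name foto_no_exif.jpg is acceptable. The target path is an illustrative choice.

```cfml
<cfset imageURL = "https://raw.githubusercontent.com/ianare/exif-samples/master/jpg/long_description.jpg">

<!--- Ask the service to strip the EXIF data and return the result as binary. --->
<cfhttp method="post" getasbinary="yes" url="https://www.verexif.com/en/quitar.php" result="result">
    <cfhttpparam name="foto_url" type="formfield" value="#imageURL#">
</cfhttp>

<!--- Only write the file when the call succeeded and the body really is binary. --->
<cfif left(result.statusCode, 3) eq "200" and isBinary(result.fileContent)>
    <!--- Assumption: the current template's folder is writable. --->
    <cffile action="write" file="#expandPath('./foto_no_exif.jpg')#" output="#result.fileContent#">
<cfelse>
    <cfdump var="#result#" label="unexpected response">
</cfif>
```

Checking the status code before writing avoids saving an error page under a .jpg name.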
If I run the following code to get an image from Medium's site:
<cfhttp url="https://cdn-images-1.medium.com/max/600/1*3j1McX-y1rvKewzI2gWc_w.png"
method="get" useragent="#CGI.http_user_agent#" getasbinary="yes">
I then want to save the image with the same name that they used i.e. 1*3j1McX-y1rvKewzI2gWc_w.png.
How can I get the name of the file from the cfhttp request? I looked in the cfhttp.header for any sign of the content-disposition attribute but can't find it.
Assuming you are getting these URLs dynamically, why not just parse the URL for the filename first, then apply that to the file attribute?
<cfset filename1 = ListLast("https://cdn-images-1.medium.com/max/600/1*3j1McX-y1rvKewzI2gWc_w.png","/") />
<cfhttp url="https://cdn-images-1.medium.com/max/600/1*3j1McX-y1rvKewzI2gWc_w.png"
method="get" useragent="#CGI.http_user_agent#" getasbinary="yes" path="whateverpath" file="#filename1#">
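A slightly more defensive version of the same idea might look like the sketch below. It drops any query string or fragment from the last URL segment and replaces characters that Windows does not allow in file names (the asterisk in this URL is one of them). The variable names and the ./images target folder are assumptions for illustration, and the folder is assumed to exist already. cfhttp's attribute for the saved name is file, used together with path.

```cfml
<cfscript>
imageUrl = "https://cdn-images-1.medium.com/max/600/1*3j1McX-y1rvKewzI2gWc_w.png";

// Last path segment, with any "?query" or "#fragment" part removed.
// The doubled ## is how a literal # is written inside a CFML string.
rawName = listFirst(listLast(imageUrl, "/"), "?##");

// Replace characters that are invalid in Windows file names with an underscore.
safeName = reReplace(rawName, '[\\/:*?"<>|]', "_", "all");
</cfscript>

<!--- Assumption: ./images exists and is writable. --->
<cfhttp url="#imageUrl#" method="get" getasbinary="yes"
        path="#expandPath('./images')#" file="#safeName#">
```

For the URL above, safeName becomes 1_3j1McX-y1rvKewzI2gWc_w.png, so the saved name differs from the original only where the file system would reject it.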
I am using cfhttp (Lucee Server) to scrape page contents from a url in the following manner:
<cfhttp url="#libs.originalAdPage#" method="GET" />
I then place this content in a div on my page.
This code has been working for a long time.
I need to report on the URLs that have been scraped for their content, and that information is placed into another website's form that is not under my control. I decided to convert the URLs to shortened bitly URLs. I built a step into the page that creates a bitly link and returns that URL to replace the existing one.
If I use the page with a shortened URL from LinkedIn, the page is scraped and displayed correctly in the div.
<cfhttp url="http://bit.ly/1NPhPgc" method="GET" />
But if I make an identical cfhttp call to an Indeed.com page shortened to a bitly URL, I get a connection failure error.
<cfhttp url="http://bit.ly/1RQvlim" method="GET" />
[cfdump of connection failure]
If I open this URL directly in the browser the page is displayed correctly.
Any ideas would be greatly appreciated.
Thanks,
Michael
I don't have access to a Lucee server to test with, however cfhttp on a ColdFusion server works fine for me for both of those bitly URLs. cfhttp follows the redirect and the FileContent contains the indeed.com page as would be expected.
Have you verified what happens with the Bitly Indeed URL if you prevent cfhttp from automatically following the redirects so that you can debug and follow the redirects manually? ie
<cftry>
<cfhttp url="http://bit.ly/1RQvlim" method="GET" redirect="no" />
<cfdump var="#cfhttp.responseHeader#" />
<cfhttp url="#cfhttp.responseHeader.Location#" method="GET" />
<cfdump var="#cfhttp#" label="cfhttp2" />
<cfcatch>
<cfdump var="#cfcatch#" label="cfcatch" />
</cfcatch>
</cftry>
Indeed.com do pay attention to crawlers and user agents - just see their robots.txt for evidence of this.
Do you have access to a different server to test with in case there is something specific to Lucee's cfhttp implementation or to your IP address (eg blacklisted due to all the scraping)?
Have you tried tweaking the cfhttp useragent and/or any other headers as per How to emulate a real http request via cfhttp?
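As a rough sketch of that last suggestion, the request below sends a browser-like user agent and a couple of common request headers. The user agent string, header values, timeout, and the scrapeResult variable name are all illustrative assumptions, not values known to satisfy Indeed.com.

```cfml
<!--- Send browser-like headers; the values here are examples only. --->
<cfhttp url="http://bit.ly/1RQvlim" method="GET" redirect="yes" timeout="30" result="scrapeResult"
        useragent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36">
    <cfhttpparam type="header" name="Accept" value="text/html,application/xhtml+xml">
    <cfhttpparam type="header" name="Accept-Language" value="en-US,en;q=0.8">
</cfhttp>

<!--- Inspect the outcome before using the content. --->
<cfdump var="#scrapeResult.statusCode#" label="status">
<cfdump var="#scrapeResult.responseHeader#" label="response headers">
```

If the status is still a connection failure with these headers, the cause is more likely at the network or TLS level than in how the request presents itself.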
A contractor has provided us with survey data for a set of stores. The data contains the store numbers, thumbnail images and large images. The data is accessed through the contractor's secured website. In order to build a report for the data, I am trying to scrape the store numbers and images from the site instead of manually downloading each image.
I have not used CFhttp for secured sites, but have had a little success so far with:
<cfhttp
method="post"
url="http://www.website.com/impart/client_login.php"
throwonerror="Yes"
redirect = "yes"
resolveUrl = "yes">
<cfhttpparam name="user" value="myUsername" type="formfield">
<cfhttpparam name="pass" value="myPassword" type="formfield">
<cfhttpparam name="submit" value="Login" type="formfield">
</cfhttp>
How do I proceed from getting past the authentication to the page that contains the image to download?
I think that CFHTTP may not be the best choice for this. I am good at BASH, so I would tend towards scripting it with curl, but maybe some product on this page would be easier http://www.timedicer.co.uk/web-scraping ?
What does the dump of cfhttp scope look like? Specifically, what is the status code?
If you get a status code of 200, you'll need to maintain the session as you grab each image. See the following:
http://www.bennadel.com/blog/725-Maintaining-Sessions-Across-Multiple-ColdFusion-CFHttp-Requests.htm
http://www.bennadel.com/projects/cfhttp-session.htm
See this question for saving images via CFHTTP:
Convert an image from CFHTTP filecontent to binary data with Coldfusion
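To make the session-handling idea concrete, here is a minimal sketch of the two-step flow: log in, copy the cookies from the login response, and send them along with the image request. It assumes the site tracks the login with cookies. The image URL, the store number, and the output file name are hypothetical placeholders, since the real page addresses are not given in the question.

```cfml
<!--- Step 1: log in. redirect="no" keeps the login response and its Set-Cookie headers visible. --->
<cfhttp method="post" url="http://www.website.com/impart/client_login.php" redirect="no" result="loginResult">
    <cfhttpparam type="formfield" name="user" value="myUsername">
    <cfhttpparam type="formfield" name="pass" value="myPassword">
    <cfhttpparam type="formfield" name="submit" value="Login">
</cfhttp>

<cfscript>
// Step 2: build a Cookie header from the name=value part of each Set-Cookie header.
// Depending on the engine and the number of cookies, Set-Cookie can arrive
// as a string, an array, or a struct, so all three shapes are handled.
cookieHeader = "";
cookieList = [];
if (structKeyExists(loginResult.responseHeader, "Set-Cookie")) {
    rawCookies = loginResult.responseHeader["Set-Cookie"];
    if (isSimpleValue(rawCookies)) {
        arrayAppend(cookieList, rawCookies);
    } else if (isArray(rawCookies)) {
        cookieList = rawCookies;
    } else if (isStruct(rawCookies)) {
        for (key in rawCookies) {
            arrayAppend(cookieList, rawCookies[key]);
        }
    }
}
for (i = 1; i <= arrayLen(cookieList); i++) {
    // Keep only "name=value"; drop attributes such as Path or Expires.
    pair = trim(listFirst(cookieList[i], ";"));
    if (len(cookieHeader)) {
        cookieHeader = cookieHeader & "; ";
    }
    cookieHeader = cookieHeader & pair;
}
</cfscript>

<!--- Step 3: request a protected image with the session cookies attached.
      The URL below is a hypothetical placeholder. --->
<cfhttp method="get" url="http://www.website.com/impart/store_image.php?store=1234" getasbinary="yes" result="imageResult">
    <cfhttpparam type="header" name="Cookie" value="#cookieHeader#">
</cfhttp>

<cfif left(imageResult.statusCode, 3) eq "200">
    <cffile action="write" file="#expandPath('./store_1234.jpg')#" output="#imageResult.fileContent#">
</cfif>
```

Sending the cookies as a single raw Cookie header avoids any re-encoding of the cookie values. Repeating step 3 in a loop over the store numbers would fetch each image within the same session.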
We are trying to interact with a RESTful web service that expects a file.
I set the name of the field to data (as required by the API) and then specify the file as an absolute path. When the file makes it to the server, the filename in the HTTP transaction is the complete absolute path.
This causes a problem with the API as the full path is then recorded as the "FileName".
How do I get ColdFusion to report only the file name rather than the full path?
We are using ColdFusion 9.
Here is the CFML:
<cfhttp url="http://server/testcode"
port="9876"
method="post"
result="Content">
<cfhttpparam type="file"
name="data"
file="c:\temp\testfile.txt">
</cfhttp>
Here are some examples of the HTTP interactions with different browsers:
CFHTTP 9
-------------------------------7d0d117230764
Content-Disposition: form-data; name="data"; filename="c:\temp\testfile.txt"
Content-Type: text/plain
This is the text, really long, well, not really.
-------------------------------7d0d117230764--
IE8
-----------------------------7db370d80e0a
Content-Disposition: form-data; name="FileField"; filename="C:\temp\testfile.txt"
Content-Type: text/plain
This is the text, really long, well, not really.
-----------------------------7db370d80e0a--
Chrome 13
------WebKitFormBoundaryDnpFVJwCsZkzTGDc
Content-Disposition: form-data; name="FileField"; filename="testfile.txt"
Content-Type: text/plain
This is the text, really long, well, not really.
Firefox 6
-----------------------------22798303036224
Content-Disposition: form-data; name="FileField"; filename="testfile.txt"
Content-Type: text/plain
This is the text, really long, well, not really.
-----------------------------22798303036224--
Apparently IE8 and CFHTTP both do the same thing (add "c:\temp" to the file name). I'm not sure what the spec for HTTP is, but it would be nice if there was a way to get CFHTTP to leave the path off.
Is there any way to do this?
I ran into a problem similar to yours, once. I didn't care about excluding the path, but I wanted to send a different filename than the name of the file on my server's filesystem. I could not find a way to do it using CF tags at all, but I was able to get it to work by dropping into Java. I used org.apache.commons.httpclient, which ships with CF9 IIRC. It goes something like this (pardon any typos, I'm transposing from more complicated code):
oach = 'org.apache.commons.httpclient';
oachmm = '#oach#.methods.multipart';
method = createObject('java', '#oach#.methods.PostMethod').init(post_uri);
filePart = createObject('java', '#oachmm#.FilePart').init(
'fieldname',
'filename',
createObject('java', 'java.io.File').init('filepath')
);
method.setRequestEntity(
createObject('java', '#oachmm#.MultipartRequestEntity').init(
[ filePart ],
method.getParams()
)
);
status = createObject('java', '#oach#.HttpClient').init().executeMethod(method);
method.releaseConnection();
I see that the content type is text/plain, so first I think that you need to add the multipart attribute to the cfhttp tag:
<cfhttp url="http://server/testcode"
port="9876"
method="post"
result="Content"
multipart = "yes">
<cfhttpparam type="file"
name="data"
file="c:\temp\testfile.txt">
</cfhttp>
That could solve your issue.
The only difference I see between all of the posts is that CF is sending name="data" while everything else is sending name="FileField". If the other browser submissions are correct, then I would change your cfhttpparam:
<cfhttpparam type="file"
name="FileField"
file="c:\temp\testfile.txt">
or even try sending an additional FileName parameter:
<cfhttpparam type="file"
name="data"
file="c:\temp\testfile.txt" />
<cfhttpparam type="formField"
name="FileName"
value="testfile.txt" />
So I was able to get access to the API and made it work. Here is the code for this specific part (as I assume that you were able to login and get a document guid).
<!--- upload a document --->
<cfhttp method="post" url="<path to watchdox api upload>/#local.guid#/upload">
<cfhttpparam type="header" name="Content-type" value="multipart/form-data">
<cfhttpparam type="header" name="x-wdox-version" value="1.0">
<cfhttpparam type="header" name="x-wdox-ssid" value="#local.xwdoxssid#" >
<cfhttpparam type="formfield" name="filename" value="testfile.txt" >
<cfhttpparam type="file" file="c:\temp\testfile.txt" name="data" >
</cfhttp>
Hope it will help.
I'm searching for a way to create a new CouchDB user without using Futon or Curl... just a straight http request.
One way I found (http://stackoverflow.com/questions/3456256/error-creating-user-in-couchdb-1-0) puts a JSON doc to "http://localhost:5984/_users/org.couchdb.user:username" to create a user.
I have attempted the following:
<cfhttp url="http://127.0.0.1/_users/org.couchdb.user:xyz_company" port="5984" method="PUT" username="#variables.couch_username#" password="#variables.couch_password#">
<cfhttpparam type="header" name="Content-Type" value="application/json">
<cfhttpparam type='body' name='org.couchdb.user:xyz_company' value='{"roles":[],"name":"xyz_company","salt":"3B33BF09-26B9-D60A-8F469D01286E9590","id":"org.couchdb.user:xyz_company","password_sha":"096EA41A5A81EA1507F2C6F7EDC364C0B82694AC","type":"user"}'>
</cfhttp>
I keep receiving the following back from Couch:
cfhttp.statuscode = 405 Method Not Allowed
cfhttp.filecontent = Method Not Allowed; The requested method PUT is not allowed for the URL /_users/org.couchdb.user:xyz_company
Any thoughts or suggestions?
UPDATE:
I edited my code based on Marcello's suggestions. I still receive the same 405 Method Not Allowed error. Here is the code now:
<cfhttp url="http://127.0.0.1/_users/org.couchdb.user:xyz_company" port="5984" method="PUT" username="#variables.couch_username#" password="#variables.couch_password#">
<cfhttpparam type="header" name="Content-Type" value="application/json;charset=UTF-8">
<cfhttpparam type='body' value='{"roles":[],"name":"xyz_company","salt":"3B33BF09-26B9-D60A-8F469D01286E9590","_id":"org.couchdb.user:xyz_company","password_sha":"096EA41A5A81EA1507F2C6F7EDC364C0B82694AC","type":"user"}'>
</cfhttp>
Any more suggestions? Thank you!
curl is a straight http request. There are other ways to create such requests: you can craft them with your browser; you can use a different program (e.g. wget); or even write your own (e.g. in Python or in JavaScript with V8 or Rhino).