ColdFusion: searching robots.txt for a specific page exception

We're adding some functionality to our CMS whereby when a user creates a page, they can select an option to allow/disallow search engine indexing of that page.
If they choose to block indexing (indexable = 0), something like the following would apply:
<cfif request.variables.indexable eq 0>
<cffile
action = "append"
file = "C:\websites\robots.txt"
output = "Disallow: /blocked-page.cfm"
addNewLine = "yes">
<cfelse>
<!--- check if the page is already disallowed in robots.txt and remove the line if it is --->
</cfif>
It's the <cfelse> clause I need help with.
What would be the best way to parse robots.txt to see if this page had already been disallowed? Would it be a cffile action="read", then do a find() on the read variable?
Actually, the check on whether the page has already been disallowed would probably go further up, to avoid double-adding.

You keep the list of pages in a database and each page record has an indexable bit, right? If so, a simpler and more reliable approach would be to regenerate robots.txt each time a page is added, deleted, or has its indexable bit changed.
<!--- TODO: query for the non-indexable pages (the ones to disallow) into getPages --->
<!--- lock the code to prevent concurrent changes --->
<cflock name="robots.txt" type="exclusive" timeout="30">
<!--- flush the file, or simply start with writing something --->
<cffile
action = "write"
file = "C:\websites\robots.txt"
output = "Sitemap: http://www.mywebsite.tld/sitemap.xml"
addNewLine = "yes">
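<!--- Disallow rules must belong to a User-agent group --->
<cffile
action = "append"
file = "C:\websites\robots.txt"
output = "User-agent: *"
addNewLine = "yes">
<!--- append a Disallow entry for each non-indexable page --->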
<cfloop query="getPages">
<!--- we assume that page names are not entered by user (= safe names) --->
<cffile
action = "append"
file = "C:\websites\robots.txt"
output = "Disallow: /#getPages.name#.cfm"
addNewLine = "yes">
</cfloop>
</cflock>
The sample code is untested, so watch out for typos and bugs.
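If the loop of appends feels heavy, the same idea can be sketched as building the whole file in memory and writing it once. This is a minimal sketch, assuming a query with a `name` column like getPages above; buildRobotsTxt is a hypothetical name, and the stand-in query replaces the real database call.

```cfml
<cfscript>
// Hypothetical helper: builds the complete robots.txt text from a query of
// NON-indexable pages (a "name" column is assumed, as in getPages above).
string function buildRobotsTxt(required query blockedPages, string sitemapUrl = "") {
    var nl = chr(10);
    var lines = [];
    var i = 0;
    if (len(arguments.sitemapUrl)) {
        arrayAppend(lines, "Sitemap: " & arguments.sitemapUrl);
    }
    // Disallow rules must belong to a User-agent group
    arrayAppend(lines, "User-agent: *");
    for (i = 1; i <= arguments.blockedPages.recordCount; i++) {
        arrayAppend(lines, "Disallow: /" & arguments.blockedPages.name[i] & ".cfm");
    }
    return arrayToList(lines, nl) & nl;
}

// Stand-in for the real database query (hypothetical page names)
getPages = queryNew("name", "varchar");
queryAddRow(getPages);
querySetCell(getPages, "name", "blocked-page");
queryAddRow(getPages);
querySetCell(getPages, "name", "private-offer");

robotsContent = buildRobotsTxt(getPages, "http://www.mywebsite.tld/sitemap.xml");

// Then write it in one go under the same named lock, e.g.:
// lock name="robots.txt" type="exclusive" timeout="30" {
//     fileWrite("C:\websites\robots.txt", robotsContent);
// }
</cfscript>
```

Writing once means a crawler never sees a half-written file between the initial write and the last append.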

Using robots.txt for this purpose is a bad idea. robots.txt is not a security measure, and you're handing "evildoers" a list of pages that you don't want indexed.
You're much better off using the robots meta tag, which will not provide anyone with a list of pages that you don't want indexed, and gives you greater control of the individual actions a robot can perform.
With meta tags, you simply output the tag in the page's head when generating the page, as usual.
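A minimal sketch of that, assuming the CMS exposes the flag as request.variables.indexable as in the question; robotsMetaTag is a hypothetical helper name. Note that a crawler has to be able to fetch the page to see the noindex tag, so a page handled this way should not also be disallowed in robots.txt.

```cfml
<cfscript>
// Hypothetical helper: returns a robots meta tag for pages marked as not
// indexable, or an empty string for indexable pages.
string function robotsMetaTag(required boolean indexable) {
    return arguments.indexable ? "" : '<meta name="robots" content="noindex">';
}
</cfscript>
<!--- assumption: pages are indexable unless the CMS says otherwise --->
<cfset pageIsIndexable = true>
<cfif isDefined("request.variables.indexable")>
    <cfset pageIsIndexable = request.variables.indexable>
</cfif>
<!--- place this inside the page template's <head> --->
<cfoutput>#robotsMetaTag(pageIsIndexable)#</cfoutput>
```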

<!--- dummy page to block --->
<cfset request.pageToBlock = "/blocked-page.cfm" />
<!--- use one path for both reading and writing --->
<cfset robotsFile = "C:\websites\robots.txt" />
<!--- read in current robots.txt --->
<cffile action="read" file="#robotsFile#" variable="data" />
<!--- build a struct of all blocked pages; CR and LF are both delimiters so Windows line endings work --->
<cfset pages = {} />
<cfloop list="#data#" delimiters="#chr(13)##chr(10)#" index="i">
<cfset pages[trim(listLast(i,' '))] = '' />
</cfloop>
<cfif request.variables.indexable eq 0>
<!--- If the page is not yet blocked, add it --->
<cfif not structKeyExists(pages, request.pageToBlock)>
<cffile action="append" file="#robotsFile#"
output="Disallow: #request.pageToBlock#" addNewLine="yes" />
<!--- not sure if this is in a loop, but if it is, add it to the struct for the next iteration --->
<cfset pages[request.pageToBlock] = '' />
</cfif>
</cfif>
This should do it: read in the file, loop over it, and build a struct of the blocked pages. Only add a new page if it's not already blocked.
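The question's cfelse branch (removing the line once a page becomes indexable again) could be sketched like this. removeDisallow is a hypothetical name, it assumes ColdFusion 9+ for the three-argument listToArray(), and it keeps blank lines because they separate groups in robots.txt.

```cfml
<cfscript>
// Hypothetical helper: returns the robots.txt content without any
// "Disallow: <pagePath>" line. Blank lines are preserved.
string function removeDisallow(required string content, required string pagePath) {
    var kept = [];
    var clean = "";
    var i = 0;
    // split on LF and keep empty fields; trim() strips a trailing CR (Windows line endings)
    var lines = listToArray(arguments.content, chr(10), true);
    for (i = 1; i <= arrayLen(lines); i++) {
        clean = trim(lines[i]);
        // compare() is case-sensitive, like robots.txt paths
        if (compare(clean, "Disallow: " & arguments.pagePath) neq 0) {
            arrayAppend(kept, clean);
        }
    }
    return arrayToList(kept, chr(10));
}
</cfscript>
```

Usage would be to read the file, filter it, and write it back inside the same exclusive cflock used for appending, e.g. `fileWrite(robotsFile, removeDisallow(fileRead(robotsFile), request.pageToBlock))`.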

Related

Custom CFInclude for file customization

Our code base has quite a bit of the following example as we allow a lot of our base pages to be customized to our customers' individual needs.
<cfif fileExists(expandPath("/custom/someFile.cfm"))>
<cfinclude template="/custom/someFile.cfm" />
<cfelse>
<cfinclude template="someFile.cfm" />
</cfif>
I wanted to create a custom CF tag to boilerplate this as a simple <cf_custominclude template="someFile.cfm" />. However, I ran into the fact that custom tags are effectively black boxes: they don't see local variables that exist before the tag is called, and I can't reference any variable that the included file creates.
For example:
<!--- This is able to use someVar --->
<!--- Pulls in some variable named "steve" --->
<cfinclude template="someFile.cfm" />
<cfdump var="#steve#" /> <!--- This is valid, however... --->
<!--- someVar is undefined for this --->
<!--- Pulls in steve2 --->
<cf_custominclude template="someFile.cfm" />
<cfdump var="#steve2#" /> <!--- This isn't valid as steve2 is undefined. --->
Is there a means around this, or should I utilize some other language feature to accomplish my goal?
Well, I question doing this at all, but I know we all get handed code we have to deal with, and how much of a struggle it is to get people to refactor.
This should do what you want. One important thing to note: the custom tag call must be closed, or it won't work, because the variables are copied back in the tag's End execution mode. Just use the self-closing form, as you had it above:
<cf_custominclude template="someFile.cfm" />
Here is the tag, named as you had it: custominclude.cfm
<!--- executes at start of tag --->
<cfif thisTag.executionMode eq 'Start'>
<!--- store a list of keys we don't want to copy, prior to including template --->
<cfset thisTag.currentKeys = structKeyList(variables)>
<!--- control var to see if we even should bother copying scopes --->
<cfset thisTag.includedTemplate = false>
<!--- standard include here --->
<cfif fileExists(expandPath(attributes.template))>
<cfinclude template="#attributes.template#">
<!--- set control var / flag to copy scopes at close of tag --->
<cfset thisTag.includedTemplate = true>
</cfif>
</cfif>
<!--- executes at closing of tag --->
<cfif thisTag.executionMode eq 'End'>
<!--- if control var / flag set to copy scopes --->
<cfif thisTag.includedTemplate>
<!--- only copy vars created in the included page --->
<cfloop list="#structKeyList(variables)#" index="key">
<cfif not listFindNoCase(thisTag.currentKeys, key)>
<!--- copy from include into caller scope --->
<cfset caller[key] = variables[key]>
</cfif>
</cfloop>
</cfif>
</cfif>
I tested it and it works; it should also work when nested. Good luck!
<!--- Pulls in steve2 var from include --->
<cf_custominclude template="someFile.cfm" />
<cfdump var="#steve2#" /> <!--- works! --->
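A different route, not from the answer above, avoids scope copying entirely: let a small helper pick the path and keep a plain cfinclude in the calling page. The included file then runs in the caller's own variables scope, so both someVar and steve2 behave as with a normal include. A sketch, where resolveTemplate and the customDir default are hypothetical names:

```cfml
<cfscript>
// Hypothetical helper: returns the customer override path when it exists,
// otherwise the base template path, unchanged.
string function resolveTemplate(required string template, string customDir = "/custom/") {
    var customPath = arguments.customDir & arguments.template;
    // expandPath() maps the web-root-relative path to a physical one for fileExists()
    if (fileExists(expandPath(customPath))) {
        return customPath;
    }
    return arguments.template;
}
</cfscript>
<!--- usage in the calling page (someFile.cfm as in the question):
<cfinclude template="#resolveTemplate('someFile.cfm')#">
<cfdump var="#steve2#">
--->
```

The helper has to be available to the calling page (for example defined in a shared UDF include), but the cfinclude itself stays in the page, which is what keeps the variables visible.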

Excluding items from a list in coldfusion by type

Is there a way to exclude certain items by file type from a list in ColdFusion?
Background: I just integrated a compression tool into an existing application and ran into a problem: the previous developer's code automatically grabs the file from the upload destination on the server and pushes it to the Network Attached Storage (NAS). The aim now is to stop their NAS migration code from moving all files to the NAS, and move only those that are not PDFs. What I want to do is loop through their variable that stores the names of the uploaded files, exclude the PDFs from the list, then pass the list on to the NAS code, so all non-PDFs are moved and all uploaded PDFs remain on the server. Working with their code is a challenge, as no one commented or documented anything, and I've tried several approaches.
<cffile action="upload" destination= "c:\uploads\" result="myfiles" nameconflict="makeunique" >
<cfset fileSys = CreateObject('component','cfc.FileManagement')>
<cfif Len(get.realec_transactionid)>
<cfset internalOnly=1 >
</cfif>
<!--- This line below is what I want to loop through and exclude file names with pdf extensions --->
<cfset uploadedfilenames='#myfiles.clientFile#' >
<CFSET a_insert_time = #TimeFormat(Now(), "HH:mm:ss")#>
<CFSET a_insert_date = #DateFormat(Now(), "mm-dd-yyyy")#>
<!--- This line calls their method from another cfc that has all the file migration methods. --->
<cfset new_file_name = #fileSys.MoveFromUploads(uploadedfilenames)#>
<!--- Once it moves the file to the NAS, it inserts the file info into the DB table here --->
<cfquery name="addFile" datasource="#request.dsn#">
INSERT INTO upload_many (title_id, fileDate, filetime, fileupload)
VALUES('#get.title_id#', '#dateTimeStamp#', '#a_insert_time#', '#new_file_name#')
</cfquery>
<cfelse>
<cffile action="upload" destination= #ExpandPath("./uploaded_files/zip.txt")# nameconflict="overwrite" >
</cfif>
Update 6/18
The recommended code sorts out the file types when tested outside the application, but any time it's integrated into the application to operate on the variable uploadedfilenames, the rest of the application fails: the multi-file upload module just throws a status 500 error, and no errors are reported in the CF logs. I've found that even running a cfloop on another variable unrelated to anything in the code still causes the error.
As I understand it, you want to filter out file names with a specific extension (e.g. pdf) from the main list uploadedfilenames. This is one of the easiest ways:
<cfset lFileNames = "C:\myfiles\proj\icon-img-12.png,C:\myfiles\proj\sample-file.ppt,C:\myfiles\proj\fin-doc1.docx,C:\myfiles\proj\fin-doc2.pdf,C:\myfiles\proj\invoice-temp.docx,C:\myfiles\proj\invoice-final.pdf" />
<cfset lResultList = "" />
<cfset fileExtToExclude = "pdf" />
<cfloop list="#lFileNames#" index="fileItem" delimiters=",">
<cfif ListLast(ListLast(fileItem,'\'),'.') NEQ fileExtToExclude>
<cfset lResultList = ListAppend(lResultList,"#fileItem#") />
</cfif>
</cfloop>
This uses only ColdFusion's built-in list functions. I would recommend wrapping this code in a function for easier handling. Another way to do it would be to use a regular expression on the list (if you're looking for a more general solution, outside the context of ColdFusion).
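Wrapped in a function as suggested, the same list logic might look like this. excludeByExtension is a hypothetical name; the comparison uses neq, which is case-insensitive, so ".PDF" files are excluded as well.

```cfml
<cfscript>
// Hypothetical wrapper around the loop above: drops every file in a
// comma-delimited list whose extension matches extToExclude.
string function excludeByExtension(required string fileList, required string extToExclude) {
    var result = "";
    var fileItem = "";
    var i = 0;
    for (i = 1; i <= listLen(arguments.fileList); i++) {
        fileItem = listGetAt(arguments.fileList, i);
        // listLast on "\" isolates the file name, listLast on "." its extension
        if (listLast(listLast(fileItem, "\"), ".") neq arguments.extToExclude) {
            result = listAppend(result, fileItem);
        }
    }
    return result;
}
</cfscript>
```

In the upload code this would reduce to `<cfset uploadedfilenames = excludeByExtension(myfiles.clientFile, "pdf")>`.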
Now, applying the solution to your problem:
<cfset uploadedfilenames='#myfiles.clientFile#' >
<cfset lResultList = "" />
<cfset fileExtToExclude = "pdf" />
<cfloop list="#uploadedfilenames#" index="fileItem" delimiters=",">
<cfif ListLast(ListLast(fileItem,'\'),'.') NEQ fileExtToExclude>
<cfset lResultList = ListAppend(lResultList,fileItem) />
</cfif>
</cfloop>
<cfset uploadedfilenames = lResultList />
<!--- rest of your code continues --->
The result list lResultList is copied to the original variable uploadedfilenames.
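I hope I'm not misunderstanding the question, but why not just wrap all of that in an if-statement that checks the full file name? Whether the files come one at a time or in a delimited list, it should be easy to work around.
<cfif !listContains(ListName, '.pdf')>
OR
<cfif FileName does not contain '.pdf'>
and then put all the code you posted inside that block, closing it with </cfif>. Note that listContains() matches a substring in any element, so with a multi-file list the first form skips the whole batch if any file is a PDF; it fits best when files are handled one at a time.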

ColdFusion Link to Previous Page Clears only that Page Session Variables

I have a page (form) set up like this:
<cfif not structKeyExists(session, "checkout")>
<cflocation url="ownerInfo.cfm" addToken="false">
</cfif>
<cfif not structKeyExists(session.checkout, "vehicle")>
<cfset session.checkout.vehicle = {ownership=""}>
</cfif>
<cfparam name="form.ownership" default="#session.checkout.vehicle.ownership#">
<cfif structKeyExists(form, "submit")>
<cfset errors = []>
<cfif not arrayLen(errors)>
<cfset session.checkout.vehicle = {ownership=form.ownership}>
<cflocation url="ownerCheck.cfm" addToken="false">
</cfif>
</cfif>
I am trying to figure out how to reset this form from a link on another page: when the user follows the link back to this page, its session variables should be cleared, so the entire form has to be filled out again.
This is what I have tried, without success:
<cfif session.checkout.vehicle.ownership != null />
<cfset session.checkout.vehicle.ownership = null />
</cfif>
I cannot use <cfset StructClear(Session)> because I do not want to clear the session variables from the previous pages; I only want this page to reset (not all pages or all session variables). Any help with this would be greatly appreciated!
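You can do this using structDelete(structure, key). Older ColdFusion versions have no null literal, so test the value with len() instead, and make sure the key name has no trailing space:
<cfif len(session.checkout.vehicle.ownership)>
<cfset structDelete(session.checkout.vehicle, 'ownership')>
</cfif>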
If you only want to delete the ownership key when it already exists in the session, you can do:
<cfif structKeyExists(session.checkout.vehicle, "ownership")>
<!--- struct key exists so delete it --->
<cfset structDelete(session.checkout.vehicle, "ownership")>
</cfif>
You can actually just do:
<cfif structKeyExists(session.checkout, "vehicle")>
<!--- try and delete ownership key might not exist --->
<cfset structDelete(session.checkout.vehicle, "ownership")>
</cfif>
As long as the parent struct exists, you can delete the ownership key without first checking that it exists; structDelete() does not throw when the key is missing.
If you want to know whether the key existed, structDelete accepts a third boolean parameter; with it set to true, it returns true if the key existed and false if it didn't.
<cfset didExist = structDelete(session.checkout.vehicle, "ownership", true)>
An alternative approach to solving your problem would be to reset the form if it's not a form (POST) submission. So you'd do:
<cfif structKeyExists(form, "submit")>
<!--- form has been submitted: store values in session and redirect... --->
<cfelse>
<!--- form not submitted so clear the session vars... --->
</cfif>
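One caveat worth checking against the form page in the question: it only rebuilds session.checkout.vehicle when the whole vehicle key is missing, and its cfparam then reads session.checkout.vehicle.ownership. If only ownership is deleted, that cfparam would fail on the next visit. A sketch (not from the answers above) that deletes the vehicle key instead, so the page recreates it with empty defaults; resetCheckoutStep and the url.reset flag are hypothetical names:

```cfml
<cfscript>
// Hypothetical helper: removes one checkout step's data. Structs are passed
// by reference, so this changes session.checkout in place; other steps
// (e.g. the owner info from earlier pages) are left untouched.
void function resetCheckoutStep(required struct checkout, required string step) {
    structDelete(arguments.checkout, arguments.step);
}
</cfscript>
<!--- at the top of the form page, before the existing code;
      the "start over" link on the other page would add ?reset=1 --->
<cfif structKeyExists(url, "reset") and structKeyExists(session, "checkout")>
    <cfset resetCheckoutStep(session.checkout, "vehicle")>
</cfif>
```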

Searching a folder (recursively) for duplicate photos using Coldfusion?

After moving and backing up my photo collection a few times I have several duplicate photos, with different filenames in various folders scattered across my PC. So I thought I would write a quick CF (9) page to find the duplicates (and can then add code later to allow me to delete them).
I have a few questions:
At the moment I am just using file size to match the image file, but I presume matching EXIF data or matching hash of image file binary would be more reliable?
The code I lashed together sort of works, but how could this be done to search outside web root?
Is there a better way?
<cfdirectory
name="myfiles"
directory="C:\ColdFusion9\wwwroot\images\photos"
filter="*.jpg"
recurse="true"
sort="size DESC"
type="file" >
<cfset matchingCount=0>
<cfset duplicatesFound=0>
<table border=1>
<cfloop query="myFiles" endrow="#myfiles.recordcount - 1#">
<cfif myfiles.size is myfiles.size[currentrow + 1]>
<!---this file is the same size as the next row--->
<cfset matchingCount = matchingCount + 1>
<cfset duplicatesFound=1>
<cfelse>
<!--- the next file is a different size --->
<!--- if there have been matches, display them now --->
<cfif matchingCount gt 0>
<cfset sRow = myfiles.currentrow - matchingCount>
<cfoutput><tr>
<cfloop index="i" from="#sRow#" to="#myfiles.currentrow#">
<!--- map the physical folder to a URL; URLs need forward slashes --->
<cfset imgURL = replace(replace(myfiles.directory[i], "C:\ColdFusion9\wwwroot\", "http://localhost:8500/"), "\", "/", "all")>
<td><img height=200 width=200 src="#imgURL#/#myfiles.name[i]#"></td>
</cfloop></tr><tr>
<cfloop index="i" from="#sRow#" to="#myfiles.currentrow#">
<td width=200>#myfiles.name[i]#<br>#myfiles.directory[i]#</td>
</cfloop>
</tr>
</cfoutput>
<cfset matchingCount = 0>
</cfif>
</cfif>
</cfloop>
</table>
<cfif duplicatesFound is 0><cfoutput>No duplicate jpgs found</cfoutput></cfif>
This is a pretty fun task, so I decided to give it a try.
First, some test results on my laptop (4GB RAM, 2x2.26GHz CPU, SSD) with 1,143 images, 263.8MB in total:
ACF9: 8 duplicates, took ~2.3 s
Railo 3.3: 8 duplicates, took ~2.0 s (yay!)
I used a great tip from a Stack Overflow answer to pick the best hashing option.
So, here is what I did:
<cfsetting enablecfoutputonly="true" />
<cfset ticks = getTickCount() />
<!--- this is great set of utils from Apache --->
<cfset digestUtils = CreateObject("java","org.apache.commons.codec.digest.DigestUtils") />
<!--- cache containers --->
<cfset checksums = {} />
<cfset duplicates = {} />
<cfdirectory
action="list"
name="images"
directory="/home/trovich/images/"
filter="*.png|*.jpg|*.jpeg|*.gif"
recurse="true" />
<cfloop query="images">
<!--- change the delimiter to \ if you're on Windows --->
<cfset ipath = images.directory & "/" & images.name />
<cffile action="readbinary" file="#ipath#" variable="binimage" />
<!---
This is slow as hell with any encoding!
<cfset checksum = BinaryEncode(binimage, "Base64") />
--->
<cfset checksum = digestUtils.md5hex(binimage) />
<cfif StructKeyExists(checksums, checksum)>
<!--- init cache using original on 1st position when duplicate found --->
<cfif NOT StructKeyExists(duplicates, checksum)>
<cfset duplicates[checksum] = [] />
<cfset ArrayAppend(duplicates[checksum], checksums[checksum]) />
</cfif>
<!--- append current duplicate --->
<cfset ArrayAppend(duplicates[checksum], ipath) />
<cfelse>
<!--- save originals only into the cache --->
<cfset checksums[checksum] = ipath />
</cfif>
</cfloop>
<cfset time = NumberFormat((getTickcount()-ticks)/1000, "._") />
<!--- render duplicates without resizing (see options of cfimage for this) --->
<cfoutput>
<h1>Found #StructCount(duplicates)# duplicates, took ~#time# s</h1>
<cfloop collection="#duplicates#" item="checksum">
<p>
<!--- display all found paths of duplicate --->
<cfloop array="#duplicates[checksum]#" index="path">
#HTMLEditFormat(path)#<br/>
</cfloop>
<!--- render only last duplicate, they are the same image any way --->
<cfimage action="writeToBrowser" source="#path#" />
</p>
</cfloop>
</cfoutput>
Obviously, you can easily use duplicates array to review the results and/or run some cleanup job.
Have fun!
I would recommend splitting the checking code into a function that accepts only a filename.
Then use a global struct to check for duplicates: the key would be "size" or "size_hash", and the value an array containing all filenames that match this key.
Run the function on all JPEG files in all the different directories, then scan the struct and report every entry that has more than one file in its array.
If you want to show an image outside your web root, you can serve it via <cfcontent file="#filename#" type="image/jpeg">

How do I redirect based on referral in ColdFusion

I have a ColdFusion web site I need to change, and I have no experience with this environment (I do know ASP.NET). All I need to do is write a condition based on the referrer (the URL) of the page, and redirect to another page in some cases.
Can anyone give me an example of the syntax that would perform this?
All of the other examples would work. Also, if you're looking to redirect based on a referral from an external site, you may want to check CGI.HTTP_REFERER. Check out the CGI scope for several other options.
<cfif reFindNoCase('[myRegex]',cgi.http_referer)>
<cflocation url="my_new_url">
</cfif>
My example uses a regex search (reFind() or reFindNoCase()) to check the referring URL, but depending on what you're looking for, you could also treat it as a list with / as the delimiter and use listContainsNoCase().
Let's assume the URL variable you are basing this on is called goOn (http://yoursite.com?goOn=yes); then the following code would work:
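A slightly more robust variant of the referrer check (a sketch, not from the answers) extracts the host from CGI.HTTP_REFERER and compares it to a domain, so a lookalike such as evilpartner-site.example does not match. cameFrom, partner-site.example, and /landing/partner.cfm are hypothetical. The Referer header is sent by the browser, so it can be missing or spoofed; use it for convenience redirects, not access control.

```cfml
<cfscript>
// Hypothetical helper: true when the referring URL's host equals the domain
// or is a subdomain of it. Returns false for an empty or relative referer.
boolean function cameFrom(required string referer, required string domain) {
    // pull the host out of "scheme://host[:port]/path..." (## is a literal #)
    var host = reReplaceNoCase(arguments.referer, "^[a-z][a-z0-9+.-]*://([^/:?##]+).*$", "\1");
    if (host eq arguments.referer) {
        return false; // no scheme://host prefix matched
    }
    if (host eq arguments.domain) {
        return true;
    }
    return (len(host) gt len(arguments.domain)) and (right(host, len(arguments.domain) + 1) eq ("." & arguments.domain));
}
</cfscript>
<cfif cameFrom(cgi.http_referer, "partner-site.example")>
    <cflocation url="/landing/partner.cfm" addtoken="false">
</cfif>
```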
<cfif structKeyExists(url, "goOn") AND url.goOn eq "yes">
<cflocation url="the_new_url" addtoken="false">
</cfif>
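Nothing after the cflocation will run, because cflocation stops processing of the current page.
There is a CGI variable scope in ColdFusion that holds information on the incoming request. Try the following (CGI.SCRIPT_NAME is the path from the web root, including the leading slash):
<cfif CGI.SCRIPT_NAME EQ '/index.cfm'>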
<cflocation url="where you want it to redirect" />
</cfif>
To see what else is available within the CGI scope, check out the following:
http://livedocs.adobe.com/coldfusion/8/htmldocs/Expressions_8.html#2679705
Haven't done coldfusion in a little while but:
<cfif some_condition_based_on_your_url>
<cflocation url="http://where_your_referrals_go">
</cfif>
<!--- continue processing for non-redirects --->
A dynamic version.
<cfif isdefined("somecondition")>
<cfset urlDestination = "someurl">
<cfelseif isdefined("somecondition")>
<cfset urlDestination = "someurl">
.
.
.
<cfelse>
<cfset urlDestination = "someurl">
</cfif>
<cflocation url="#urlDestination#" addtoken="false">