Can't delete Solr keys - coldfusion

Having trouble deleting keys from a Solr collection for files.
Updating the Solr collection with this:
<cfoutput query="fileQuery">
<cfset theFile = defaultpath & "#fileID#.pdf" />
<cfif fileExists(theFile)>
<cfindex
action="update"
collection="file_vault_solr"
type="file"
key="#theFile#"
title="#documentName#"
body="fileNumber,documentName"
custom1="/filevault/#filealias#"
custom2="#fileNumber#"
custom3="#documentName#"
>
</cfif>
</cfoutput>
However, when attempting to delete the key from the catalog it simply doesn't work. Here's the code being used to (try to) delete the keys:
<cfoutput query="deletedFile">
<cfset theFile = defaultpath & "#fileID#.pdf" />
<!--- Remove the deleted file from the collection. --->
<cfindex
collection="file_vault_solr"
type="file"
action="Delete"
key="#theFile#"
>
</cfoutput>
The key is not deleted, however. The only thing that has worked has been to purge the whole catalog and re-index all of the documents.
Any insights?

After a lot of debugging I found the cause.
The reason for this behavior is a very… uh… unfortunate uhm… "design decision" Adobe took when implementing the interface between ColdFusion and Solr.
So you have a Solr collection of indexed files and want to selectively purge the ones that no longer exist on disk. I'm pretty sure that's the exact situation you've been in.
Let's assume:
there is a file called /path/to/file on your system and
it is indexed in the Solr collection foo.
When you issue a <cfindex collection="foo" action="delete" key="/path/to/file">, ColdFusion sends the following HTTP request to Solr:
POST /solr/foo/update?wt=xml&version=2.2 (application/xml; charset=UTF-8)
<delete><id>1247603285</id></delete>
This is a perfectly reasonable request that Solr will happily fulfill. The only strange thing is the number in the <id>. In any case, the file will be gone from the index after this operation.
Re-index the file and delete it from disk. Now:
there no longer is a file called /path/to/file on your system, but
it is still indexed in the Solr collection foo.
Let's do the same <cfindex action="delete"> operation again.
POST /solr/foo/update?wt=xml&version=2.2 (application/xml; charset=UTF-8)
<delete><id>/path/to/file</id></delete>
Huh? Shouldn't there be a number in the ID?
As it turns out, someone at Adobe thought it would be a jolly smart idea to use numbers for unique IDs of indexed files, to, uhhh, save space, I assume.
However for some inexplicable reason this only happens when the file in question still exists. If it does not exist anymore, ColdFusion will notice and pass the path instead.
Inspecting the number reveals that it would fit into a 32 bit signed integer value. (I've checked, there are plenty of negative values in the uid field of the collection.)
So this looks as if they use some kind of hashing algorithm that returns 32 bits and chuck that in an int. CRC32 springs to mind, but that's not it. Also, java.util.zip.CRC32.getValue() returns a long, so there wouldn't be any negative values in the first place.
The other readily available 32 bit hash in Java is ... hashCode(), declared on java.lang.Object and overridden by java.lang.String to hash the string's characters.
Bingo.
"/path/to/file".hashCode() // -> 1247603285
So the solution is to never delete a file by its path, but always like this:
<cfindex collection="foo" action="delete" key="#path.hashCode()#">
For files that no longer exist this does the right thing.
More importantly: For files that still exist this does the right thing as well - ColdFusion would have sent the hash code anyway.
Until Adobe fixes this problem this is a safe and easy work-around.
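To illustrate why the ID looks like a signed 32-bit number, here is a small Java sketch. It assumes, as the debugging above suggests, that the ID is simply `String.hashCode()` of the path; the class and method names (`PathHash`, `javaStringHash`, `crc32`) are made up for this example and are not part of ColdFusion or Solr.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class PathHash {
    // Re-implements the documented String.hashCode() formula:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic
    // (so the result wraps around and can be negative).
    static int javaStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // CRC32 for comparison: getValue() returns a long that is never negative.
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        String path = "/path/to/file"; // placeholder path from the example above
        System.out.println("hashCode():     " + path.hashCode());
        System.out.println("manual formula: " + javaStringHash(path));
        System.out.println("CRC32:          " + crc32(path));
    }
}
```

Because the hash depends on every character, a path that differs only in case produces a different ID.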
Note that the file path is case sensitive and must match exactly with the one stored in the index.
A quick
<cfsearch collection="foo" name="foo">
without any criteria will return all index entries, so retrieving the exact path of orphaned entries is not a big problem.
Eric Lippert explains object hash codes and why it is a bad idea to use them for anything "practical" in an application. It's a .NET article but applies to Java just as well.
It boils down to: Adobe should store the actual path in the Solr collection and leave the performance optimization they seem to have attempted to Solr.
I've filed Bug 3589991 against Adobe's ColdFusion bug database.

The key has to match exactly what is in Solr's index. So ensure that "defaultpath" is the same in both and check that the case matches as I believe Solr is case sensitive.

To debug this, I would suggest adding status="myStatusVar" to the cfindex call on both the add and the delete, then inspecting that variable to see what is going on. If the delete is not returning a deleted count, there is a key mismatch.
<cfindex
collection="file_vault_solr"
type="file"
action="Delete"
key="#theFile#"
status="myStatusVar"
>

Referencing code-created datasources in Lucee

I have created a number of datasources in Lucee using code. This is for a legacy ColdFusion application that we are migrating to Azure, and per the powers-that-be, they want the DSNs created in code so we can store the DSN passwords in a keystore. I have that part already working.
The datasources look something like this: this.datasources["myDSN"]
If, in the code (Application.cfm), I do this:
<cfset myDSN = this.datasources["myDSN"]>
This will then fail:
<cfquery name="whatever" datasource="#myDSN#">
It fails with "datasource myDSN not found."
BUT, if I do this instead:
<cfquery name="whatever" datasource="#this.datasources['myDSN']#">
... it works fine.
Is there a workaround for this? At last check in this one application alone, there are 368 occurrences of datasource= in 115 files. I'd rather not have to do a bulk search/replace. It makes no sense to me that the variable "myDSN" would fail.
As there are multiple datasources being used, I can't just set the default datasource and remove the datasource= attribute entirely; even then, it'd still require a mass search/replace.
I must be missing something. I've read the Lucee docs on datasources but it hasn't helped. Thanks!
Turns out that Scott Stroz was correct. I switched over to Application.cfc and now it works fine.

Excessive recrawl of ColdFusion dynamic pages

The folks who use ColdFusion and server-side includes are having issues with excessive recrawls of dynamic pages because no last-modified date is set, which causes excessive server traffic. You can laugh if you want, but when I tell them the solution is to set a last-modified date on the pages, I get a universal "huh? how do you do that?" I opened a case with Google originally and was told that yes, it's a page date problem. I have done a lot of research to find out how to code this in the header, and most of what I found talked about pulling a date from a page.
I did determine that it probably could be done using the CFHEADER tag. I'm just not sure about implementing.
Can I tell them that adding something like
<cfheader NAME="datelastmodified="Mon, 01 Feb 2013 08:00:00 GMT">
will suffice? Not sure about the date format, if the day name is required.
Have I tried just asking one of the webmasters to try this? No, I haven't. I would like to know that I am at least on the right track before taking up too much of their time. And so far none of them have come up with a solution on their own, other than using robots.txt to block the crawl or things along those lines.
Any suggestions or thoughts would be appreciated.
Fortunately, none of these things need to be mysterious, as they're all well documented.
last-modified HTTP header
HTTP date/time formats
<cfheader>
and even a function to format the date correctly: getHttpTimeString()
This all comes together to suggest this sort of thing:
<cfheader name="Last-Modified" value="#getHttpTimeString(now())#"> <!--- although use some timestamp indicating when the content of the page was last updated, which would be a system-specific sort of thing --->
NB: I didn't know any of the specifics to this until I googled it about 5min ago.
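For reference, the date format that HTTP headers expect (day name included, two-digit day, always GMT) can be sketched in Java, which ColdFusion runs on. This is only an illustration of the format; the class name `HttpDate` and its `format` method are invented for the example, and in CFML `getHttpTimeString()` already does this job.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class HttpDate {
    // Fixed-length HTTP date layout, e.g. "Fri, 01 Feb 2013 08:00:00 GMT".
    // Locale.US keeps day and month names in English regardless of server locale.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Formats the moment the page content last changed as an HTTP date string.
    static String format(Instant lastModified) {
        return HTTP_DATE.format(lastModified);
    }

    public static void main(String[] args) {
        // 1 February 2013, 08:00 UTC (that date was a Friday)
        System.out.println(format(Instant.parse("2013-02-01T08:00:00Z")));
    }
}
```

So the day name is part of the format, and it has to match the actual date.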
Google's crawlers do tend to respect the meta tag details and HTTP response values for pages they encounter and the way to set such in CF is indeed with the CFHEADER tag. You'll want to craft it to look something like this:
<CFHEADER NAME="Last-Modified" VALUE="#DateFormat(DateConvert('local2Utc', Now()), 'ddd, dd mmm yyyy')# #TimeFormat(DateConvert('local2Utc', Now()), 'HH:mm:ss')# GMT">
<CFHEADER NAME="Expires" VALUE="Sun, 10 Mar 2013 05:00:00 GMT">
You will likely want a CF dev to do that work, as I'm showing you two examples for the datetime value there. The first one dynamically sets it to right now (using the DateFormat(), TimeFormat() and Now() functions, with DateConvert() to turn the server's local time into GMT) and the second example sets the Expires header value with a hard-coded date.
You'll probably want to include both the last-modified and expires tags and decide whether you want the dates applied to each to be either dynamic or hard coded.

List of tags not available ColdFusion 9 script syntax?

I'm looking for a complete list of tags that are not available in ColdFusion 9 script syntax.
Example:
CFSetting: is one example that is available in Railo but not in CF9 for use in cfscript
CFDocument: I can't find this one so far.
Not an official list by any measure, but this is a list I presented to a private forum a while back, and it didn't receive too much correction (and those corrections have been integrated). It was in the context of what CF does and doesn't need to be implemented, to claim 100% coverage in CFScript.
Summary of omissions:
These ones are significant omissions:
<cfcollection>
<cfexchangecalendar>
<cfexchangeconnection>
<cfexchangecontact>
<cfexchangefilter>
<cfexchangemail>
<cfexchangetask>
<cfexecute>
<cfindex>
<cfinvoke> (support for dynamic method names)
<cflogin>
<cfloginuser>
<cflogout>
<cfmodule>
<cfoutput> (implementation of query looping with grouping)
<cfparam> (fix the bug in that enforced requiredness doesn’t work (ie: param name="foo";))
<cfsearch>
<cfsetting>
<cfwddx>
<cfzip>
<cfzipparam>
There’s a reasonable case for these ones to be implemented:
<cfassociate>
<cfcache>
<cfcontent>
<cfflush>
<cfhtmlhead>
<cfheader>
<cfntauthenticate>
<cfprint>
<cfschedule>
<cfsharepoint>
These ones... I’m ambivalent:
<cfgridupdate>
<cfinsert>
<cfobjectcache>
<cfregistry>
<cfreport>
<cfreportparam>
<cftimer>
<cfupdate>
We don’t need these ones at all, I think:
<cfajaximport>
<cfajaxproxy>
<cfapplet>
<cfcalendar>
<cfchart>
<cfchartdata>
<cfchartseries>
<cfcol>
<cfdiv>
<cfdocument>
<cfdocumentitem>
<cfdocumentsection>
<cffileupload>
<cfform>
<cfformgroup>
<cfformitem>
<cfgraph>
<cfgraphdata>
<cfgrid>
<cfgridcolumn>
<cfgridrow>
<cfinput>
<cflayout>
<cflayoutarea>
<cfmap>
<cfmapitem>
<cfmediaplayer>
<cfmenu>
<cfmenuitem>
<cfpod>
<cfpresentation>
<cfpresentationslide>
<cfpresenter>
<cfselect>
<cfsilent>
<cfslider>
<cfsprydataset>
<cftable>
<cftextarea>
<cftextinput>
<cftooltip>
<cftree>
<cftreeitem>
<cfwindow>
If there's anything here that you think ought to be included in CFScript, please raise an issue here - http://cfbugs.adobe.com/cfbugreport/flexbugui/cfbugtracker/main.html - and cross reference the issue number here.
HTH.
I would argue that there are no commands that are unavailable in script, as you can write the missing bits yourself using CFCs.
Thus, wrap your favourite missing <cf...> tag in a CFC and call it using new.
However, here is a list of what is supported:
http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSe9cbe5cf462523a02805926a1237efcbfd5-7ffe.html

How to get file attributes in ColdFusion 7?

I cannot find a function that tells me the attributes of a given file. I specifically need to get the file's size. How do I find this info?
edit:
I think I found an answer, just not the answer I was hoping for:
So far till ColdFusion 7, there was no good way to find information like size, last modified date etc about a file. Only way you could do that was to use cfdirectory tag to list the directory, get the query from it, loop over the query until you hit the desired file and then fetch the required metadata.
http://coldfused.blogspot.com/2007/07/new-file-io-in-coldfusion-8-part-ii.html
Anyone know of a better way?
I believe cfdirectory is your simplest answer - but note, you can use the filter attribute as your filename, and you won't have to loop over the result.
<cffunction name="getFileSize">
<cfargument name="filepath">
<cfreturn createObject("java","java.io.File").init(Arguments.filepath).length()>
</cffunction>
The CFLib FileSysLibrary has a bunch of file functions. FileSize and FileSizeCOM may be of particular interest.

Data type support in ColdFusion querynew()

Does anyone know of a way to store values as NVARCHAR in a manually created query in ColdFusion using the querynew() function? I have multiple parts of a largish program relying on using a query as an input point to construct an excel worksheet (using Ben's POI) so it's somewhat important I can continue to use it as a query to avoid a relatively large rewrite.
The problem came up when a user tried storing something that is outside of the VARCHAR range, some Japanese characters and such.
Edit: If this is not possible, and you are 100% sure, I'd like to know that too :)
When creating a ColdFusion query with queryNew(), you can pass a list of datatypes as a second argument. For example:
<cfset x = queryNew("foo,bar","integer,varchar") />
Alternatively, you can use cf_sql_varchar (which you would use in queryparam tags). According to the livedocs, nvarchar is accepted for the CF varchar data type.
QueryParam livedoc (referenced for nvarchar data type)
QueryNew livedoc (referenced for data type definition)
Managing Data Types livedoc (referenced for using cf_sql_datatype)
The only thing I've been able to come up with so far is this:
<cfset x = QueryNew("foobar")/>
<cfset queryAddRow(x) />
<cfset querySetCell(x, "foobar", chr(163)) />
<cfdump var="#x#">
When dumped, this query does contain the British Pound symbol.
I haven't tried this with Ben's POI utility, but hopefully it helps you some.
You might try using JavaCast() to set the values, as shown here:
Kinky Solutions (Ben Nadel) on JavaCast()
Make sure you're using Unicode end-to-end.
This is pretty much all you need:
<cfprocessingdirective pageEncoding="utf-8">
ColdFusion (like Java) holds strings in memory as Unicode (UTF-16), so the string itself can store any character. All you need is to tell CF that the encoding of the page is UTF-8. The alternative is to save the file with a byte-order mark (BOM), but Eclipse/CFEclipse doesn't do that.
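As a rough illustration of why "Unicode end-to-end" matters, the following Java sketch pushes a string through an encode/decode round trip. The class name `UnicodeRoundTrip` and the sample text are invented for this example; it shows general Java string behaviour, not anything specific to queryNew() or the POI utility.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnicodeRoundTrip {
    // Encodes the string to bytes in the given charset and decodes it again,
    // which is what effectively happens at every non-Unicode hop in a pipeline.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters plus the pound sign, written as escapes
        // so the example does not depend on the source file's own encoding.
        String text = "\u65E5\u672C\u8A9E \u00A3";

        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));

        // ISO-8859-1 has no Japanese characters; they are replaced with '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```

The in-memory string is never the problem; data is lost at the step where text is read or written with a narrower encoding.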