Railo 4: which document formats are supported by cfindex / Lucene?

I thought I had a simple question, but I can't find a source for the answer: which document formats can be indexed by the Lucene version that is packaged with Railo 4.0?
.doc and .pdf files seem to index fine, but .docx and .rtf files just don't seem to get indexed. Is there a list available somewhere? And for formats that aren't supported, what would be the best way to get their content indexed by cfindex as well?
<cfindex
collection = "#collection#"
action = "update"
type = "file"
key ="#ABSfilepath#"
title="#ABSfilepath#"
>
thanks!
Question also posted to Railo mailing list: web link.

Railo 4 uses Lucene 2.4.1. How do you tell? The same way you tell the version of any third-party software that Railo uses: locate the JAR file (in the lib/ext directory), open that archive (using 7-Zip or equivalent), and look at META-INF/MANIFEST.MF, where you'll find content like this:
Specification-Title: Lucene Search Engine: core
Specification-Version: 2.4.1
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.lucene
Implementation-Version: 2.4.1 750176 - 2009-03-04 21:56:52
Implementation-Vendor: The Apache Software Foundation
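If you need to check more than one JAR, the same manifest lookup can be scripted. A minimal Python sketch (the language and the `read_manifest`/`jar_versions` helper names are illustrative, not anything Railo provides), assuming standard JAR manifests in which wrapped lines continue with a single leading space and the main section ends at the first blank line:

```python
import zipfile
from pathlib import Path

MANIFEST = "META-INF/MANIFEST.MF"


def read_manifest(jar_path):
    """Hypothetical helper: return a JAR manifest's main-section attributes as a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        if MANIFEST not in jar.namelist():
            return {}
        text = jar.read(MANIFEST).decode("utf-8")
    attrs = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section; per-entry sections follow
        if line.startswith(" ") and last_key is not None:
            attrs[last_key] += line[1:]  # continuation of a wrapped line
            continue
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            attrs[last_key] = value.strip()
    return attrs


def jar_versions(directory):
    """Hypothetical helper: map each JAR in `directory` to its declared version (or None)."""
    versions = {}
    for path in sorted(Path(directory).glob("*.jar")):
        attrs = read_manifest(path)
        versions[path.name] = attrs.get("Specification-Version") or attrs.get("Implementation-Version")
    return versions
```

Pointing `jar_versions` at your Railo installation's lib/ext directory (the exact path depends on your install) lists every bundled library at once, which is also a quick way to find the POI version mentioned below.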
This is a pretty old version, and the Apache Lucene website doesn't seem to have any documentation for it. (It might be possible to upgrade Lucene by replacing the relevant JARs, but this might also cause dependency issues; do so at your own risk.)
Since the Lucene website doesn't help, a search for "lucene 2.4.1 indexable documents" brings back a pertinent question about v2.3.2 which asks:
Does Lucene java supports parsing of extensions *.docx, *.pptx, *.mpp i.e.
Microsoft Windows 2007 documents?
With the response:
Lucene doesn't actually support any of the document types. What happens
is that some program is used to parse the files into an indexable stream
and that stream is indexed. That used to be POI in the old days.
OK, so assuming that is still accurate, Lucene doesn't control the file types; Apache POI does.
Checking the JARs tells us that Railo 4.0 uses Apache POI v3.8, and the POI changelog shows that .docx support arrived in v3.5.
So your .docx files should be supported, along with the other MS Office formats. If a .docx file definitely isn't being indexed, you probably need to work out whether it's a POI issue, a Lucene issue, or a Railo issue; creating a simple reproducible test case with both .doc and .docx documents is probably a good first step.
Beyond that, you'll need someone familiar with Lucene/POI to advise. There may or may not be log files containing details of indexing/retrieval errors, or ways to interact with Lucene directly (not via Railo/cfindex) that can help identify where the issue lies.
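For that test case it's worth confirming that each sample file really is what its extension claims. A legacy .doc is an OLE2 compound file while a .docx is a ZIP package, so (if POI is indeed doing the parsing) they go through different POI components, HWPF and XWPF respectively. A small Python sketch, with a hypothetical `sniff` helper, that classifies files by their leading bytes instead of their extension:

```python
import zipfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary Office (.doc/.xls/.ppt)


def sniff(path):
    """Hypothetical helper: guess a document's real format from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"{\\rtf"):
        return "rtf"
    if head.startswith(b"PK\x03\x04"):
        # A ZIP container; Word 2007+ documents keep their body in word/document.xml.
        with zipfile.ZipFile(path) as z:
            return "docx" if "word/document.xml" in z.namelist() else "zip"
    return "unknown"
```

If a file named .docx comes back as "ole2" (or the reverse), the indexing failure may simply be a mislabelled file rather than a POI, Lucene, or Railo problem.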

Related

How can I compile my ColdFusion code for sourceless distribution, and have it be unreadable?

I've been tasked with creating a deployable version of a ColdFusion web app to be installed on a client's server. I'm trying to find a way to give them a compiled version of our code, and my first inclination was to use the CFCompile utility that I found here. However, after running CFCompile, most of the code in the CFM files is still readable. The only thing that appears to be obfuscated at all is the actual ColdFusion code; all of the SQL queries are still perfectly readable. (Example in the screenshot below)
The HTML and JavaScript are also still readable in the compiled code, but that doesn't matter, as those can be seen in a web browser anyway.
Is there another way to distribute my source code in a format that is completely unreadable to the user? I'm guessing that for whatever method I choose, there will be some way of decompiling the code. That's not an issue, I just need to find a way to make it more difficult than opening the file and seeing the queries.
Hostek has a pretty good write up on the subject over on their site - How to Encrypt or Compile ColdFusion Files.
Basically, from that article:
Using cfcompile.bat
The cfcompile.bat utility will compile all .cfm and .cfc files within a given directory into Java bytecode. This has the effect of making your source code unreadable, and it also prevents ColdFusion from having to compile your ColdFusion files on first use which provides a small performance enhancement.
More details about using cfcompile.bat can be found in ColdFusion's documentation.
Using cfencode.exe
The cfencode.exe utility will apply basic encryption to a specific file or directory. If used to encrypt a directory, it will apply encryption to ALL files in the directory which can break any JS, CSS, images, or other non-ColdFusion files.
They do also include this note at the bottom:
Note: Encrypting your site files with cfencode does not guarantee absolute security of your source code, but it does add a layer of obfuscation to help prevent unauthorized individuals from viewing the source.
The article goes on to give basic instructions on how to use each.
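Given the warning that encoding a whole directory can break JS, CSS, and images, one option is to encode only the CFML files. A minimal Python sketch, assuming your templates and components use the .cfm and .cfc extensions (add .cfml if you use it), that collects just those files so each can be passed to cfencode individually; `cfml_files` is a hypothetical helper, and the sketch deliberately does not invoke cfencode, whose exact command-line syntax you should take from Adobe's documentation:

```python
from pathlib import Path

# Assumption: only these extensions hold ColdFusion code in your project.
CFML_EXTENSIONS = {".cfm", ".cfc"}


def cfml_files(root):
    """Hypothetical helper: list CFML files under root, skipping JS, CSS, images, etc."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in CFML_EXTENSIONS
    )
```

Running it against a copy of the site (never the only copy) keeps the static assets untouched.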
Adobe has this note on their site regarding cfencode:
Note: You can also use the cfencode utility, located in the cf_root/bin directory, to obscure ColdFusion pages that you distribute. Although this technique cannot prevent persistent hackers from determining the contents of your pages, it does prevent inspection of the pages. The cfencode utility is not available on OS X.
I would also add that it will be trivial for anyone familiar with ColdFusion to decode anything encoded with this utility because they also provide the decoder.

Epub library for C++

Is there a C++ library for creating Epub files? I need to use it with Qt.
My program can export html & css, but I don't know how to convert that to an Epub.
From my googling efforts, it appears that most of it is hand-written and there isn't a globally accepted SDK. I found a nice tutorial for you which walks you through making epub files, and I did see some other links about using it with Qt. Maybe someone knows of a good open-source project somewhere?
epub tutorial
Once you've got the HTML and CSS, you're most of the way there; what remains is the content.opf file, which basically lists all the files in the epub document and the overall metadata (author, publisher, ISBN, etc); and the table of contents. epub 2.0.1 uses the toc.ncx file as a table of contents--it's basically an xml document. epub 3.0 uses the toc.xhtml, which is much more intuitive--it's essentially an ordered list in a nav element. You can do either epub 2.0.1 or epub 3.0; there's enough backwards compatibility built in that older devices will be able to read an epub 3.0 file--as long as you include both a toc.ncx and a toc.xhtml.
You may have to tinker with your CSS; epub doesn't support everything, and the device manufacturers all seem to interpret things differently; it's very "browser wars"-ish.
I find the IDPF's epub spec is the best place to go for formatting info. Here are the relevant bits:
content.opf
toc.xhtml
toc.ncx
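The packaging step is easy to get wrong by hand: the ZIP's first entry must be an uncompressed file named mimetype containing exactly application/epub+zip, and META-INF/container.xml must point at content.opf. A minimal Python sketch of that layout plus an epub 3.0 toc.xhtml (Python is used only for illustration; a Qt/C++ exporter would write the same files). It assumes you generate content.opf yourself from the IDPF spec and keep everything under an OEBPS/ folder, which is a common convention rather than a requirement; `write_epub` and `nav_xhtml` are hypothetical helpers:

```python
import zipfile
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def nav_xhtml(title, chapters):
    """Hypothetical helper: epub 3 toc, an ordered list inside <nav epub:type="toc">."""
    items = "\n".join(
        f'      <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in chapters
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>{escape(title)}</title></head>
  <body>
    <nav epub:type="toc" id="toc">
      <ol>
{items}
      </ol>
    </nav>
  </body>
</html>
"""


def write_epub(out_path, opf_xml, files):
    """Hypothetical helper: package pre-built content into an epub container.

    files maps paths relative to OEBPS/ (e.g. "chapter1.xhtml", "style.css",
    "toc.xhtml", "toc.ncx") to their str or bytes contents.
    """
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        # mimetype must be the first entry and must be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        z.writestr("OEBPS/content.opf", opf_xml)
        for name, data in files.items():
            z.writestr("OEBPS/" + name, data)
```

Your HTML/CSS export and the content.opf manifest still have to agree on file names; for epub 2.0.1 compatibility you would add a toc.ncx to `files` as well.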

Schema for the <project> XML in a Sitecore package

Does anyone have the schema that relates to the XML document that appears in the installer folder of a Sitecore package file?
Especially interested in the format of the project/Sources/xitems/Entries/x-item element.
That XML file seems to be a representation of the Sitecore.Install.PackageProject class, so I tried to generate an XSD from code using serialization on that class.
However, if you use a decompiler to look at how package building and installation work, you'll find that Sitecore has written its own serializer for this.
So I wasn't able to generate a correct XSD with the .NET serializer.
With a decompiler (I use dotPeek, freeware) you can track down a lot of information about that XML file and how it's used by Sitecore, but I don't see a (realistic) way to extract a schema from it.
If you're going to look into it, look inside Sitecore.Kernel.dll and look for the Sitecore.Install namespace.
Have you tried asking Sitecore Support? If anyone has this schema, it's them.
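Short of an official schema, one rough way to document the format is to infer it from real files: walk several saved package project XML files and record every element path and attribute name that occurs. A Python sketch with a hypothetical `element_paths` helper; the result is an informal inventory of what your samples contain, not a schema, and it will miss anything your samples don't exercise (namespaced tags would show up as `{namespace}tag`):

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def element_paths(xml_texts):
    """Hypothetical helper: count element paths and collect attribute names across files."""
    counts = Counter()
    attrs = defaultdict(set)

    def walk(elem, parent):
        path = f"{parent}/{elem.tag}" if parent else elem.tag
        counts[path] += 1
        attrs[path].update(elem.attrib)  # adds the attribute names
        for child in elem:
            walk(child, path)

    for text in xml_texts:
        walk(ET.fromstring(text), "")
    return counts, attrs
```

Feeding it the project XML from several packages built with different sources shows which children project/Sources/xitems/Entries/x-item can have in practice.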

MS Word/ ODF Automation in Qt

How can I automate MS Word documents (.doc) or ODF documents (.odt) in Qt 4.5? I know I need to use QAxWidget and QAxObject.
I have data (QString) and a few images as well, and I have to add them to the document. I googled but couldn't find any commands for MS Word / ODF. I want the specific commands that should be passed to the QAxObject::dynamicCall() function to perform these operations.
For example, in MS Excel we use something like:
excel.querySubObject("ActiveWorkBook");
which will return the object of the Active workbook of the Excel document.
What commands are available for generating MS Word or ODF (.odt) documents? I am using Windows XP. Any links or examples are welcome.
Take a look at http://doc.trolltech.com/qq/qq27-odfwriter.html, Qt provides functionality to create OpenDocument Format (ODF) files.
The ActiveX commands for MS Word can be found in VBAWD10.chm, which is installed along with MS Word.
The details of the ActiveX help documents available can be obtained here.
The toughest part is adapting those commands so that they can be accessed through the ActiveQt module.
I provided a similar solution to my question here.
Hope it helps anyone looking for similar solutions.

Dealing with obsolete versions of RTF

Summary questions:
Do you know of a lightweight application that can save files in RTF Version 1.6 format?
Do you know what version of RTF Abiword's "Rich Text Format for old apps" corresponds to?
Do you know a way to inspect an RTF file and determine what version of RTF it's encoded under?
Do you know which DLL describes the RTF format on a Windows NT 4.0 machine and whether it can be upgraded?
I have a legacy MS Visual C++ 6.0 MFC application that runs on an embedded Windows NT 4.0 machine. The application provides in-app help using MFC's CRichEditView class to pull text out of an RTF file called help.rtf. The help file is saved as RTF version 1.6. It has always been edited using MS Word 2000 or the version of WordPad that comes with Windows NT 4.0.
The problem is that our developer workstations tend to have Windows XP (and its version of WordPad) and Office 2003 or better, both of which use more recent versions of RTF than 1.6, and it is becoming increasingly cumbersome to find a machine on which the file can be edited and re-saved in that obsolete format. If a newer version of Word or WordPad is used to save the file, it gets saved as a newer version of RTF. Then, when the application is run on the NT machine, the help text doesn't display properly. (Although when the same application is run on an XP machine, the help text does display properly.)
So, I'm looking to do one of two things:
Find an application (preferably lighter-weight than Word 2000) that will save files in RTF version 1.6 format, that we can use for future editing of the help file.
Figure out a way to get the NT machine to read later versions of RTF properly.
On the first front, I've tried AbiWord, which has a "Rich Text Format for old apps" option, but I can't tell what version of RTF this option outputs. Do you know what version this is? Unfortunately, it's not readily apparent from the metadata in the file, which just says "rtf1", per this cute passage from all versions of the RTF spec. Is there a way to analyze an RTF file and determine what version of RTF it's encoded under?
The RTF standard described in this RTF Specification, although titled as version 1.6, continues to correspond syntactically to RTF Specification version 1. Therefore, the numeric parameter N for the \rtf control word should still be emitted as 1.
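Since the \rtf1 header can't tell you the version, a rough heuristic is to look at what the file actually uses. A Python sketch that extracts the \rtfN parameter, the {\*\generator ...} destination that some writers emit (naming the program that saved the file), and the set of control words, then diffs those words against a list you compile yourself from the RTF 1.6 specification. `inspect_rtf`, `words_not_in`, and the `known_words` input are all hypothetical; words outside the 1.6 list hint at newer features but don't prove a version:

```python
import re

# A control word is a backslash, lowercase ASCII letters, and an optional signed
# numeric parameter; any other character after a backslash is a control symbol
# (e.g. \\ or \{), which is consumed here so escaped text isn't mistaken for a word.
TOKEN = re.compile(rb"\\(?:([a-z]+)(-?\d+)?|.)", re.DOTALL)
GENERATOR = re.compile(rb"\{\\\*\\generator\s+([^;}]*)")


def inspect_rtf(data):
    """Hypothetical helper: return (\\rtf parameter, generator text or None, control words)."""
    words = set()
    header = None
    for m in TOKEN.finditer(data):
        if m.group(1) is None:
            continue  # control symbol, not a control word
        word = m.group(1).decode("ascii")
        if header is None and word == "rtf":
            header = int(m.group(2) or 0)
        words.add(word)
    gen = GENERATOR.search(data)
    generator = gen.group(1).decode("latin-1").strip() if gen else None
    return header, generator, words


def words_not_in(words, known_words):
    """Hypothetical helper: control words absent from a caller-supplied list."""
    return sorted(set(words) - set(known_words))
```

Comparing a file saved by Word 2000 with one saved by XP's WordPad this way may show which newer control words the NT 4.0 rich edit control is tripping over.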
On the second front, I'm wondering if there's some DLL that I can just update so that Windows NT will recognize the newer version of the format. Do you know which DLL describes the RTF format and whether it can be upgraded?
I believe the rich edit format is determined by the rich edit control itself. I wouldn't try to upgrade the DLL, because there's a lot that could break.
See this MSDN note for hints on using the later version of the rich edit control. Version 2.0 should be available in NT 4.0.
http://msdn.microsoft.com/en-us/library/tt1cfb9f(VS.80).aspx
You might try copying the version of WordPad from your NT system and see if that works as an alternative.
Following a chain of hints that started with Mark Ransom's answer, I ended up copying riched20.dll and riched32.dll from C:\Windows\System32\ on my XP machine to C:\WinNT\System32\ on the NT machine. After I did this, RTF files edited with WordPad or Word on the XP machine rendered correctly on both WordPad and my application on the NT machine.
The first thing that comes to mind is WordPad. It's on every machine and is really lightweight in its RTF. I've found it much better than Word at many simple RTF tasks.