CFDOCUMENT creates PDF with different MD5 hashes for same input - coldfusion

I am using CFDOCUMENT to create a PDF in CF9.0.1. However, each time I generate a new PDF from the same input, the MD5 hash is different.
The test code is simple:
<cfdocument name="FileData1" format="PDF" localurl="yes" pagetype="A4"><h3>I am happy!</h3></cfdocument>
<cfdocument name="FileData2" format="PDF" localurl="yes" pagetype="A4"><h3>I am happy!</h3></cfdocument>
<cffile ACTION="write" FILE="C:\happy1.pdf" OUTPUT="#FileData1#" ADDNEWLINE="NO" NAMECONFLICT="Override">
<cffile ACTION="write" FILE="C:\happy2.pdf" OUTPUT="#FileData2#" ADDNEWLINE="NO" NAMECONFLICT="Override">
The two files have different MD5 hashes, although both PDFs look exactly the same. I have a user requirement to skip regenerating the PDF if the file would be the same, so does anyone know how to force CF9 to generate a byte-identical PDF (same MD5 hash) from the same input?
I ran a file compare in the HxD hex editor and found that the files differ in three places:
The font name e.g. 62176/FontName/OJSSWJ+TimesNewRomanPS (the OJSSWJ is random)
The timestamp /CreationDate(D:20110927152929+08'00')
Some sort of key at the end: <]/Info 12 0 R/Size 13>>
Thanks for your help in advance!

They will never be the same.
The timestamp /CreationDate(D:20110927152929+08'00')
The /CreationDate is a timestamp of when the file was created, so unless you create it in the same second every time, it won't be the same.
You might be able to modify the PDF afterwards and remove or change this value.
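If the only goal is the comparison, a variation on that idea is to blank out the volatile parts before hashing instead of editing the file. The Python sketch below does this for the three regions listed in the question. It assumes they appear uncompressed in the file (as the hex compare suggests), that the random font prefix is the usual six capital letters followed by `+`, and that the "key at the end" is the trailer's `/ID` array; it also blanks `/ModDate` in case it is present. `normalized_md5` is a hypothetical helper, and the patterns may need adjusting for other PDF producers.

```python
import hashlib
import re

# Byte patterns for the parts reported to change on every run. They assume the
# values sit in uncompressed PDF objects, as the hex compare suggests.
_VOLATILE = [
    # Timestamps, e.g. /CreationDate(D:20110927152929+08'00')
    (re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)"), rb"/\1()"),
    # Font subset tag: six capital letters plus '+', e.g. /OJSSWJ+TimesNewRomanPS
    (re.compile(rb"/[A-Z]{6}\+"), b"/XXXXXX+"),
    # Trailer file identifier: /ID [<hex><hex>]
    (re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"), b"/ID[]"),
]


def normalized_md5(pdf_bytes: bytes) -> str:
    """MD5 of the PDF bytes after the volatile fields have been blanked out."""
    for pattern, replacement in _VOLATILE:
        pdf_bytes = pattern.sub(replacement, pdf_bytes)
    return hashlib.md5(pdf_bytes).hexdigest()
```

Reading C:\happy1.pdf and C:\happy2.pdf in binary mode and comparing their `normalized_md5` values should then show them as equal, provided nothing else in the files differs.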
Alternatively, use a different method to decide whether to create the PDF at all; creating it just to compare MD5 hashes seems like a waste of processing power.
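One such method, sketched in Python for illustration: hash the *input* (the HTML plus the `cfdocument` attributes that affect the output) and regenerate only when that hash changes. Where the previous digest is stored is left open, and `input_digest` and `needs_regeneration` are hypothetical names. In CFML the same idea could be built on the built-in `Hash()` function.

```python
import hashlib
from typing import Optional


def input_digest(html: str, **options: str) -> str:
    """MD5 over the markup plus the options that change the output (e.g. pagetype)."""
    h = hashlib.md5(html.encode("utf-8"))
    for key in sorted(options):  # sorted, so argument order does not matter
        h.update(f"\0{key}={options[key]}".encode("utf-8"))
    return h.hexdigest()


def needs_regeneration(html: str, stored_digest: Optional[str], **options: str) -> bool:
    """True when there is no stored digest yet or the input has changed since."""
    return input_digest(html, **options) != stored_digest
```

Because the PDF is never generated just to be compared, the timestamp, the random font prefix and the trailer key no longer matter.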

Related

How can I manipulate CSVs from within C++

I am trying to create a program that can write out to a CSV (comma-separated) file. Is there a way, from within my code, to control things like the column width or whether a cell is left- or right-justified, so that when I open the file in Excel it looks better than a bunch of strings cramped into tiny cells? My goal is for the user to do as little thinking as possible. If they open the file and have to resize everything just to see it, that seems a little crummy.
CSV is a plain text file format. It doesn't support any visual formatting. For that, you need to write the data to another file format such as .xlsx or .ods.
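To make this concrete: an .xlsx file is a zip archive of XML parts, and column widths are just `<col>` elements in the sheet part. The sketch below is in Python rather than C++ (to keep one language for the examples on this page) and writes a minimal one-sheet workbook with inline strings, sizing each column to its longest value. It is a minimal sketch under assumptions: it writes no styles part, so left/right justification is not covered; every value is treated as text; and `write_xlsx` and `col_letter` are hypothetical names. From C++ the same parts could be written with any zip library, or you could use an existing xlsx-writing library.

```python
import zipfile
from xml.sax.saxutils import escape

_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
_DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
_PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
_XL_CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"


def col_letter(n):
    """1 -> 'A', 26 -> 'Z', 27 -> 'AA' (spreadsheet column names)."""
    s = ""
    while n:
        n, r = divmod(n - 1, 26)
        s = chr(ord("A") + r) + s
    return s


def write_xlsx(path, rows):
    """Write rows of strings to a one-sheet .xlsx with each column sized to fit."""
    ncols = max((len(r) for r in rows), default=0)
    # Width in characters: longest value in the column plus a little padding.
    widths = [max((len(r[i]) for r in rows if i < len(r)), default=0) + 2
              for i in range(ncols)]
    cols = "".join(f'<col min="{i}" max="{i}" width="{w}" customWidth="1"/>'
                   for i, w in enumerate(widths, start=1))
    body = "".join(
        f'<row r="{ri}">' + "".join(
            f'<c r="{col_letter(ci)}{ri}" t="inlineStr"><is><t>{escape(v)}</t></is></c>'
            for ci, v in enumerate(row, start=1)) + "</row>"
        for ri, row in enumerate(rows, start=1))
    sheet = (f'<worksheet xmlns="{_MAIN}">'
             + (f"<cols>{cols}</cols>" if cols else "")  # <cols> must not be empty
             + f"<sheetData>{body}</sheetData></worksheet>")
    parts = {
        "[Content_Types].xml":
            '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
            '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            f'<Override PartName="/xl/workbook.xml" ContentType="{_XL_CT}.sheet.main+xml"/>'
            f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{_XL_CT}.worksheet+xml"/>'
            "</Types>",
        "_rels/.rels":
            f'<Relationships xmlns="{_PKG_REL}"><Relationship Id="rId1" '
            f'Type="{_DOC_REL}/officeDocument" Target="xl/workbook.xml"/></Relationships>',
        "xl/workbook.xml":
            f'<workbook xmlns="{_MAIN}" xmlns:r="{_DOC_REL}"><sheets>'
            '<sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>',
        "xl/_rels/workbook.xml.rels":
            f'<Relationships xmlns="{_PKG_REL}"><Relationship Id="rId1" '
            f'Type="{_DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/></Relationships>',
        "xl/worksheets/sheet1.xml": sheet,
    }
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        for name, xml in parts.items():
            z.writestr(name, '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' + xml)
```

The column widths are measured in characters, so this only approximates "fit to contents" for proportional fonts.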

DBF Table Join without using Arcpy?

I have created a rather large CSV file (63000 rows and around 40 columns) and I want to join it with an ESRI Shapefile.
I have used ArcPy, but the whole process takes 30 (!) minutes. If I instead join the original (small) CSV file with the shapefile and then do my calculations in ArcPy, continuously adding new fields and calculating their values, it takes 20 minutes. I am looking for a faster solution and found other Python modules such as PySHP and DBFPy, but I have not found a way to join tables with them, which I hope would be faster.
My goal is to get away from ArcPy as much as I can and, preferably, use only Python, so ideally no PostgreSQL or the like either.
Does anybody have a solution for that? Thanks a lot!
Not exactly a programmatic solution to my problem, but a practical one:
My shapefile is always static; only the attributes of the features change. So I copy my original shapefile (only the basic files with the extensions .shp, .shx and .prj) to my output folder and rename it to the name I want.
Then I create my CSV file with all the calculations, convert it to DBF and save it under the name of my new shapefile in the output folder too. ArcGIS now loads the shapefile together with my own DBF file, and I don't need to do any table join at all!
Now my program runs through in only 50 seconds!
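The "convert it to DBF" step can be done in plain Python without extra modules. A minimal sketch, assuming every column is written as a dBase III character field (`C`) no wider than 254 bytes, field names are ASCII and get truncated to 10 characters, rows are rectangular, and text is Latin-1; `write_dbf` and `csv_to_dbf` are hypothetical helpers, and numeric (`N`) fields would need right-justified formatting instead. The rows must be in the same order as the shapes in the .shp file, because a shapefile pairs geometry and attribute records by position.

```python
import csv
import datetime
import struct


def write_dbf(path, field_names, rows, encoding="latin-1"):
    """Write rows of strings as a dBase III table with character ('C') fields only."""
    data = [[str(v).encode(encoding) for v in row] for row in rows]
    widths = [max([1] + [len(r[i]) for r in data]) for i in range(len(field_names))]
    if max(widths, default=0) > 254:
        raise ValueError("dBase character fields are limited to 254 bytes")
    today = datetime.date.today()
    header_len = 32 + 32 * len(field_names) + 1  # header + descriptors + 0x0D
    record_len = 1 + sum(widths)  # leading byte is the deletion flag
    with open(path, "wb") as out:
        # Version 0x03, last-update date (YY since 1900, MM, DD), counts, 20 reserved bytes.
        out.write(struct.pack("<BBBBIHH20x", 0x03, today.year - 1900, today.month,
                              today.day, len(data), header_len, record_len))
        for name, width in zip(field_names, widths):
            # 11-byte null-padded name, type, 4 reserved, length, decimals, 14 reserved.
            out.write(struct.pack("<11sc4xBB14x", name[:10].encode("ascii"), b"C", width, 0))
        out.write(b"\r")  # 0x0D terminates the field descriptor array
        for rec in data:
            # b" " marks the record as not deleted; C fields are left-justified.
            out.write(b" " + b"".join(v.ljust(w) for v, w in zip(rec, widths)))
        out.write(b"\x1a")  # end-of-file marker


def csv_to_dbf(csv_path, dbf_path):
    """First CSV row = field names, remaining rows = records (in shape order)."""
    with open(csv_path, newline="", encoding="latin-1") as f:
        rows = list(csv.reader(f))
    write_dbf(dbf_path, rows[0], rows[1:])
```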
I am still interested in other solutions to the table join problem; I may run into it again in the future when the shapefile is NOT static. I did not really understand Nan's solution, as I am still at the "advanced beginner" level in Python :)
Cheers
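For the non-static case, the join itself is also cheap in plain Python: index one table by the key field in a dictionary and look up each record of the other table in it (a hash join). A minimal sketch, assuming both tables have already been read into lists of dicts with string values (for example with `csv.DictReader` for the CSV and a DBF reader such as DBFPy or PySHP for the attribute table; those reading calls are not shown). `left_join` and the field names in the example are hypothetical.

```python
import csv
import io


def left_join(left_rows, right_rows, key, right_fields):
    """Extend every left row with right_fields from the right row sharing `key`.

    Left order is preserved (so shape order is kept); unmatched rows get "".
    If the right table repeats a key, the last occurrence wins.
    """
    index = {row[key]: row for row in right_rows}  # one pass: key -> row
    joined = []
    for row in left_rows:
        match = index.get(row[key], {})
        merged = dict(row)
        for field in right_fields:
            merged[field] = match.get(field, "")
        joined.append(merged)
    return joined


# Example: attribute records (in shape order) joined with a CSV of results.
attributes = [{"FID": "1", "NAME": "North"}, {"FID": "2", "NAME": "South"}]
results = csv.DictReader(io.StringIO("FID,POP\n2,200\n1,100\n"))
joined = left_join(attributes, results, "FID", ["POP"])
```

Since the result keeps the attribute table's row order, it could then be written out as the shapefile's DBF, as in the workaround above.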

SpreadsheetFormatRow abruptly stops working

I've seen this post, but there doesn't seem to be a resolution there. Anyway, I'm using ColdFusion 10 to generate an Excel spreadsheet. However, when I use SpreadsheetFormatRow() and pass in the rows to be formatted, it only formats about 3 rows and then abruptly stops. Here is an example...
ColdFusion Code
<cfscript>
rowCount = 1;
headingRows = 4;
// Create instance of new Spreadsheet
excelSheet = SpreadsheetNew("ReportName",false);
// HEADING (IMAGE) ROW FORMAT
formatHeadingRow = StructNew();
formatHeadingRow.fgcolor="blue";
// Add rows to fill the header area (must add as many as we are spanning with the above image)
for (x=0;x<headingRows;x++) {
    SpreadsheetAddRow(excelSheet,"TEST,TEST,TEST,TEST,TEST,TEST,TEST,TEST,TEST,TEST,TEST,TEST");
    SpreadsheetFormatRow(excelSheet,formatHeadingRow,rowCount);
    rowCount++;
}
</cfscript>
<!--- stream it to the browser --->
<cfheader name="Content-Disposition" value="inline; filename=reportName.xls">
<cfcontent type="application/vnd.ms-excel" variable="#SpreadSheetReadBinary(excelSheet)#">
and here is a screenshot of the resulting Excel sheet
Why is the formatting stopping after X number of rows and cells?
If I switch to the XML-based format with
excelSheet = SpreadsheetNew("ReportName",true);
it works properly. However, I'm using a custom palette for my colors, so I don't think switching to the XLSX format is going to work for me. When I try it and then call
palette = excelSheet.getWorkbook().getCustomPalette();
I get an error stating that the getCustomPalette() method is undefined.
coldfusion.runtime.java.MethodSelectionException: The getcustompalette method was not found
Can anyone help me figure this out? Thank you!!!
Or, even better, since it works with the XML format, can anyone show an example of how to use a custom palette with XLSX (the XML format)?
This is an issue I have seen often when dealing with xls files from CF; they seem to stop applying styles after a certain number of cells. I've been able to work around it by outputting to xlsx instead. (I was able to replicate and "fix" your issue by doing so.)
excelSheet = SpreadsheetNew("ReportName",true);
...
<cfheader name="Content-Disposition" value="inline; filename=reportName.xlsx">
<cfcontent type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
variable="#SpreadSheetReadBinary(excelSheet)#">
Since you are applying the exact same format to all rows, only do it once, not on each row. Using SpreadsheetFormatCellRange after the loop should resolve the issue:
SpreadsheetFormatCellRange(excelSheet
, formatHeadingRow
, startRow
, startCol
, endRow
, endCol );
I suspect the problem somehow relates back to Excel's maximum style limits. Since CF is a black box, it is difficult to know how many styles it actually creates or exactly how they are applied. However, in my experience it is very easy to exceed the style limits without even knowing it, especially with the older .xls file format, whose limits are much lower. That is why I suggested using the newer .xlsx format instead.
getCustomPalette() method is undefined.
Correct. It does not exist in XSSF. Is there some reason you need a custom palette instead of just defining your own colors, as mentioned in your other thread?

Convert text to elements in XML using XSLT

I'm currently migrating XML from one CMS to another and need to convert some text to elements. Because of how the system works, some editors can only enter escaped text. The challenge is to take some of these escaped elements and convert them into valid XML elements.
Source file:
<p>Press the <button-name>Select key </button-name>to show more information.</p>
<p>Press the <button-name>Back key</button-name> to save the
values.</p>
<p>When the storage is completed, the <product-name/> machine
displays:</p>
<p><attention>
<display-text translate="no">STORAGE COMPLETED
Press BACK to exit</display-text>
</attention></p>
What I want to do
Replace <button-name> with <gui>
Replace <product-name/> with <kt.in name="custom-name"/>
Keep the other escaped elements as they are.
XML I want
<p>Press the <gui>Select key</gui>to
show more information.</p>
<p>Press the <gui>Back key</gui>
to save the calibrations values.</p>
<p>When the storage is completed, the <kt.in name="custom-name"/> machine
displays:</p>
<p><attention> <display-text translate="no">STORAGE COMPLETED
Press BACK to exit</display-text>
</attention></p>
I tried using a string-based search-and-replace, but since I want proper XML elements as output, that doesn't do it.
This is probably only going to work with a string-based search-and-replace, depending on how many text "tags" you want to turn into XML. The bigger problem I see is actually keeping it all as proper XML elements.
I don't think you can do that without writing a small tool that reads the strings between the escaped tags in the text, e.g.
<button-name>
and copies them into the right fields of an object, which you then serialize back as a conforming XML element.
It doesn't really matter which language you prefer, since there are plenty of object-to-XML libraries available.
To just change the tags, you could also unescape the text, so that
&lt; would turn into -> <
and then filter the content between < and > to exchange the names you want, e.g. button-name to gui.
Hope this gives you an idea.
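The question asks for XSLT; as a swapped-in alternative that follows the idea in this answer, here is a minimal Python sketch using the standard library's ElementTree. After parsing, the escaped markup is ordinary text, so a regular expression can pick out just the `<button-name>…</button-name>` and `<product-name/>` strings and turn them into real `gui` and `kt.in` elements, while every other escaped tag stays text and is re-escaped on output. It assumes the escaped tags are not nested and only looks inside `<p>` elements; `convert` and `_promote` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Only these escaped "tags" become real elements; all other escaped markup stays text.
_TOKEN = re.compile(r"<button-name>(.*?)</button-name>|<product-name\s*/>", re.S)


def _promote(text):
    """Split text into (leading text, new elements); text between matches goes in .tail."""
    lead, made, pos = "", [], 0
    for m in _TOKEN.finditer(text):
        between = text[pos:m.start()]
        if made:
            made[-1].tail = between
        else:
            lead = between
        if m.group(1) is not None:  # <button-name>label</button-name> -> <gui>label</gui>
            el = ET.Element("gui")
            el.text = m.group(1)
        else:  # <product-name/> -> <kt.in name="custom-name"/>
            el = ET.Element("kt.in", {"name": "custom-name"})
        made.append(el)
        pos = m.end()
    if made:
        made[-1].tail = text[pos:]
    else:
        lead = text
    return lead, made


def convert(xml_text):
    """Parse the document and promote the selected escaped tags inside every <p>."""
    root = ET.fromstring(xml_text)
    for p in list(root.iter("p")):  # list() so the tree can be edited safely
        lead, children = _promote(p.text or "")
        p.text = lead
        for child in list(p):  # existing child elements: check their tails too
            tail, made = _promote(child.tail or "")
            child.tail = tail
            children += [child] + made
        p[:] = children
    return root
```

`ET.tostring(convert(source), encoding="unicode")` then gives the converted document as a string.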

Use UNO (OpenOffice API) to open a spreadsheet *without* recalculation

I'm using pyuno to read an Excel spreadsheet (running on Linux). Many cells have formulas referring to add-ins that are, obviously, not available. However, the cell values are what I want.
But when I load and read the sheet, it seems those formulas are being evaluated and thus the values are being overwritten with errors.
I've tried several things, none of which have worked:
set flags AutomaticCalculation=False, MacroExecutionMode=NEVER_EXECUTE in the call to desktop.loadComponentFromURL
call document.enableAutomaticCalculation(False) on the loaded document
Any suggestions?
If the formulas don't matter to you, you might get around the problem by processing a copy of your spreadsheet in which only the values (not the formulas) are present.
To do this quickly, select the whole sheet's content, copy it, then use Paste Special and deselect everything except the values. Save to a new file (make sure you don't overwrite the original file, or every formula will be lost!). Your script should then be able to process this file.
This is an ugly solution, as there must be a way to do it programmatically.
Calc does not yet support using the cached results after loading the document, except that LibreOffice Calc now does use cached results for xls documents. The results are also stored in ods files, but they are ignored while loading the document; instead, each formula result is evaluated by compiling and interpreting the saved formula.
There are some plans to add this for ods and xlsx too, but there are many ods producers out there writing incorrect results into the file. So for now, the only solution is to have a second version of the document that saves only the results (or to implement it inside Calc).
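If the file is .xlsx, the cached results can also be read without Calc at all, since each formula cell stores its last computed value in a `<v>` element next to its `<f>` formula. A minimal sketch using only Python's standard library, assuming the sheet of interest is the part `xl/worksheets/sheet1.xml` (the real mapping from sheet names to parts lives in `xl/workbook.xml` and its relationships file), that text values are shared strings, and ignoring inline strings; `cached_values` is a hypothetical helper. This does not help with binary .xls files.

```python
import zipfile
import xml.etree.ElementTree as ET

_M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def cached_values(xlsx_path, sheet_part="xl/worksheets/sheet1.xml"):
    """Map cell references to the values last saved in the file; nothing is evaluated."""
    with zipfile.ZipFile(xlsx_path) as z:
        shared = []
        if "xl/sharedStrings.xml" in z.namelist():
            sst = ET.fromstring(z.read("xl/sharedStrings.xml"))
            for si in sst.findall(f"{_M}si"):
                # Plain <t> text or rich-text runs <r><t>; phonetic <rPh> is skipped.
                texts = si.findall(f"{_M}t") + si.findall(f"{_M}r/{_M}t")
                shared.append("".join(t.text or "" for t in texts))
        sheet = ET.fromstring(z.read(sheet_part))
    values = {}
    for cell in sheet.iter(f"{_M}c"):
        v = cell.find(f"{_M}v")
        if v is None:  # empty cell or inline string: skipped in this sketch
            continue
        if cell.get("t") == "s":  # value is an index into the shared-string table
            values[cell.get("r")] = shared[int(v.text)]
        else:  # number, boolean, error or formula string result, as stored
            values[cell.get("r")] = v.text
    return values
```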