How to get XSL from an existing PDF? - xslt

Is it possible to get the .xsl file from an existing .pdf file?
I know that with Apache FOP you can get a .pdf file from a .xml and a .xsl file, but I would like to go in the other direction. Any ideas?
In other words: XML + XSL -> PDF works with Apache FOP, but is PDF -> XSL somehow possible?
The reason I would like to do this is that I want to open a PDF that contains a form, edit it by adding some information to the form, and then save it again as a PDF.
I already have the edited form as .xml and I'm trying to generate the PDF, but I need a .xsl file for the layout. So I thought that maybe I could reuse the layout from the original PDF, since the two will be the same. Is there a better approach? I would like to avoid creating a specific XSL file for every form.
Thanks

Definitely not the XSLT file, since the XSLT step is not really part of what FOP does. FOP only works with FO documents; the fact that it lets you pass XML + XSLT to produce the FO source is just a nice usability feature. Once it has the FO file, it doesn't know how that file was obtained, so it has no way to embed the XSLT file in the PDF.
You could post-process the PDF file using another tool, like PDFBox, to embed any metadata you want.
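To make that separation concrete, here is a minimal sketch of the first half of the pipeline (XML + XSLT -> FO) using only the JDK's built-in XSLT processor in javax.xml.transform. The sample form XML, the stylesheet, and the class name XmlToFo are made up for illustration and do not come from the question. The result is plain XSL-FO, which is all a formatter such as FOP gets to work with at that stage.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToFo {
    // Hypothetical form data, standing in for the edited form XML.
    static final String FORM_XML = "<form><name>Alice</name></form>";

    // Hypothetical stylesheet that lays the form out as a one-block FO page.
    static final String FORM_XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/form'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='page'>"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='page'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<fo:block><xsl:value-of select='name'/></fo:block>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the resulting FO document.
    public static String toFo(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter fo = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(fo));
            return fo.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the FO document that would then be handed to the formatter.
        System.out.println(toFo(FORM_XML, FORM_XSLT));
    }
}
```

The printed FO contains the layout instructions and the data, but no trace of the xsl:template rules that generated it, which is why the stylesheet cannot be recovered from the finished PDF.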

Related

Regular Expression if a file is a text file

I have an uploader in JSP, and through it I can upload several kinds of files. I need to perform a check (I was thinking of a regular expression, just a simple test of the file extension) so that the JSP engine can read the content of the file and determine whether it is an image or a plain text file. I can accept only plain text (text or XML) and must discard all other kinds of files. Could someone help me or suggest another way to do that?
As a very basic check you could verify the file extension with something like \.txt$. However, this doesn't prevent people from uploading other file types with a .txt extension. You might be better off checking the MIME type of the uploaded file; a JSP example can be found here.
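As a rough illustration of combining both checks, the sketch below tests the extension with a regular expression and then inspects the uploaded bytes. The content test is a simple heuristic of my own (reject anything containing control characters other than tab, LF and CR), not something from the answer above, and the class and method names are hypothetical. It will misclassify some inputs, for example UTF-16 text, which contains NUL bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class TextUploadCheck {
    // Case-insensitive match on a trailing .txt or .xml extension.
    private static final Pattern TEXT_EXTENSION =
            Pattern.compile("\\.(txt|xml)$", Pattern.CASE_INSENSITIVE);

    // Extension check only: cheap, but trivially fooled by renaming a file.
    public static boolean hasTextExtension(String fileName) {
        return fileName != null && TEXT_EXTENSION.matcher(fileName).find();
    }

    // Heuristic content check (an assumption, not a full MIME detection):
    // plain text should not contain control bytes other than tab, LF, CR.
    public static boolean looksLikeText(byte[] content) {
        for (byte b : content) {
            int c = b & 0xFF;
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    // Accept the upload only if both checks pass.
    public static boolean accept(String fileName, byte[] content) {
        return hasTextExtension(fileName) && looksLikeText(content);
    }

    public static void main(String[] args) {
        byte[] xml = "<a>ok</a>\n".getBytes(StandardCharsets.UTF_8);
        // First eight bytes of a PNG file, uploaded under a misleading name.
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println("data.xml accepted: " + accept("data.xml", xml));
        System.out.println("photo.txt accepted: " + accept("photo.txt", png));
    }
}
```

In a servlet or JSP you would pass in the submitted file name and the first few kilobytes of the upload; reading the whole file is usually unnecessary for this kind of test.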

Converting HTML file to PDF using Win32/MFC

As part of my application, my client has requested that I include an automated e-mailing system. As part of this system, I generate HTML code and use automation to send it via Outlook.
However, they also require a PDF copy of the HTML document to be sent as an attachment. My initial attempts involved using libHaru, which proved difficult to use efficiently, as I was required to create the PDF document from scratch, which required computation of the position of each of the lines in a table, and positioning of all the text, etc.
I was wondering if there would be a way to programmatically convert HTML code (or an HTML file if need be) into a PDF document either by using Win32/MFC itself or an external library.
Thanks in advance!
EDIT: Just to clarify, I am looking for solutions which minimize external dependencies.
You should evaluate this utility wkhtmltopdf:
http://code.google.com/p/wkhtmltopdf/
You can call it from the command line without the need to run a setup.
I use it by generating my output documents as HTML and then calling ShellExecute(...) to convert them to PDF. It's great!
Internally it uses WebKit + Qt, so compatibility with modern HTML is OK.
Hope it helps.
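The same command line can be launched from any language. The sketch below is in Java rather than Win32/MFC and only shows the shape of the call: the executable followed by the input HTML path and the output PDF path. The class name, the helper methods, and the assumption that wkhtmltopdf is on the PATH are all illustrative.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class HtmlToPdf {
    // Builds the argument list: <executable> <input.html> <output.pdf>.
    public static List<String> command(String exe, String htmlPath, String pdfPath) {
        return Arrays.asList(exe, htmlPath, pdfPath);
    }

    // Runs the converter and returns its exit code (0 normally means success).
    public static int convert(String exe, String htmlPath, String pdfPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(exe, htmlPath, pdfPath))
                .inheritIO() // show the tool's own progress and error output
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: HtmlToPdf <input.html> <output.pdf>");
            return;
        }
        // Assumes the wkhtmltopdf executable can be found on the PATH.
        int exitCode = convert("wkhtmltopdf", args[0], args[1]);
        System.out.println("wkhtmltopdf exited with code " + exitCode);
    }
}
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, which is worth keeping in mind when building the equivalent ShellExecute call.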
I'd take a look at PDF Creator, which can be used as a COM object (that acts pretty much like a printer). I haven't used it to print HTML, so I'm not sure, but my guess is that you'll probably end up having to instantiate a web browser control to render the HTML, and then feed it from there to the PDF control.
Some possible answers are in this thread:
C++ Library to Convert HTML to PDF?
Not sure if they will satisfy your particular requirements, but these might at least get you started.
Edit:
Some other possible options here.
It's not MFC, but you can try QtWebKit. It can render HTML and export it to PDF, PNG, or JPEG.

ColdFusion - converting HTML webpage to Word or PDF document

I have a webpage where the user can either print the page or save it to their computer.
How may I save it as a Word or PDF document?
Thanks.
For the MS Word requirement, most versions of Office can interpret basic html/xml. So you might consider the old cfcontent hack as a simpler alternative to POI. (The Word package is not quite as mature as the spreadsheet package.)
Basically you generate html, but use cfheader/cfcontent to tell the browser the content is really a Word document. It is obviously not a true MS Word file. But it is simpler than most options.
http://msdn.microsoft.com/en-us/library/aa155477.aspx
<cfheader name="Content-Disposition" value="attachment; filename=someFile.doc">
<cfcontent type="application/msword">
... your html code here ...
For Microsoft Office documents you can use the Apache POI project. This means that in your ColdFusion code you need to use some basic Java code to call the POI methods.
However, if you choose a PDF document, things are much easier. You can use the cfdocument tag with the PDF format option.
Using the POI or OpenOffice interface (depending on your version) you can create a Word doc. Using the built-in PDF generation tools, you can create a PDF doc. However, you can only present that as an option.
There is no way to override the browser's save/print menu functions. No matter how you handle it, I can save the source document instead of the .doc or .pdf. Similarly, you cannot prevent me from printing the original document instead of a prepared PDF.
Here is a method that has worked for me:
Create PDF or FlashPaper with ColdFusion
However, just like printing, you will have to sacrifice some graphics, so this would be best used for exporting content (but as you did not specify, I'm just clarifying that this is possible but at a cost).
Hope that helps.
Use cfdocument to display as a PDF, then they can just click the disk image to save it to their computer. Or you can use the filename= attribute of cfdocument to assign a filename to it, and it will prompt them to save it instead of displaying in the browser.

How to Generate XSL code automatically?

I have a UI that lets users create their own format using a drag-and-drop utility. I also have an XML file that contains the data. The task now is to automatically generate the .xsl file for the dynamically designed format, to be applied to the data stored in XML form.
Does anyone have an idea how to solve this problem?
I once (in 2006) used Altova MapForce to create XSLT transformations.

Combining two PDF files in C++

In C++ I'm generating a PDF report with libHaru. I'm looking for some way to append two pages from an existing PDF file to the end of my report. Is there any free way to do that?
Thanks.
Try PoDoFo
http://podofo.sourceforge.net/
You should be able to open both of the PDFs as PDFMemDocuments using PDFMemDocument.Load( filename ).
Then acquire references to the two pages you want to copy and add them to the end of the document using InsertPages. Alternatively, remove all but the last two pages of the source document, then call PDFDocument.Append and pass it the trimmed document. Hard to say which would be faster or more stable.
Hope that helps,
Troy
You can use the Ghostscript utility pdf2ps to convert the PDF files to PostScript, append the PostScript files, and then convert them back to a PDF using ps2pdf.