Combining two PDF files in C++ - c++

In C++ I'm generating a PDF report with libHaru. I'm looking for some way to append two pages from an existing PDF file to the end of my report. Is there any free way to do that?
Thanks.

Try PoDoFo
http://podofo.sourceforge.net/
You should be able to open both of the PDFs as PdfMemDocument objects using PdfMemDocument::Load( filename ).
Then either acquire references to the two pages you want to copy and add them to the end of your document using InsertPages, or remove all but the last two pages of the source document and call PdfDocument::Append on your report, passing the culled document. Hard to say which would be faster or more stable.
Hope that helps,
Troy

You can use the Ghostscript utility pdf2ps to convert the PDF files to PostScript, append the PostScript files, and then convert them back to a PDF using ps2pdf.
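For what it's worth, the pdf2ps / ps2pdf route above can be driven from C++ by shelling out. This is only a minimal sketch, assuming a POSIX shell with the Ghostscript scripts on the PATH. The function names and the temporary file names (part1.ps, part2.ps, merged.ps) are made up for illustration. It appends whole files rather than picking out two pages, and whether plain concatenation of the PostScript converts cleanly depends on the files:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Wrap a path in single quotes for a POSIX shell, escaping embedded quotes.
std::string shell_quote(const std::string& path) {
    std::string out = "'";
    for (char c : path) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Build the command sequence: convert each PDF to PostScript,
// concatenate the PostScript files, then convert the result back to PDF.
// The intermediate file names are hypothetical placeholders.
std::vector<std::string> build_merge_commands(const std::string& first_pdf,
                                              const std::string& second_pdf,
                                              const std::string& output_pdf) {
    const std::string ps1 = shell_quote("part1.ps");
    const std::string ps2 = shell_quote("part2.ps");
    const std::string merged = shell_quote("merged.ps");
    return {
        "pdf2ps " + shell_quote(first_pdf) + " " + ps1,
        "pdf2ps " + shell_quote(second_pdf) + " " + ps2,
        "cat " + ps1 + " " + ps2 + " > " + merged,
        "ps2pdf " + merged + " " + shell_quote(output_pdf),
    };
}

// Run the commands in order and stop at the first one that fails.
bool run_commands(const std::vector<std::string>& commands) {
    for (const std::string& cmd : commands) {
        if (std::system(cmd.c_str()) != 0) return false;
    }
    return true;
}
```

Calling run_commands(build_merge_commands("report.pdf", "extra.pdf", "combined.pdf")) would then run the four steps one after another.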

Related

XLS/XLSX to CSV in c++

I've been given a project in which I need to import data from CSV, XLS and XLSX files, do some processing, then write the results to a database.
I'm working on a project that's been going on for a while and there are several import functions already that use a very nice object to handle opening files with all sorts of separators and such. And this object is key to the processing that I need to perform.
Since a CSV is basically a textfile with a different extension this object opens it perfectly and I've managed to complete most of the processing and testing with the object and values stored within.
But now I need to add the XLS and XLSX support. And since this object is now pretty much central to the processing I figured the easiest way to fit XLS and XLSX files in would be to convert them to CSV, then import that.
Any help would be appreciated and I'll try to answer questions if necessary, but since the request is just for some way to convert from one file type to another and nothing more insightful, I don't think it's really necessary to add any snippets just yet.
Your options in terms of C++ libraries:
OpenXLSX - https://github.com/troldal/OpenXLSX
XLNT - https://github.com/tfussell/xlnt
Or you could give "XLSX I/O" a try. It's a small C library.
"XLSX I/O" - https://github.com/brechtsanders/xlsxio
Don't forget to add the usual extern "C" when calling C functions from C++.
The repo contains basic xlsx-to-csv (and csv-to-xlsx) examples, which should get you started: https://github.com/brechtsanders/xlsxio/blob/master/src/xlsxio_xlsx2csv.c
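On the extern "C" point, here is a minimal sketch of what that looks like. The function c_style_add is purely hypothetical and is defined in the same file only so the example links on its own. With a real C library you would wrap the #include of its header in the extern "C" block instead (many C headers already do this themselves behind an #ifdef __cplusplus guard, in which case nothing extra is needed):

```cpp
// Declarations inside an extern "C" block get C linkage, so the C++
// compiler does not mangle their names and the linker can find the
// symbols exported by a C library.
extern "C" {
    // Hypothetical C function. With a real library this would be:
    //   #include "some_c_library.h"
    int c_style_add(int a, int b);
}

// Defined here only to keep the sketch self-contained; normally the
// definition lives in a .c file or in the library itself.
extern "C" int c_style_add(int a, int b) {
    return a + b;
}
```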
Maybe this will help:
http://www.codeproject.com/Articles/42504/ExcelFormat-Library
You can also use libraries from the OpenOffice/LibreOffice project.

Convert xlf to html using okapi

I have implemented a local service that allows converting multiple formats like html, docx, xlsx, tmx... to XLIFF. After performing a specific process with xlf generated file I want to get it back to its original format. I use okapi libraries for this purpose and all works properly.
I would like to know if okapi implements a mechanism to convert xlf back to its original file format, especially xlf to html (this format is mandatory for me).
Is there any suitable approach?
Thanks in advance
Yes, this is generally possible. Okapi calls it merging, and it requires that the source HTML (or other format) file is available in addition to the translated XLIFF.
A common method for doing this is to use a pair of rainbow pipelines. The first ("extraction") pipeline looks like this:
Raw Document to Filter Events
[Other steps, such as segmentation, are optional here]
Rainbow Translation Kit Creation (select "Generic XLIFF" as the type)
This will generate a "translation kit" containing the source file, an extracted XLIFF, and some metadata in a file called manifest.rkm. You can then modify the XLIFF to perform the translation, etc. Then, use another pipeline to perform the merge:
Raw Document to Filter Events
Rainbow Translation Kit Merging
Sort of confusingly, the source file for this merge pipeline should be the manifest.rkm file for the translation kit, not the XLIFF or the source file. Okapi will parse the manifest and figure out where everything else is, then merge the translations from the XLIFF back into a new output copy of the HTML.
This process can fail if you do sufficiently gruesome things to the XLIFF that Okapi can't figure out how to map the translated segments back to the original document any more.
A quick-and-dirty way to do this same thing, without the kit, is to use the tikal command-line tool that is bundled with Okapi. First, use this to extract test.html to test.html.xlf:
tikal.sh -fc okf_html -x test.html
Then, merge the translated test.html.xlf to an output test.out.html:
tikal.sh -fc okf_html -m test.html.xlf
I do not understand your question: can you convert files back or not? I assume not, and that's what this answer is about.
The Okapi doc at http://www.opentag.com/okapi/wiki/index.php?title=Rainbow says:
There are filters for many formats, for example: OpenOffice, XML, HTML, Properties, DTD, MS Office, tables, etc.
To convert XLIFF files back to their original format you have to add the Filter Events to Raw Document Step to your command pipeline. There are two filter configurations available for HTML, and one for HTML 5.

Converting HTML file to PDF using Win32/MFC

As part of my application, my client has requested that I include an automated e-mailing system. As part of this system, I generate HTML code and use automation to send it via Outlook.
However, they also require a PDF copy of the HTML document to be sent as an attachment. My initial attempts involved using libHaru, which proved difficult to use efficiently, as I was required to create the PDF document from scratch, which required computation of the position of each of the lines in a table, and positioning of all the text, etc.
I was wondering if there would be a way to programmatically convert HTML code (or an HTML file if need be) into a PDF document either by using Win32/MFC itself or an external library.
Thanks in advance!
EDIT: Just to clarify, I am looking for solutions which minimize external dependencies.
You should evaluate this utility wkhtmltopdf:
http://code.google.com/p/wkhtmltopdf/
You can call it from the command line without the need to run a setup.
I use it by generating my output documents as HTML, then calling ShellExecute(...) to convert them to PDF. It's great!
Internally it uses WebKit + Qt, so compatibility with modern HTML is OK.
Hope it helps.
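A rough sketch of the ShellExecute approach, in case it is useful. wkhtmltopdf takes the input file followed by the output file on its command line. The helper names below are made up for illustration, and the Windows-only part is guarded so the parameter builder can be compiled anywhere. Note that ShellExecute returns as soon as the process has been launched, so the PDF may not exist yet when the call returns. ShellExecuteEx with SEE_MASK_NOCLOSEPROCESS gives you a process handle to wait on if you need that.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>
#endif

// Build the parameter string for wkhtmltopdf: "input.html" "output.pdf".
// Both paths are wrapped in double quotes so that spaces survive.
std::string wkhtmltopdf_params(const std::string& html_path,
                               const std::string& pdf_path) {
    return "\"" + html_path + "\" \"" + pdf_path + "\"";
}

#ifdef _WIN32
// Launch wkhtmltopdf without showing a console window.
// exe_path is wherever you ship wkhtmltopdf.exe (an assumption here).
// Returns true if the process was started; it does not wait for it to finish.
bool launch_html_to_pdf(const std::string& exe_path,
                        const std::string& html_path,
                        const std::string& pdf_path) {
    const std::string params = wkhtmltopdf_params(html_path, pdf_path);
    HINSTANCE rc = ShellExecuteA(nullptr, "open", exe_path.c_str(),
                                 params.c_str(), nullptr, SW_HIDE);
    // ShellExecute reports success with a value greater than 32.
    return reinterpret_cast<INT_PTR>(rc) > 32;
}
#endif
```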
I'd take a look at PDF Creator, which can be used as a COM object (that acts pretty much like a printer). I haven't used it to print HTML, so I'm not sure, but my guess is that you'll probably end up having to instantiate a web browser control to render the HTML, and then feed it from there to the PDF control.
Some possible answers are in this thread:
C++ Library to Convert HTML to PDF?
Not sure if they will satisfy your particular requirements, but these might at least get you started.
Edit:
Some other possible options here.
Not MFC but you can try QtWebKit. It can render and export HTML to PDF, PNG, JPEG

Programmatically making Excel spreadsheets (97 - 2003 format)

I was wondering how difficult it would be to make an application like this. Basically, I have some old HTML files that use tables. I want to put these tables into Excel for easier reading and manipulation. I only have text; I have no numbers or formulas or anything.
Are there any tutorials on how to do this sort of thing?
The application would produce .xls
Thanks
You have three options:
1. Output a CSV file. While not an XLS file, Excel is more than capable of opening such a file, and it's extremely easy to create. You need nothing more than standard C++ to implement this solution. This is by far the easiest and quickest way to output to Excel (or any spreadsheet program, for that matter).
2. Use OLE automation. Microsoft even has a Knowledge Base article that provides an example of how to invoke Excel from your native C++ application and fill in some values. If you absolutely need to output XLS files, this is the easiest way to go. Note that users must have Excel installed on their computers for this to work.
3. Create your own XLS writer. Don't even bother with this option unless you really want to generate XLS files without requiring Excel to be installed on end-user computers. Options 1 and 2 are more than good enough for just about any application.
You don't need to reverse-engineer the XLS format; Microsoft documents the Excel file format here. Due to the evolution of Excel over the years, it's not exactly a clean specification.
If you don't mind installing a copy of Excel along with your program, using OLE Automation would be much easier.
The simplest thing to do is simply create a CSV file. If you have column headers, put them in the first row. CSV files can be opened natively in Excel as if they were Excel spreadsheets.
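To make the CSV suggestion concrete, here is a minimal sketch in standard C++. The function names are made up for illustration. The one thing that is easy to get wrong is quoting: fields containing commas, double quotes or line breaks need to be wrapped in double quotes, with embedded quotes doubled, otherwise Excel will split the cell in the wrong place.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Quote a field only when it contains a comma, a double quote or a line
// break; embedded double quotes are doubled.
std::string csv_escape(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) return field;
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += "\"\"";
        else out += c;
    }
    out += '"';
    return out;
}

// Write one row: escaped fields separated by commas, ended by a newline.
void write_csv_row(std::ostream& os, const std::vector<std::string>& row) {
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (i != 0) os << ',';
        os << csv_escape(row[i]);
    }
    os << '\n';
}

// Convenience helper: render a whole table (header row first) as CSV text.
std::string to_csv(const std::vector<std::vector<std::string>>& rows) {
    std::ostringstream os;
    for (const auto& row : rows) write_csv_row(os, row);
    return os.str();
}
```

For the HTML tables in the question, each table row would become one vector of cell texts, and write_csv_row can write straight to a std::ofstream opened on the output .csv file.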
There is a trick here: save .html tables with the .xls extension and Excel can read them (i.e. Excel can read the output of the DataGrid control).
But if you want to create 'real' Excel files, you can either use Excel Interop (which can be messy: it requires Excel and the PIAs to be installed on the machine, and needs careful memory management since it's COM) or opt for a 3rd-party library like FlexCel, which avoids many of the Interop problems but will not give you 'complete' Excel functionality (add-ins, custom VBA macros, etc.). For most uses, however, a 3rd-party library should do the trick.
Looks like there's another alternative called ExcelFormat. I didn't try it, though.

How can I search PDF?

I'm doing a small project in C++ on Linux. I need to search 10 or more PDF files and find the required data. How can I do so?
I will make my question clearer with the following example:
Suppose I have ten textbooks, all about C++, and I need info about the topic "array". How can I search the PDFs and find the data?
Have a look at pdftotext.
If you actually want to write code to do this, then you'll probably have to learn how to navigate the internals of a PDF file. There have been some answers on how to do that, for example one pointing to this article, which on the 2nd page has C code for a basic PDF parser.
xtractpro