Use PDF Core Fonts in wkhtmltopdf/QPrinter - c++

Questions in a nutshell:
Is it possible to force wkhtmltopdf to use Type1 fonts (PDF core fonts) when generating the pdf?
or: Is it possible to force the Qt QPdfEngine to use Type1 fonts (PDF core fonts) when generating the pdf?
Detailed description of the problem.
I have developed a webapp to do a pdf-export of a very large database using wkhtmltopdf & tcpdf. The (900+) pages are "printed" using wkhtmltopdf and then "glued together" using tcpdf & fpdi.
Unfortunately, wkhtmltopdf seems to always embed the fonts it uses to render the PDF. This is unwanted because the embedded fonts are redundant and bloat the document. Additionally, the document needs to be editable in Acrobat Pro, which seems to be overwhelmed by the embedded fonts: it tries to replace them when saving the document and hangs after ~45 minutes on Windows machines (on Mac machines it just takes incredibly long to save the document).
So: is there a possibility to tell wkhtmltopdf not to embed fonts and use Type1 Fonts (Helvetica) instead?
I couldn't find any switch to do this, so I assumed that patching wkhtmltopdf to call printer->setUseEmbeddedFonts(false) would do the trick.
Unfortunately this didn't change anything.
My next idea was patching the Qt-PDF-Printer. Looking into the QPrinter-Class and the QPrintEngine I didn't find any place where the pdf engine uses Type1 fonts instead of embedding the TTF-Fonts (or whatever font is used).
Any ideas and/or pointers?
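For reference, here is roughly what "using a core font" means at the file level: the font dictionary names a standard Type1 font and carries no /FontFile entry, so no font program is stored in the PDF. This is a hand-written minimal sketch, not output from wkhtmltopdf or Qt; buildCoreFontPdf is a hypothetical helper, and it assumes the text is plain ASCII without parentheses or backslashes.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Builds a one-page PDF whose only font is the standard Type1 font
// Helvetica, referenced by name and not embedded.
// Assumption: `text` contains no '(' , ')' or '\\' characters.
std::string buildCoreFontPdf(const std::string& text) {
    const std::string content =
        "BT /F1 24 Tf 72 720 Td (" + text + ") Tj ET";

    const std::vector<std::string> objects = {
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        // Font dictionary: a name only, no /FontFile, so nothing is embedded.
        "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        "<< /Length " + std::to_string(content.size()) + " >>\nstream\n" +
            content + "\nendstream",
    };

    std::string pdf = "%PDF-1.4\n";
    std::vector<std::size_t> offsets;  // byte offset of each object
    for (std::size_t i = 0; i < objects.size(); ++i) {
        offsets.push_back(pdf.size());
        pdf += std::to_string(i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n";
    }

    // Cross-reference table: every entry is exactly 20 bytes long.
    const std::size_t xrefPos = pdf.size();
    pdf += "xref\n0 " + std::to_string(objects.size() + 1) + "\n";
    pdf += "0000000000 65535 f \n";
    for (std::size_t off : offsets) {
        char entry[32];
        std::snprintf(entry, sizeof entry, "%010zu 00000 n \n", off);
        pdf += entry;
    }
    pdf += "trailer\n<< /Size " + std::to_string(objects.size() + 1) +
           " /Root 1 0 R >>\nstartxref\n" + std::to_string(xrefPos) +
           "\n%%EOF\n";
    return pdf;
}
```

Writing the returned string to a file in binary mode should give a small PDF that a viewer renders with its own Helvetica (or a substitute). A file produced with embedded fonts would instead have a font descriptor pointing at a /FontFile stream.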

Related

Must I use path to fonts?

The FT_New_Face function seems to be the one I'm looking for, but it requires a path to the font file. I would like to open a font like "Times New Roman," without supplying a path. How can I do that?
Most Unix-based systems use Fontconfig for this: it returns the best-matching font file for a set of search parameters (family name, variations, weight, etc.).
Fontconfig is a library for configuring and customizing font access.
Fontconfig can:
discover new fonts when installed automatically, removing a common source of configuration problems.
perform font name substitution, so that appropriate alternative fonts can be selected if fonts are missing.
identify the set of fonts required to completely cover a set of languages.
have GUI configuration tools built as it uses an XML-based configuration file (though with autodiscovery, we believe this need is minimized).
efficiently and quickly find the fonts you need among the set of fonts you have installed, even if you have installed thousands of fonts, while minimizing memory usage.
be used in concert with the X Render Extension and FreeType to implement high quality, anti-aliased and subpixel rendered text on a display.
Fontconfig does not:
render the fonts themselves (this is left to FreeType or other rendering mechanisms)
depend on the X Window System in any fashion, so that printer only applications do not have such dependencies
Fontconfig is relatively portable and is used on a variety of systems. However, OS X has CoreText, which offers similar functionality, and Windows has DirectWrite.
Refer to this question for help on how to use Fontconfig.

Converting HTML file to PDF using Win32/MFC

As part of my application, my client has requested that I include an automated e-mailing system. As part of this system, I generate HTML code and use automation to send it via Outlook.
However, they also require a PDF copy of the HTML document to be sent as an attachment. My initial attempts involved using libHaru, which proved difficult to use efficiently, as I was required to create the PDF document from scratch, which required computation of the position of each of the lines in a table, and positioning of all the text, etc.
I was wondering if there would be a way to programmatically convert HTML code (or an HTML file if need be) into a PDF document either by using Win32/MFC itself or an external library.
Thanks in advance!
EDIT: Just to clarify, I am looking for solutions which minimize external dependencies.
You should evaluate this utility wkhtmltopdf:
http://code.google.com/p/wkhtmltopdf/
You can call it from the command line without the need to run a setup.
I use it by generating my output documents as HTML and then calling ShellExecute(...) to convert them to PDF. It's great!
Internally it uses WebKit + Qt, so compatibility with modern HTML is OK.
Hope it helps.
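The call described above can be sketched roughly as follows. This is only an illustration under some assumptions: it uses the portable std::system instead of ShellExecute, it assumes wkhtmltopdf is on the PATH, and it assumes the paths contain no double quotes. Both function names are made up for the example.

```cpp
#include <cstdlib>
#include <string>

// Builds the command line "wkhtmltopdf <input> <output>".
// Paths are wrapped in double quotes so that spaces survive; paths that
// themselves contain double quotes are not handled here.
std::string buildWkhtmltopdfCommand(const std::string& htmlPath,
                                    const std::string& pdfPath) {
    return "wkhtmltopdf \"" + htmlPath + "\" \"" + pdfPath + "\"";
}

// Runs the converter and reports whether it exited with status 0.
// std::system stands in for ShellExecute so the sketch is not Win32-only.
bool convertHtmlToPdf(const std::string& htmlPath,
                      const std::string& pdfPath) {
    const std::string command = buildWkhtmltopdfCommand(htmlPath, pdfPath);
    return std::system(command.c_str()) == 0;
}
```

In a Win32/MFC application you would probably prefer CreateProcess or ShellExecuteEx, so that you can wait for the process to finish before attaching the PDF to the e-mail.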
I'd take a look at PDF Creator, which can be used as a COM object (that acts pretty much like a printer). I haven't used it to print HTML, so I'm not sure, but my guess is that you'll probably end up having to instantiate a web browser control to render the HTML, and then feed it from there to the PDF control.
Some possible answers are in this thread:
C++ Library to Convert HTML to PDF?
Not sure if they will satisfy your particular requirements, but these might at least get you started.
Edit:
Some other possible options here.
Not MFC but you can try QtWebKit. It can render and export HTML to PDF, PNG, JPEG

How to open and display a PDF file using Qt/C++?

I am trying to open and read a PDF file using Qt, but there is no specific way to do that.
I know the subject is a bit old, but...
I found a really simple way to render PDFs in Qt via QtWebKit using pdf.js (http://mozilla.github.com/pdf.js/).
Here is my realization of the idea for Qt5 and the WebEngine: https://github.com/Archie3d/qpdf
Qt itself does not include PDF reading/rendering functionality as far as I know. You might want to have a look at libpoppler which has Qt bindings.
I found this very interesting article on qt-project.org - "Handling PDF - Qt Project".
This page discusses various available options for working with PDF documents in a Qt application. The page does not exactly show how to "open and display an existing PDF document" but it can help you deduce something useful out of all that is explained there.
Here, the page says:
For rendering pages or elements from existing PDF documents to image
files or in-memory pixmaps (useful e.g. for thumbnail generation or
implementing custom viewers), third-party libraries can be used (for
example: poppler-qt4 (freedesktop.org) and muPDF (mupdf.com)).
Alternatively, the task can be delegated to existing command-line
tools (like poppler-utils (freedesktop.org) and muPDF (mupdf.com)).
You can use PdfViewer which is a lightweight PDF viewer that only uses Qt. It contains a PdfView widget which can be easily embedded in your application.
Simple answer : it is not supported in the Qt API.
Other answer : you can code it, I suggest you have a look at this Qt application which uses Ghostscript
The best way I have found to open a pdf is using QProcess in Qt.
You may want to use Okular for PDF processing.
I know this is an old post, but I stumbled on it during my initial search so I figured I would post some documentation from the solutions I used.
As of Qt 5.10
Check out the QPdfDocument Class. This class can open a PDF and you can use the render function to render a page to an image. I use the QQuickPaintedItem to then "draw" this image but I am sure there are more ways to handle the QImage output.
Prior to Qt 5.10
I used libpoppler to do a VERY similar process.
#include <poppler/qt5/poppler-qt5.h>
Use the Poppler::Document Class to load and handle the entire PDF document and look at the Poppler::Page::renderToImage function to output the page as a QImage.
Qt does not support reading PDF files out of the box and among many approaches you can use Adobe's PDF Reader ActiveX object along with a QAxObject.
You may want to check out this link which describes how to read PDF files in Qt/C++ using ActiveX and has a downloadable example project.

Is there such a thing like a Printer-Markup-Language

I would like to print a document. The content of the document is tables and text in different colors. Does a lightweight printer file format exist that can be used like a template?
PS, PDF and DOC files are, in my opinion, too heavy to parse. Is there some XML or YAML file format which supports:
Easy creation (maybe with a WYSIWYG-Editor)
Parsing and manipulation with Library-Support
Easy sending to the printer (maybe with Library-Support)
Or do I have to do it the usual way and paint within a CDC?
I noticed you’re using MFC (so, Windows). In that case the answer is a qualified yes. In recent versions of Windows, Microsoft offers the XPS Document API which lets you create and manipulate a PDF-like document using XML, which can then be printed using the XPS Print API.
(For earlier versions of Windows that don’t support this API, you could try to deal with the XPS file format directly, but that is probably a lot harder than using CDC. Even with the API you will be working at a fairly low level.)
End users can generate XPS documents using the XPS print driver that is available for free from Microsoft (and bundled with certain MS products—they probably already have it on their system).
There is no universal language that is supported across all (or even many) printers. While PCL and PS are the most used, there are also printers which only work with specific printer drivers because they only support a proprietary data format (often pre-rendered on the client).
However, you could use XSL-FO to create documents which can then be rendered to a printer driver using library support.
I think something like TeX or LaTeX (or even troff or groff) may meet your needs. Google them and see.
There are also libraries to render documents for print from HTML source. Look at http://libharu.sourceforge.net/ for example. This outputs a printer-ready .PDF
I think that PostScript is a really good choice for that.
It is actually a very simple language, and it should be very easy to parse because it is stack-oriented. Also, most printers support it, and even if yours does not, you can use Ghostscript to convert it to many different formats (consider GS a "virtual PS-supporting printer").
Finally there are a lot of books and tutorials for the language.
About the parsing -- you can actually define new variables and functions in PS. So, maybe, your problem can be solved (almost) entirely using PS.
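To give an idea of how little is needed, here is a rough sketch that generates a small PostScript page from C++. It defines one PS procedure, /L, to show the "define your own functions" point: the operands are pushed first and the operator consumes them from the stack. The helper names, the 72/720 start position and the 16-point line spacing are arbitrary choices made for this example.

```cpp
#include <string>
#include <vector>

// Escapes the characters that are special inside a PostScript string
// literal: parentheses and the backslash.
std::string escapePs(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '(' || c == ')' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Emits a one-page PostScript program that prints each line of text in
// 12 pt Helvetica, top to bottom.
std::string buildPostScript(const std::vector<std::string>& lines) {
    std::string ps = "%!PS\n";
    ps += "/Helvetica findfont 12 scalefont setfont\n";
    // User-defined procedure: expects (string) x y on the stack.
    // moveto pops x and y, then show pops the string.
    ps += "/L { moveto show } def\n";
    int y = 720;  // start one inch below the top of a US Letter page
    for (const std::string& line : lines) {
        ps += "(" + escapePs(line) + ") 72 " + std::to_string(y) + " L\n";
        y -= 16;  // line spacing in points
    }
    ps += "showpage\n";
    return ps;
}
```

The resulting text can be sent to a PostScript printer as-is or fed to Ghostscript for conversion.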
HTML + CSS can be printed -- properly. CSS was designed to support this with the media attribute to specify that your CSS is for printer layout, not for screen layout. Tools like PRINCE (free + commercial versions) exist to render this for printing.
I think PostScript is the markup language used by printers. I read this somewhere, so correct me if PostScript is now outdated.
http://en.wikipedia.org/wiki/PostScript
For a more powerful suite, you can use LaTeX. It lets you create templates into which you can just copy the text.
On a more GUI friendly note, MS-Word and other word processors have templates. The issue is they are not of a common standard or markup.
You can also use HTML to render stuff in a common markup but it will not be very printer friendly.

Dealing with obsolete versions of RTF

Summary questions:
Do you know of a lightweight application that can save files in RTF Version 1.6 format?
Do you know what version of RTF Abiword's "Rich Text Format for old apps" corresponds to?
Do you know a way to inspect an RTF file and determine what version of RTF it's encoded under?
Do you know which DLL describes the RTF format on a Windows NT 4.0 machine and whether it can be upgraded?
I have a legacy MS Visual C++ 6.0 MFC application that runs on an embedded Windows NT 4.0 machine. The application provides in-app help using MFC's CRichEditView class to pull text out of an RTF file called help.rtf. The help file is saved as RTF version 1.6. It has always been edited using MS Word 2000 or the version of WordPad that comes with Windows NT 4.0.
The problem is that our developer workstations tend to have Windows XP (and its version of WordPad) and Office 2003 or better, both of which use more recent versions of RTF than 1.6, and it is becoming increasingly cumbersome to find a machine on which the file can be edited and re-saved in that obsolete format. If a newer version of Word or WordPad is used to save the file, it gets saved as a newer version of RTF. Then, when the application is run on the NT machine, the help text doesn't display properly. (Although when the same application is run on an XP machine, the help text does display properly.)
So, I'm looking to do one of two things:
Find an application (preferably lighter-weight than Word 2000) that will save files in RTF version 1.6 format, that we can use for future editing of the help file.
Figure out a way to get the NT machine to read later versions of RTF properly.
On the first front, I've tried AbiWord, which has a "Rich Text Format for old apps" option, but I can't tell what version of RTF this option outputs. Do you know what version this is? Unfortunately, it's not readily apparent from the metadata in the file, which just says "rtf1", per this cute passage from all versions of the RTF spec. Is there a way to analyze an RTF file and determine what version of RTF it's encoded under?
The RTF standard described in this RTF Specification, although titled as version 1.6, continues to correspond syntactically to RTF Specification version 1. Therefore, the numeric parameter N for the \rtf control word should still be emitted as 1.
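To illustrate why the header is of no help, here is a small sketch that reads the number following the leading \rtf control word. Per the passage above, writers following the spec emit 1 there regardless of the spec revision, so this cannot distinguish 1.6 from a later one. The function name is made up for the example, and it only looks at the very start of the data.

```cpp
#include <cctype>
#include <string>

// Returns the N from a leading "{\rtfN" control word, or -1 if the data
// does not start with one. This is the major version only; the spec
// revision (1.5, 1.6, ...) is not recorded here.
int rtfHeaderVersion(const std::string& data) {
    const std::string prefix = "{\\rtf";
    if (data.compare(0, prefix.size(), prefix) != 0) return -1;

    std::size_t i = prefix.size();
    if (i >= data.size() ||
        !std::isdigit(static_cast<unsigned char>(data[i]))) {
        return -1;  // "\rtf" without a numeric parameter
    }

    int n = 0;
    while (i < data.size() &&
           std::isdigit(static_cast<unsigned char>(data[i]))) {
        n = n * 10 + (data[i] - '0');
        ++i;
    }
    return n;
}
```

Telling revisions apart would presumably require scanning the body for control words that were only introduced in later revisions of the spec, which is a heuristic at best.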
On the second front, I'm wondering if there's some DLL that I can just update so that Windows NT will recognize the newer version of the format. Do you know which DLL describes the RTF format and whether it can be upgraded?
I believe the rich edit format is determined by the rich edit control itself. I wouldn't try to upgrade the DLL, because there's a lot that could break.
See this MSDN note for hints on using the later version of the rich edit control. Version 2.0 should be available in NT 4.0.
http://msdn.microsoft.com/en-us/library/tt1cfb9f(VS.80).aspx
You might try copying the version of WordPad from your NT system and see if that works as an alternative.
Following a chain of hints that started with Mark Ransom's answer, I ended up copying riched20.dll and riched32.dll from C:\Windows\System32\ on my XP machine to C:\WinNT\System32\ on the NT machine. After I did this, RTF files edited with WordPad or Word on the XP machine rendered correctly on both WordPad and my application on the NT machine.
The first thing that comes to mind is WordPad. It's on every machine and is really lightweight in its RTF. I've found it much better than Word at many simple RTF tasks.