This cannot be so hard, but I simply can't manage it. Neither Google, Stack Overflow, nor the documentation of Ubuntu or Ghostscript was helpful.
I am generating postscript from C++. I place the text word by word to handle line-wrap. For deciding where to place the next word and whether it fits into the current line I rely on freetype to measure the "advance" of each glyph.
The text is a mix of normal text and source code, so I have two fonts involved. I chose Helvetica for normal text and Courier for source code, since both are readily available in postscript and don't need to be embedded. The problematic part of my postscript output is not significantly more complicated than
(Helvetica) findfont 11 scalefont setfont
40 100 moveto (hello world) show
123 100 moveto (hello again) show % I care for the first number
Of course, there is a proper eps header etc.
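For reference, the word-by-word placement described above can be sketched roughly like this. It is only an illustration: `Placement`, `layoutWords`, `toPostScript` and the `measure` callback are hypothetical names, and the callback stands in for whatever supplies advance widths in points (FreeType, an AFM table, or something else).

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// One placed word: where its "moveto" goes and what to "show".
struct Placement {
    double x;
    double y;
    std::string word;
};

// Greedy line-wrap. `measure` returns the advance width of a string in
// points; it is a stand-in for the real metrics source.
std::vector<Placement> layoutWords(
    const std::string& text, double left, double right, double top,
    double lineHeight,
    const std::function<double(const std::string&)>& measure) {
    std::vector<Placement> out;
    std::istringstream in(text);
    std::string word;
    const double space = measure(" ");
    double x = left;
    double y = top;
    while (in >> word) {
        const double w = measure(word);
        // Start a new line if the word does not fit, unless the line is empty.
        if (x > left && x + w > right) {
            x = left;
            y -= lineHeight;  // PostScript y grows upwards
        }
        out.push_back({x, y, word});
        x += w + space;
    }
    return out;
}

// Emit one "moveto ... show" line per word. Parentheses and backslashes in
// the words are NOT escaped in this sketch.
std::string toPostScript(const std::vector<Placement>& placed) {
    std::ostringstream ps;
    for (const Placement& p : placed) {
        ps << p.x << ' ' << p.y << " moveto (" << p.word << ") show\n";
    }
    return ps.str();
}
```

The layout is only as good as `measure`: if its widths differ from those of the font the interpreter actually uses, words drift apart or overlap exactly as described below.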
I did not manage to locate the font files on my ubuntu 16.04 system, so I downloaded best guesses from free font websites. It turns out that they apparently differ from those used by my postscript interpreter. At least, after converting to a PDF with epstopdf (which comes with LaTeX as far as I know), I see that my Helvetica font is too wide and my Courier font is too narrow, so that word spacing is off, up to the point that long words overlap with the subsequent word.
My question: how can I get font width measurements matching those of the postscript interpreter?
I am not even sure whether the question is well-posed, but somehow I do assume that there is one and only one reference Helvetica font, so that postscript output looks the same on all systems and printers.
Making freetype load the correct fonts would probably be the easiest solution, but I do not know how to find the files.
A source for downloading the exactly matching fonts would also solve the problem, although having them twice would be odd.
Better still would be to ask a PostScript interpreter like Ghostscript for the ground truth, but the Ghostscript library documentation is sparse and I did not find any examples.
I could create a postscript file that prints the width of the text obtained with textwidth, convert to a pdf, and extract the text. That would be ugly and slow, and I'd like to go for a proper C++ solution.
Progress in any of these or maybe other directions would be absolutely great!
The fonts you are using should have a .afm (Adobe Font Metrics) file, which you can read the font metrics from if it's a PostScript font. It's also true that the 'base 13' fonts should be the same in terms of metrics across all PostScript implementations. Of course, if you are getting the metrics from a TrueType font then they may well differ from those of a PostScript font.
You haven't said what PostScript interpreter you are using. It may be that it's not using a standard font, but my guess is you are measuring a TrueType font from your Ubuntu system which doesn't quite match the PostScript ones used by your 'interpreter'. If memory serves, you can look in /etc/fonts/fonts.conf to see where your fonts are stored.
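As a rough sketch of reading advance widths from an AFM file: character metric lines have the form `C <code> ; WX <width> ; N <name> ; B ... ;`, with widths in 1/1000 of the font size. The function names `parseAfmWidths` and `stringWidth` are made up for this example, and it assumes single-byte text in the font's default encoding and ignores kerning pairs.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse the "C <code> ; WX <width> ; ..." lines of an AFM file into a
// code -> width map (widths are in 1/1000 of the font size).
std::map<int, double> parseAfmWidths(const std::string& afmText) {
    std::map<int, double> widths;
    std::istringstream lines(afmText);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, 2, "C ") != 0) continue;  // character metrics only
        int code = -1;
        double wx = 0.0;
        bool haveWidth = false;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ';')) {
            std::istringstream tokens(field);
            std::string key;
            if (!(tokens >> key)) continue;
            if (key == "C") {
                tokens >> code;
            } else if (key == "WX") {
                haveWidth = static_cast<bool>(tokens >> wx);
            }
        }
        // Code -1 marks glyphs that are not in the default encoding.
        if (code >= 0 && haveWidth) widths[code] = wx;
    }
    return widths;
}

// Advance width of `s` in points at the given font size (no kerning).
// Characters missing from the table contribute nothing.
double stringWidth(const std::map<int, double>& widths, const std::string& s,
                   double fontSize) {
    double units = 0.0;
    for (unsigned char c : s) {
        auto it = widths.find(c);
        if (it != widths.end()) units += it->second;
    }
    return units * fontSize / 1000.0;
}
```

Feeding it the contents of the AFM file for the font and calling `stringWidth(widths, "hello", 11.0)` would give the width to advance by for that word at 11 pt.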
FWIW Ghostscript ships with implementations of the base 13 fonts which are matched to the Adobe fonts, PostScript interpreters should match those. We don't however ship the AFM files, but you can load the fonts into Fontographer, or use FreeType, or simply get the advance width by using stringwidth (not textwidth) in a PostScript program.
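A minimal sketch of the stringwidth route, generated from C++ since that is what you are using: it writes a small PostScript program that selects the font and prints the x advance of each word with `==`. The helper names `psEscape` and `makeWidthProbe` are invented for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Escape the characters that are special inside a PostScript string literal.
std::string psEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '(' || c == ')' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Build a PostScript program that prints one advance width per word.
std::string makeWidthProbe(const std::string& fontName, double size,
                           const std::vector<std::string>& words) {
    std::ostringstream ps;
    ps << "%!PS\n";
    ps << "/" << fontName << " findfont " << size << " scalefont setfont\n";
    for (const std::string& w : words) {
        // stringwidth leaves x and y on the stack; drop y and print x.
        ps << "(" << psEscape(w) << ") stringwidth pop ==\n";
    }
    return ps.str();
}
```

Saved to a file (say probe.ps, a made-up name), this could be run with something like `gs -q -dNODISPLAY -dBATCH probe.ps`, reading one width per line from standard output.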
I wouldn't have said Ghostscript's documentation is 'sparse'. Difficult to find what you want, maybe, but there's lots of documentation there. Use.htm alone, the basic information, is a 265Kb HTML file.
The final alternative, of course, is to download the fonts you are actually using in the PostScript program, then you know that they match the metrics you used to create the PostScript in the first place. As with PDF, this is highly recommended, especially for fonts outside the base 13, as it's the only way to get reliable output.
Related
I'm developing a program to print images of different formats (BMP, JPEG, EMF, ...) on HDC using C++ and Windows GDI+. Using the MS Publisher Imagesetter driver I can generate a postscript file and through GhostScript functions I obtain the PDF file. If I try to print the following image:
I obtain the following bad quality result with those strange squares (not present on original image):
The part of my code that I used to print the image is:
SetMapMode(hdcPrint,MM_TEXT);
Gdiplus::Graphics graphics(hdcPrint);
graphics.SetPageUnit(Gdiplus::UnitMillimeter);
Gdiplus::Image* image = Gdiplus::Image::FromFile(srPicture->swPathImage);
graphics.DrawImage(image,x,y,w,h);
I tried to print the same image with many drivers and different kind of formats (different from PostScript: PDF, EMF, real printer) and the result is always acceptable (the squares are not present).
Furthermore, I tried to open the bad quality result with a pdf reader different from Adobe Acrobat Reader DC (Wondershare PDFelement and Chrome) and, even then, the result is acceptable.
I also noticed that if the image contains some different shapes (i.e. a big red line, like in the next image) the result is good too.
At this point, I have no idea if the problem is Adobe reader or my implementation.
Is there a different way to print images of different formats with GDI+ (or pure GDI)?
The PostScript file generated is this.
Well... You haven't supplied either the PostScript or PDF files, which makes it really hard to comment.
It's not completely obvious to me at what point you are getting the image you show. Is this what you see in the PDF file? Is it something you are getting when printing the PDF file to a physical printer? If it's the latter, how are you printing the PDF file to the printer?
The JPEG you have supplied a link to is really small (6Kb). Are you genuinely trying to use that JPEG file?
My guess (and in the absence of any files, a guess is all it can be) is that you are using an old version of Ghostscript. Old versions would decompress the JPEG image, then recompress the image using whatever filter produced the smallest result, usually JPEG again.
Because JPEG is a lossy format, every time you apply it to an image the quality decreases.
Newer versions of Ghostscript don't decompress the JPEG image data when going to the pdfwrite device, unless other options (e.g. colour conversion, image downsampling etc.) make it necessary. The current version of Ghostscript is 9.27 and the release of 9.28 is imminent; I'd suggest you try one of those.
Another possibility would be that either the PostScript program has been created in such a way as to degenerate every image sample to a rectangle, or you are using an extremely old version of Ghostscript where that technique was also used.
Note that none of these would, in my opinion, lead to exactly the result you've pasted here, but the version is certainly worth investigating. Posting the PostScript program file (i.e. the file you send to Ghostscript) would be more helpful, because it would allow me to at least narrow down where the problem has occurred.
[EDIT]
The fault appears to be an intriguing bug in Acrobat.
The PostScript program uses a colour transfer function to invert the colour samples of the RGB JPEG image. (This is a frowned-upon practice; it's not what transfer functions are for, but it's not uncommon.) Ghostscript's pdfwrite device preserves the transfer function.
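To make the distinction concrete: a transfer function maps each colour component value to another value, and an inverting one maps v to 1 - v. Preserving it leaves the function in the file for the viewer to evaluate, while applying it bakes the mapping into the sample values. The following is only a conceptual sketch on 8-bit samples with invented names (`invertTransfer`, `applyTransfer`), not how Ghostscript implements it.

```cpp
#include <cstdint>
#include <vector>

// An inverting transfer function on a normalised component value in [0, 1].
double invertTransfer(double v) { return 1.0 - v; }

// "Apply" a transfer function: bake it into 8-bit samples so that no
// function has to be carried along in the output file.
std::vector<std::uint8_t> applyTransfer(
    const std::vector<std::uint8_t>& samples, double (*transfer)(double)) {
    std::vector<std::uint8_t> out;
    out.reserve(samples.size());
    for (std::uint8_t s : samples) {
        double v = transfer(s / 255.0);
        if (v < 0.0) v = 0.0;  // clamp to the valid range
        if (v > 1.0) v = 1.0;
        out.push_back(static_cast<std::uint8_t>(v * 255.0 + 0.5));
    }
    return out;
}
```

With the mapping baked in, a viewer never has to evaluate the transfer function itself.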
When rendered, Ghostscript correctly produces the expected result. Acrobat, however, spectacularly does not. I have no idea what kind of mess they've made which leads to the result you get, but it's clearly wrong.
If I alter Ghostscript's pdfwrite production settings to Apply transfer functions instead of preserving them:
-c "<</TransferFunctionInfo /Apply>> setdistillerparams" -f PostScript.ps
then the resulting file views correctly in Acrobat. If I modify Adobe Acrobat's settings so that it uses Preserve instead of Apply for transfer functions (look in Settings->Edit Adobe PDF Settings, then the Color tab, and at 'when transfer functions are found', set the drop-down to Preserve instead of Apply) the resulting PDF file renders correctly in Ghostscript, and the same kind of incorrectly in Acrobat as the Ghostscript pdfwrite output file.
In short, I'm afraid what you are seeing here is an Acrobat rendering bug. You can work around it by altering the Ghostscript transfer function settings as above, but it's really not a problem in Ghostscript.
I'm using Pango for text layout without the cairo backend (currently testing with the win32 backend). I'd like to know whether Pango is capable of flowing text around an image, or any given container, or maybe inside a custom container.
Something like this: Flow around image
I have checked many examples and the Pango API and didn't find such a feature. Maybe I'm missing something or Pango does not have this feature.
As I said in this answer, you can't. I went through the source code, and Pango's graphics handling is primitive to the point of uselessness. Unless there's been some major reworking in the past year, which the release notes don't indicate, it's probably the same now.
The image you provide as an example is only available as PDF at the moment which requires every line, word and glyph be hard-positioned on the page. While theoretically possible to check the alpha channel of the image to wrap the text around the actual image instead of the block it contains, this has not (to the best of my knowledge) ever been implemented in a dynamic output system.
Pango, specifically, cannot even open "holes" in the text for graphics to be added later and, at the code level, doesn't even have the concept of a multi-line cell - hence a line being the size of its largest component.
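If you do end up hard-positioning lines yourself, the rectangular case (wrapping around the image's bounding block, not its alpha shape) comes down to working out, per line, how much horizontal space the image leaves free. A minimal sketch, with made-up names (`Rect`, `Span`, `lineSpans`) and assuming the image sits against the right edge of the text column:

```cpp
#include <vector>

struct Rect { double x, y, w, h; };  // top-left origin, y grows downwards
struct Span { double x, width; };    // usable horizontal extent of one line

// For each full line that fits in the column, return the span available for
// text. Lines that overlap the image vertically are cut off at the image's
// left edge; all others get the full column width.
std::vector<Span> lineSpans(const Rect& column, const Rect& image,
                            double lineHeight) {
    std::vector<Span> spans;
    for (double top = column.y; top + lineHeight <= column.y + column.h;
         top += lineHeight) {
        const double bottom = top + lineHeight;
        const bool overlaps = top < image.y + image.h && bottom > image.y;
        double width = column.w;
        if (overlaps) {
            width = image.x - column.x;
            if (width < 0.0) width = 0.0;
        }
        spans.push_back({column.x, width});
    }
    return spans;
}
```

Each span would then serve as the wrap width for that one line, with the text for it laid out and drawn separately.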
Your best bet is to look at WebKit for more complex displays. I, for one, have pretty much given up on Pango and it seems to be getting less popular.
I've read Rendering Vector Art on the GPU on rendering shapes that are defined by quadratic/cubic Bezier curve boundaries. I was hoping to build off of this to create text that fills in as if it were stroked by a pen or brush somehow. (Any advice on how to do this is welcome.)
However, I'm a little unsure of where to get my hands on fonts / shapes that have the format specified in this paper (arrays of points representing quadratic/cubic Beziers).
Does anyone know of a way of getting font/vector drawings that are in this format? The authors of the paper mention truetype fonts, but according to
TrueType Font Parsing in C
it looks like parsing truetype fonts might involve a lot more than this? I know there are also formats like .svg, but I am not sure where to start with that, since it holds so much more than what I am looking to get out of it.
As an example, is there some type of file format that I could convert a .svg file or TrueType file to, perhaps by using something like Inkscape's export function, such that the resulting format would be possible to parse for an array of points and control points?
I accepted an answer below, but for anyone interested in this, you should check out
https://github.com/quarnster/TTF
It's pretty much exactly what I was looking for. The code works great, but it's a bit hard to understand how to use it. It makes more sense if you read about the TTF format, like here An Introduction to TrueType Fonts: A look inside the TTF format.
I suggest using the cross platform library FreeType (http://www.freetype.org/). FreeType loads font files and, among other things, provides the bounding curves of glyphs in the typeface. Specifically, you should look into the function FT_Outline_Decompose, which gives exactly what you want.
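To give an idea of what comes out: for TrueType outlines the curves are quadratic Beziers, stored as contour points flagged on-curve or off-curve, with an implied on-curve point midway between two consecutive off-curve points. The sketch below decodes one closed contour into line and quadratic segments without using FreeType at all. The types and the function `decodeContour` are invented for illustration, and it assumes the contour starts on an on-curve point.

```cpp
#include <cstddef>
#include <vector>

struct ContourPoint { double x, y; bool onCurve; };
struct Vec2 { double x, y; };

// One decoded segment. For straight lines isCurve is false and control
// simply repeats the start point.
struct Segment { Vec2 start; Vec2 control; Vec2 end; bool isCurve; };

// Decode one closed TrueType-style contour into line/quadratic segments.
std::vector<Segment> decodeContour(const std::vector<ContourPoint>& pts) {
    std::vector<Segment> segs;
    const std::size_t n = pts.size();
    // Assumption of this sketch: the first point is on-curve.
    if (n < 2 || !pts[0].onCurve) return segs;
    Vec2 cur{pts[0].x, pts[0].y};
    Vec2 ctrl{0.0, 0.0};
    bool pending = false;  // true while an off-curve point awaits its end point
    for (std::size_t k = 1; k <= n; ++k) {  // k == n wraps to close the contour
        const ContourPoint& p = pts[k % n];
        const Vec2 here{p.x, p.y};
        if (p.onCurve) {
            segs.push_back({cur, pending ? ctrl : cur, here, pending});
            pending = false;
            cur = here;
        } else {
            if (pending) {
                // Two off-curve points in a row: the curve passes through
                // their midpoint, which ends the previous segment.
                const Vec2 mid{(ctrl.x + here.x) / 2.0, (ctrl.y + here.y) / 2.0};
                segs.push_back({cur, ctrl, mid, true});
                cur = mid;
            }
            ctrl = here;
            pending = true;
        }
    }
    return segs;
}
```

FT_Outline_Decompose reports the same kind of information through move/line/conic/cubic callbacks, so with FreeType you would not normally need to do this decoding by hand.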
Is there a C++ library that takes a string and a font file and returns the pixel representation of that string using that font? For example, I wrote a short PHP script that draws a single letter using Courier and then pulls out the individual pixels:
I can convert that to an array of color intensity codes and use it, but it means I need to hardcode every character I want to use, in every font, and I lose things like ligatures and intelligent kerning that only come up when multiple characters are drawn together. Is there a way in C++ to just do this directly, given the TTF file for the font I want to use? I'm using Linux, so I can't depend on Windows API functions like GetGlyphOutline
Is there a C++ library that takes a string and a font file
There's a C library that can be used to do the same thing. It is called FreeType 2. It is relatively easy to use. If you want to keep things portable and relatively lightweight, using FreeType 2 is the way to do it. On a Linux system it is probably already installed.
Also, cross-platform GUI toolkits normally provide some kind of "font" class that can be used to do what you want. For example, in Qt 4 you could use QFont to draw text on a QImage and then extract individual pixels. Operating systems (ones that have the concept of a "font") normally provide some kind of font manipulation API as well.
Does anyone know of a C++ library for taking an image and performing image recognition on it such that it can find letters based on a given font and/or font height? Even one that doesn't let you select a font would be nice (e.g. readLetters(Image image)).
I've been looking into this a lot lately. Your best bet is simply Tesseract. If you need layout analysis on top of the OCR then go with Ocropus (which in turn uses Tesseract to do the OCR). Layout analysis refers to being able to detect the position of text on the image and do things like line segmentation, block segmentation, etc.
I've found some really good tips through experimentation with Tesseract that are worth sharing. Basically I had to do a lot of preprocessing for the image.
Upsize/Downsize your input image to 300 dpi.
Remove color from the image. Grey scale is good. I actually used a dither threshold and made my input black and white.
Cut out unnecessary junk from your image.
For all three of the above I used netpbm (a set of image manipulation tools for Unix) to get to the point where I was getting pretty much 100 percent accuracy for what I needed.
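If you would rather do the colour-removal step in code than with command-line tools, a bare-bones version might look like this. Note that it uses a plain fixed threshold, not the dither threshold mentioned above, and the names (`Rgb`, `toGrey`, `binarize`) are invented for the example.

```cpp
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Grey conversion with integer luma weights (77 + 150 + 29 = 256),
// approximating 0.299 R + 0.587 G + 0.114 B.
std::uint8_t toGrey(const Rgb& p) {
    return static_cast<std::uint8_t>((77 * p.r + 150 * p.g + 29 * p.b) >> 8);
}

// Turn every pixel into pure white (255) or pure black (0), depending on
// whether its grey value reaches the threshold.
std::vector<std::uint8_t> binarize(const std::vector<Rgb>& pixels,
                                   std::uint8_t threshold) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (const Rgb& p : pixels) {
        out.push_back(toGrey(p) >= threshold ? 255 : 0);
    }
    return out;
}
```

A threshold around the middle of the range (for example 128) is a reasonable starting point, but the right value depends on the scan and is worth experimenting with.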
If you have a highly customized font and go with tesseract alone you have to "Train" the system -- basically you have to feed a bunch of training data. This is well documented on the tesseract-ocr site. You essentially create a new "language" for your font and pass it in with the -l parameter.
The other training mechanism I found was with Ocropus using neural net (bpnet) training. It requires a lot of input data to build a good statistical model.
In terms of invoking them, Tesseract and Ocropus are both C++. It won't be as simple as ReadLines(Image), but there is an API you can check out. You can also invoke them via the command line.
While I cannot recommend one in particular, the term you are looking for is OCR (Optical Character Recognition).
There is tesseract-ocr which is a professional library to do this.
From their web site:
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available
I think what you want is Conjecture. Used to be the libgocr project. I haven't used it for a few years but it used to be very reliable if you set up a key.
The Tesseract OCR library gives pretty accurate results; it's a C and C++ library.
My initial results were around 80% accurate, but after pre-processing the images before supplying them for OCR, the results were around 95% accurate.
What the pre-processing involved:
1) Binarize the bitmap (B&W worked better for me). How it could be done
2) Resampling your image to 300 dpi
3) Save your image in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.
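As a sketch of step 2, here is a nearest-neighbour resample of a greyscale image to a target resolution. The `GreyImage` type and `resampleToDpi` function are hypothetical, and nearest-neighbour is just the simplest filter to show; a real tool would likely use a smoother one.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct GreyImage {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // row-major, width * height entries
};

// Scale `src` by targetDpi / sourceDpi using nearest-neighbour sampling.
GreyImage resampleToDpi(const GreyImage& src, double sourceDpi,
                        double targetDpi) {
    if (src.width == 0 || src.height == 0 || sourceDpi <= 0.0 ||
        targetDpi <= 0.0) {
        return src;  // nothing sensible to do
    }
    const double scale = targetDpi / sourceDpi;
    GreyImage dst;
    dst.width = std::max<std::size_t>(
        1, static_cast<std::size_t>(src.width * scale + 0.5));
    dst.height = std::max<std::size_t>(
        1, static_cast<std::size_t>(src.height * scale + 0.5));
    dst.pixels.resize(dst.width * dst.height);
    for (std::size_t y = 0; y < dst.height; ++y) {
        // Map the centre of each destination pixel back into the source.
        const std::size_t sy = std::min(
            src.height - 1, static_cast<std::size_t>((y + 0.5) / scale));
        for (std::size_t x = 0; x < dst.width; ++x) {
            const std::size_t sx = std::min(
                src.width - 1, static_cast<std::size_t>((x + 0.5) / scale));
            dst.pixels[y * dst.width + x] = src.pixels[sy * src.width + sx];
        }
    }
    return dst;
}
```

For a 150 dpi scan, `resampleToDpi(img, 150.0, 300.0)` doubles both dimensions.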