C++ library for creating PDFs with support for many languages?

My dilemma is this: we have been using libharu to create our PDFs, but we recently added Hindi to our software and, from what I can find, libharu doesn't support it.
I have looked around for a library similar to libharu (it doesn't need to be open source) that supports all the languages we use, but I haven't found one.
I checked out all the libraries mentioned in this post, but none of them met my needs:
Open source PDF library for C/C++ application?
Also, that post is a few years old. >_<
So I ask you, kind Stack Overflow people: do you know of a library for creating and editing PDFs (in C++) that supports at least the following languages? (English, Spanish, French, Turkish, German, Russian, Japanese, Chinese, Arabic, and Hindi)

Have you tried going to the source?
Adobe PDF Developer SDK
Adobe Systems invented the PDF format. I recommend talking to them before going open source. They may have some libraries or SDKs to use with PDFs.

I have used JagPDF a few times, and from my last experience I think embedding fonts in the PDF might solve your problem.

Related

Creating a PDF reader in C++

So I wanna make a PDF reader using C++ as a hobby project. The problem is that I'm not finding much of a head start, so if anyone has worked on a similar project, please guide me; a few web links would be great! I will be using a Windows environment and Visual Studio.
If you want to simply "host" an existing PDF reader (such as Acrobat or Foxit) in your own window, then you'll want to look into ActiveX.
Alternately, if you want to do your own PDF decoding, then the best place to start would be to find a soft couch and cozy up with the PDF format specification, in particular ISO 32000-1. It's a real page-turner.
http://www.adobe.com/devnet/pdf/pdf_reference.html
Adobe's publication about the details of the PDF file format.
There are PDF components as well, if you want to go that route, but the majority of them are either not free, or already have a UI of their own. Just tossing a PDF component into a form doesn't strike me as much of a hobby project. :)
You might find this article on parsing Reg files using Boost Spirit a useful starter. I've used Spirit before for parsing complex data but I think you're biting off a mighty big challenge!
If you want to look at existing parsers, try PoDoFo in C++ or the lexing side of Panda, in C.

Parsing HTML to find specific links (Without Keywords)

I posted about something like this earlier, but I am not sure how to follow up on my original question, as I can only comment on it or answer it myself.
Anyway, I need to get four links from a website within my C++ application: the latest stable build links for Windows and Linux, and the latest development build links for Windows and Linux.
I can download the page (http://www.sourcemod.net/snapshots.php) with libcurl, which is already used in the project, but after that I am not sure what to do. I was looking at parsers, but I can't think how I would tell one link from another. With a parser I could obviously get the first link from each table, but this does not seem efficient and would only give me the links to the Windows builds.
It looks like the links I need will be fourth in both tables, but I'm just not very familiar with a good way to go about this, so any help would be appreciated.
Maybe you'll find the location of the actual downloads, http://www.sourcemod.net/smdrop/, easier to parse.
I'm not too familiar with C++, but if you don't come across any better solutions, there's BeautifulSoup for Python, which is really nice for parsing HTML and even deals well with malformed documents. And here's a highly rated CodeProject article on embedding Python in C/C++ that claims "This is written for programmers who are more experienced in C/C++ than in Python, the tutorial takes a practical approach and omits all theoretical discussions."
(I haven't read through it personally; as I mentioned, I'm not terribly familiar with C++.)

How to replace text in a PowerPoint (.ppt) document?

What solutions are there? The only solutions I know of are for replacing bookmarks in Word (.doc) files with Apache POI.
Is it also possible to change images, layouts, and text styles in .doc and .ppt documents?
I'm thinking about replacing areas of Word and PowerPoint documents for bulk processing.
Platform: MS-Office 2003
What are your platform limitations?
Obviously Apache POI will get you at least part of the way there.
Microsoft's own COM APIs are fairly powerful and are documented here. I would recommend using them if a) you are not running in a server (many users, multithreaded) environment; b) you can have a proper version of PowerPoint installed on the production machine; and c) you can code against a COM object model.
It's a bit pricey, but Aspose.Slides is a very powerful library for manipulating PowerPoint files.
If you count other office suites as an option, here's a list of possible solutions:
Apache POI-HSLF
PowerPoint 2007 APIs
OpenOffice.org UNO
With POI you can't edit the .pptx file format, but you don't depend on the applications installed on the system. The other two options, by contrast, rely on other applications, but they are definitely better for dealing with presentations. OpenOffice has better compatibility with older formats, by the way. Also, if you use UNO, you'll have a wide choice of languages: UNO exists for Java, C++, Python, and other languages.
My experience is not directly with PowerPoint, but I've actually rolled my own WordML (XML) generator. It a) removed all dependencies on Word, b) was very fast, and c) let me build up documents from scratch.
But it was a lot of work to create, and I was only creating a write-only implementation.
I'm not as familiar with PowerPoint, so this is conjecture, but you may be able to roll your own by reading XML (PowerPoint 2003??) and/or cracking open the Office Open XML file (zipped XML), then using XPath to manipulate the data, and then saving everything back to disk.
This won't work on older PowerPoint files based on the OLE Compound Document format, though.
I've done something like that before: programmatically accessed and manipulated PowerPoint presentations. Back when I did it, it was all in C++ using COM, but similar principles apply to C#/VB .NET apps, since they do COM interop very easily.
What you're looking for is called the Office Document Model. Basically, Office applications expose their documents programmatically, as trees of objects that define their contents. These objects are accessible via an API, and you can manipulate them, add new ones, and do whatever other processing you want. It's exceedingly powerful; you can use it to manipulate pretty much all aspects of a document. But you'll need an installation of Office and Visual Studio to be able to use it.
Some links:
Intro: http://msdn.microsoft.com/en-us/library/d58327k6.aspx
Hope this helps!
Apparently new users can only include one link per posting. How lame! :)
Here's the other link I meant to include:
Example of manipulating PowerPoint documents programmatically: http://msdn.microsoft.com/en-us/library/cc668192.aspx

Supporting multiple human languages

I am thinking about my final-year project and the possibility of supporting multiple languages, e.g. English, Welsh, German, etc.
Is there a standard way of supporting multiple human languages in a program?
What is the recommended file format for storing the different languages?
It is something I am clueless about, but it is obviously a very common feature, so any advice is welcome.
I am most familiar with C++ using MFC for UI applications, and I am currently learning Qt, so an answer with this bias in mind would be good.
(Sorry if this has been covered before, but searching for 'Languages' on SO returns streams of programming language related questions)
If you wanted to browse on StackOverflow for ideas you could try the internationalization, i18n, localization and l10n tags.
("i18n" == "internationalization" because "nternationalizatio" is 18 letters. The same goes for "localization" and "l10n".)
As for MFC, you could use resource DLLs as described here. One portable solution is the gettext library.
Apart from the already made suggestions of internationalization and localization, another term you might want to research is "Unicode".

Workflow to Turn Wiki content into a system manual

We're in the middle of deploying a new software system to lots of users in lots of places (200+ users in 8 countries). In the past we've written a manual for the users and updated it every so often. This works OK, in that all the users have the same manual and it covers the main things, but it has its problems: it doesn't get updated that often, we sometimes miss updates, and some users have old copies.
We've been talking about using a wiki during the testing and deployment phases to build a knowledge base about the system. Ideally we'd then like some way to convert that into some form of electronic document that we can 'pretty-fie' and send out as the official manual, while still letting users use and update the wiki.
Has anyone else done anything similar? Any suggestions for wiki systems, workflows, document formats, etc.?
Most wikis support exporting to PDF, e.g.:
MediaWiki PDF Export
DokuWiki PDF Export
TWiki PDF Export
You can write something that generates LaTeX from the wiki and renders a manual to PDF. With packages like hyperref you can retain cross-references as hyperlinks.
Additionally, you can integrate content from multiple sources such as a data dictionary into the LaTeX document, which can be mixed and matched with the wiki content. You could also set the architecture up so it can support cross-referencing that goes either way.
FrameMaker could also support this using generated MIF files, and you could also use Lout in a similar way or convert your wiki content to DocBook, which would allow you to use any of the many rendering options available for that format.
As an aside, the following Stack Overflow postings discuss various systems for maintaining documentation.
Application (Not a Markup Language) for Producing a User Manual
Can LaTeX be used for producing any documentation that accompanies software?
What tools are used to write documentation?
What tools does your team use for writing user manuals?
How best to write documentation (ideally in latex) targeting both the web (html) and paper (pdf)?
Best tool(s) for working with DocBook XML documents?
What is the recommended toolchain for formatting XML DocBook?
Is a successor for TeX/LaTeX in sight?
MadCap Flare is a help-and-manual authoring tool that uses HTML as the source for each topic. You could pretty easily do a mass import of the wiki pages. That would then require some cleanup, but after that you have a nice single-source system that can output CHM, web-browsable help, PDF, DOC/DOCX, etc.
How are you storing the help source at the moment? Is it MS Word files, MS help, LaTeX?
If you put your help source files under version control then you will get all the benefits of a wiki without having to migrate to a new system - people can make edits to the help files easily - those changes can be tracked, reverted etc. and you get the prettified manuals as before.
I followed Node's links and came across some MediaWiki pages that I thought were noteworthy.
Extension:OpenDocument Export
Extension:PDF Writer
Category:Data extraction extensions
I gave a previous answer which may be useful for the "wiki to PDF" part -- look at using the open source PediaPress code or functionality. You can get ODFs from it too, although their PDFs are already quite pretty (but you might want to rebrand it and restyle it for your company I suppose).