So I want to make a PDF reader in C++ as a hobby project. The problem is that I can't find much to get me started, so if anyone has worked on a similar project, please guide me; a few web links would be great! I will be using Windows and Visual Studio.
If you simply want to "host" an existing PDF reader (such as Acrobat or Foxit) in your own window, then you'll want to look into ActiveX.
Alternately, if you want to do your own PDF decoding, then the best place to start would be to find a soft couch and cozy up with the PDF format specification, and in particular, ISO 32000-1. It's a real page-turner.
http://www.adobe.com/devnet/pdf/pdf_reference.html
Adobe's publication about the details of the PDF file format.
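To give a feel for what "doing your own PDF decoding" starts with, here is a minimal sketch in Python (chosen only for brevity; the same steps carry over to C++). It relies on two things from the file-structure chapter of the specification: a file opens with a `%PDF-x.y` header, and it ends with a `startxref` keyword, the byte offset of the cross-reference section, and `%%EOF`. The function name `read_pdf_markers` and the tiny sample bytes are made up for illustration; the sample is not a valid PDF.

```python
import re

def read_pdf_markers(data):
    """Return (version, startxref_offset) for a PDF held in memory as bytes."""
    # The first line of a PDF is a header comment such as %PDF-1.7.
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    if header is None:
        raise ValueError("not a PDF: missing %PDF- header")
    # Near the end of the file, 'startxref' is followed by the byte offset
    # of the cross-reference section and then the %%EOF marker.
    tail = data[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref / %%EOF trailer found")
    # Use the last match: incrementally updated files can contain several.
    return header.group(1).decode("ascii"), int(matches[-1])

# Hypothetical stand-in: a real file has objects and an xref table
# between the header and the trailer.
sample = b"%PDF-1.4\n% (objects and xref table omitted)\nstartxref\n9\n%%EOF\n"
version, xref_offset = read_pdf_markers(sample)
print(version, xref_offset)
```

From there a reader would seek to that offset, parse the cross-reference data, and follow it to the individual objects, which is where the specification becomes essential.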
There are PDF components as well, if you want to go that route, but the majority of them are either not free, or already have a UI of their own. Just tossing a PDF component into a form doesn't strike me as much of a hobby project. :)
You might find this article on parsing Reg files using Boost Spirit a useful starter. I've used Spirit before for parsing complex data but I think you're biting off a mighty big challenge!
If you want to look at existing parsers, try PoDoFo in C++ or the lexing side of Panda, in C.
Related
I need to manipulate .docx documents using C/Visual C++. All the samples I've found are in C#.
How can I do this?
What I've found is that Microsoft wants you to either use .NET or use their Office Automation API to invoke Word to perform the manipulations for you. Depending on how low-level these manipulations need to be, you might be able to get by with the Office Automation API. If not, you may have to get your hands dirty with the Office Open XML format that's behind the .docx file format.
Here's Microsoft's skimpy documentation on Office Automation
And here's an article that goes into it a bit more, although it may be out of date.
It just occurred to me that one big issue with Office Automation is that you need to have Word installed to do anything with it. Of course, this all depends on what exactly you need to do.
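For the "get your hands dirty" route, it helps to know that a .docx file is a ZIP archive whose main text lives in the part `word/document.xml`. The sketch below is in Python only to keep it short; in C/C++ you would pair a ZIP library with an XML parser to do the same thing. The helper names are made up, and the sample package holds only the one part being read, so it is not a document Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML elements (w:p, w:r, w:t).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_paragraphs(docx_bytes):
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs

def make_sample_package():
    """Build a hypothetical stand-in package with only the main part."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>Word</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
    return buffer.getvalue()

paragraphs = docx_paragraphs(make_sample_package())
print(paragraphs)
```

A real .docx also carries `[Content_Types].xml` and relationship parts, which any code that writes the file back has to preserve.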
Try http://libopc.codeplex.com/
I posted about this sort of earlier, but I am not sure how to post back to my original question as I can only comment or answer my own question.
Anyways, I need to get 4 links from a website, the latest stable build links for windows and linux, and the latest development build links for windows and linux (4 links total) within my C++ application.
I can download the page (http://www.sourcemod.net/snapshots.php) with LibCURL which is already implemented in the project, but after that I am not sure. I was looking at parsers, but I can't think of how I am going to discern link from link. Obviously using a parser I could get the first link from each table, but this does not seem efficient and would only provide me with the links to windows builds.
It looks like the links I need will be in the fourth in both tables, but I am just not very familiar with a good way to go about this, so any help would be appreciated.
Maybe you'll find the location of the actual downloads, http://www.sourcemod.net/smdrop/, easier to parse.
I'm not too familiar with C++, but if you don't come across any better solutions, there's BeautifulSoup for Python, which is really nice for parsing HTML and even deals well with malformed documents. And here's a highly rated CodeProject article on embedding Python in C/C++ that claims "This is written for programmers who are more experienced in C/C++ than in Python, the tutorial takes a practical approach and omits all theoretical discussions."
(I haven't read through it personally, as I mentioned, not terribly familiar with C++)
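If you do go the Python route, here is a rough sketch of the idea using only the standard library's `html.parser` instead of BeautifulSoup. The HTML fragment, the paths, and the file-name patterns are all invented for illustration and are not taken from the real snapshots page; the sketch also assumes the newest build is listed first and that each link contains the branch and platform name.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def first_link_with(links, *keywords):
    """Return the first link containing every keyword, or None."""
    for link in links:
        if all(keyword in link for keyword in keywords):
            return link
    return None

# Hypothetical page fragment: one table per branch, newest build first.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/drop/stable/example-1.0-build2-windows.zip">windows</a></td>
      <td><a href="/drop/stable/example-1.0-build2-linux.tar.gz">linux</a></td></tr>
  <tr><td><a href="/drop/stable/example-1.0-build1-windows.zip">windows</a></td>
      <td><a href="/drop/stable/example-1.0-build1-linux.tar.gz">linux</a></td></tr>
</table>
<table>
  <tr><td><a href="/drop/dev/example-1.1-build7-windows.zip">windows</a></td>
      <td><a href="/drop/dev/example-1.1-build7-linux.tar.gz">linux</a></td></tr>
</table>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
latest = {
    (branch, platform): first_link_with(collector.links, branch, platform)
    for branch in ("stable", "dev")
    for platform in ("windows", "linux")
}
for key, link in latest.items():
    print(key, link)
```

Matching on the link text rather than on table position means the code keeps working if a row is added above the one you want, as long as the naming stays consistent.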
What solutions are there? The only one I know of is replacing bookmarks in Word (.doc) files with Apache POI.
Is it also possible to change images, layouts, and text styles in .doc and .ppt documents?
I'm thinking of replacing areas in Word and PowerPoint documents for bulk processing.
Platform: MS-Office 2003
What are your platform limitations?
Obviously Apache POI will get you at least part of the way there.
Microsoft's own COM APIs are fairly powerful and are documented here. I would recommend using them if a) you are not running in a server (many users, multithreaded) environment; b) you can have a proper version of PowerPoint installed on the production machine; and c) you can code against a COM object model.
It's a bit pricey, but Aspose.Slides is a very powerful library for manipulating PowerPoint files.
If you include using other Office suites as an option, here's a list of possible solutions:
Apache POI-HSLF
PowerPoint 2007 APIs
OpenOffice.org UNO
Using POI you can't edit the .pptx file format, but you don't depend on apps installed on the system. The other two options, on the contrary, rely on other apps, but they are definitely better for dealing with presentations. OpenOffice has better compatibility with older formats, by the way. Also, if you use UNO you'll have a wide choice of languages: UNO bindings exist for Java, C++, Python and other languages.
My experience is not directly with PowerPoint, but I've actually rolled my own WordML (XML) generator. It a) removed all dependencies on Word, b) was very fast, and c) let me build up documents from scratch.
But it was a lot of work to create. And I was only creating a write-only implementation.
I'm not as familiar with PowerPoint, so this is conjecture, but you may be able to roll your own by reading XML (PowerPoint 2003??) and/or cracking the Office Open XML file (zipped XML), then using XPath to manipulate the data, and then saving everything back to disk.
This won't work on older OLE Compound Document based Power Point files though.
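As a rough illustration of that crack-open, edit, and save-back idea for the zipped XML formats only, here is a sketch in Python. It uses the limited XPath subset in the standard library's ElementTree to find DrawingML text nodes (`a:t`) in one slide part and rewrites the archive with the edited part. The function names, the `$CUSTOMER` placeholder, and the sample package are hypothetical; the sample contains a single slide part and is not a file PowerPoint would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("a", A_NS)
ET.register_namespace("p", P_NS)

def replace_slide_text(package_bytes, part_name, old, new):
    """Copy the package, replacing `old` with `new` in one part's a:t nodes."""
    source = zipfile.ZipFile(io.BytesIO(package_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w") as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == part_name:
                root = ET.fromstring(data)
                for node in root.findall(".//a:t", {"a": A_NS}):
                    if node.text:
                        node.text = node.text.replace(old, new)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            target.writestr(item.filename, data)
    return output.getvalue()

def slide_texts(package_bytes, part_name):
    """Return the text of every a:t node in the given part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(part_name))
    return [node.text for node in root.findall(".//a:t", {"a": A_NS})]

def make_sample_package():
    """Build a hypothetical stand-in package with a single slide part."""
    slide = (
        f'<p:sld xmlns:p="{P_NS}" xmlns:a="{A_NS}"><p:cSld><p:spTree><p:sp>'
        "<p:txBody><a:p><a:r><a:t>Report for $CUSTOMER</a:t></a:r></a:p>"
        "</p:txBody></p:sp></p:spTree></p:cSld></p:sld>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("ppt/slides/slide1.xml", slide)
    return buffer.getvalue()

part = "ppt/slides/slide1.xml"
edited = replace_slide_text(make_sample_package(), part, "$CUSTOMER", "Acme")
print(slide_texts(edited, part))
```

One caveat with this approach: a placeholder that the authoring application has split across several runs will not be found by a per-node replace, so templates need to be prepared with that in mind.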
I've done something like that before: programmatically accessed and manipulated PowerPoint presentations. Back when I did it, it was all in C++ using COM, but similar principles apply to C#/VB .NET apps, since they do COM interop very easily.
What you're looking for is called the Office Document Model. Basically, Office applications expose their documents programmatically, as trees of objects that define their contents. These objects are accessible via an API, and you can manipulate them, add new ones, and do whatever other processing you want. It's exceedingly powerful; you can use it to manipulate pretty much all aspects of a document. But you'll need an installation of Office and Visual Studio to be able to use it.
Some links:
Intro: http://msdn.microsoft.com/en-us/library/d58327k6.aspx
Hope this helps!
Apparently new users can only include one link per posting. How lame! :)
Here's the other link I meant to include:
Example of manipulating PowerPoint documents programmatically: http://msdn.microsoft.com/en-us/library/cc668192.aspx
We're in the middle of deploying a new software system to lots of users in lots of places (200+ users across 8 countries). In the past we've written a manual for the users, then updated it every so often. This works OK, in that all the users have the same manual and it covers the main things, but it has its problems: it doesn't get updated that often, we sometimes miss updates, and some users end up with old copies.
We've been talking about using a wiki during the testing and deployment phases to build a knowledge base about the system. Ideally we'd then like some way to convert that into some form of electronic document that we can then prettify and send out as the official manual, as well as letting users use and update the wiki.
Has anyone else done anything similar? Any suggestions for wiki systems, workflows, document formats, etc.?
Most wikis support export to PDF, e.g.:
MediaWiki PDF Export
DokuWiki PDF Export
TWiki PDF Export
You can write something that generates LaTeX from the wiki and renders a manual to PDF. With packages like hyperref you can retain cross-references as hyperlinks.
Additionally, you can integrate content from multiple sources such as a data dictionary into the LaTeX document, which can be mixed and matched with the wiki content. You could also set the architecture up so it can support cross-referencing that goes either way.
FrameMaker could also support this using generated MIF files, and you could also use Lout in a similar way or convert your wiki content to DocBook, which would allow you to use any of the many rendering options available for that format.
As an aside, the following Stack Overflow postings discuss various systems for maintaining documentation.
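To make the wiki-to-LaTeX idea a little more concrete, here is a very small sketch in Python. It assumes MediaWiki-style markup, handles only level-2 headings (`== Title ==`) and plain internal links (`[[Page]]`), and does no escaping of LaTeX special characters. The function names and the `sec:` label scheme are made up for illustration; the output relies on hyperref's `\hyperref[label]{text}` form so cross-references stay clickable in the PDF.

```python
import re

def label_for(title):
    """Turn a page or section title into a LaTeX label such as sec:logging-in."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return "sec:" + slug

def wiki_to_latex(text):
    """Convert level-2 headings and [[internal links]]; pass other lines through."""
    lines = []
    for line in text.splitlines():
        heading = re.fullmatch(r"==\s*([^=]+?)\s*==", line.strip())
        if heading:
            title = heading.group(1)
            lines.append(rf"\section{{{title}}}\label{{{label_for(title)}}}")
            continue
        # A function is used as the replacement so backslashes stay literal.
        line = re.sub(
            r"\[\[(.+?)\]\]",
            lambda m: rf"\hyperref[{label_for(m.group(1))}]{{{m.group(1)}}}",
            line,
        )
        lines.append(line)
    return "\n".join(lines)

# Hypothetical wiki content.
sample = """== Logging In ==
Open the start page and sign in.
== Reports ==
See [[Logging In]] before running a report."""

latex = wiki_to_latex(sample)
print(latex)
```

A real converter would also need lists, tables, images, and escaping, which is the point at which an existing export extension or a DocBook toolchain starts to look attractive.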
Application (Not a Markup Language) for Producing a User Manual
Can LaTeX be used for producing any documentation that accompanies software?
What tools are used to write documentation?
What tools does your team use for writing user manuals?
How best to write documentation (ideally in latex) targeting both the web (html) and paper (pdf)?
Best tool(s) for working with DocBook XML documents?
What is the recommended toolchain for formatting XML DocBook?
Is a successor for TeX/LaTeX in sight?
MadCap Flare is a help-and-manual authoring tool that uses HTML for the source of each topic. You could pretty easily do a mass import of the wiki pages. That would then require some cleanup, but after that you have a nice single-source system that can output CHM, web-browsable help, PDF, DOC/DOCX, etc.
How are you storing the help source at the moment? Is it MS Word files, MS help, LaTeX?
If you put your help source files under version control then you will get all the benefits of a wiki without having to migrate to a new system - people can make edits to the help files easily - those changes can be tracked, reverted etc. and you get the prettified manuals as before.
I followed Node's links and came across some MediaWiki pages that I thought were noteworthy.
Extension:OpenDocument Export
Extension:PDF Writer
Category:Data extraction extensions
I gave a previous answer which may be useful for the "wiki to PDF" part -- look at using the open source PediaPress code or functionality. You can get ODFs from it too, although their PDFs are already quite pretty (but you might want to rebrand it and restyle it for your company I suppose).
Does anyone know if it's possible to generate PowerPoint files (.ppt) within ColdFusion? I can't rely on installing a copy of Office and generating one through COM, and I can't use OOXML since my client is still in the Office 2003 era. Any suggestions are much appreciated.
You can try using Apache POI, specifically their PowerPoint support. It looks to be still in beta, though:
http://poi.apache.org/slideshow/index.html
I've used POI to extract content from Word docs before and it was rather easy in ColdFusion.
ColdFusion doesn't have built-in PPT creation, but you may be able to make something work with OpenOffice.
Look into CFPresentation (CF8); it lets you create web-based presentations. They are not actually in PPT format, but they are displayed in the same way via Flash Player.
Have you considered using PDF instead? For all intents and purposes except perhaps some animation, PDFs do well replacing PPTs. And CF has tons of PDF creation and manipulation features!
I know it's not a good answer, but ColdFusion 9 can turn a cfpresentation into a PowerPoint file, and creating a cfpresentation is pretty damned trivial...
However, this of course requires a server that's still in beta, and a large cash outlay once it's released if you're running your own server.
Dan