How to replace text in a PowerPoint (.ppt) document?

What solutions are there? I only know of solutions for replacing bookmarks in Word (.doc) files with Apache POI.
Is it also possible to change images, layouts, and text styles in .doc and .ppt documents?
I'm thinking of replacing areas in Word and PowerPoint documents for bulk processing.
Platform: MS-Office 2003

What are your platform limitations?
Obviously Apache POI will get you at least part of the way there.
Microsoft's own COM APIs are fairly powerful and are documented here. I would recommend using them if a) you are not running in a server (many users, multithreaded) environment; b) you can have a proper version of PowerPoint installed on the production machine; and c) you can code against a COM object model.

It's a bit pricey, but Aspose.Slides is a very powerful library for manipulating PowerPoint files.

If you include using other office suites as an option, here's a list of possible solutions:
Apache POI-HSLF
PowerPoint 2007 APIs
OpenOffice.org UNO
With POI you can't edit the .pptx file format, but you don't depend on any applications installed on the system. The other two options, by contrast, rely on other applications, but they are definitely better for dealing with presentations. OpenOffice has better compatibility with older formats, by the way. Also, if you use UNO, you'll have a wide choice of languages: UNO bindings exist for Java, C++, Python and other languages.

My experience is not directly with PowerPoint, but I've actually rolled my own WordML (XML) generator. It a) removed all dependencies on Word, b) was very fast, and c) let me build up documents from scratch.
But it was a lot of work to create, and I was only creating a write-only implementation.
I'm not as familiar with PowerPoint, so this is conjecture, but you may be able to roll your own by reading XML (PowerPoint 2003??) and/or cracking the Office Open XML file (zipped XML), then using XPath to manipulate the data, and then saving everything back to disk.
This won't work on older OLE Compound Document based PowerPoint files though.
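The "crack the zip, edit the XML, save it back" idea could look roughly like this for an Office Open XML (.pptx) file. This is only a sketch using the Python standard library: the function name `replace_text_in_pptx` is made up for illustration, and it assumes the text to replace sits inside a single text run (`<a:t>` element). PowerPoint may split a phrase across several runs, in which case this simple approach will miss it.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces commonly found in slide parts; registering them keeps the usual
# prefixes on output instead of ElementTree's generated ns0, ns1, ...
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("a", A_NS)
ET.register_namespace("p", "http://schemas.openxmlformats.org/presentationml/2006/main")
ET.register_namespace("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships")

SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def replace_text_in_pptx(pptx: bytes, old: str, new: str) -> bytes:
    """Return a copy of the .pptx bytes with `old` replaced in every slide text run."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(pptx)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            payload = src.read(name)
            if SLIDE_PART.fullmatch(name):
                root = ET.fromstring(payload)
                # <a:t> elements hold the visible text of a run.
                for run in root.iter(f"{{{A_NS}}}t"):
                    if run.text and old in run.text:
                        run.text = run.text.replace(old, new)
                payload = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Every other part (images, layouts, relationships) is copied unchanged.
            dst.writestr(name, payload)
    return out.getvalue()
```

Typical use would be to read the file with `open(path, "rb").read()`, call the function, and write the result to a new file. Namespaces that are not registered above still get renamed prefixes on output, which is valid XML but worth checking against real presentations before relying on it.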

I've done something like that before: programmatically accessed and manipulated PowerPoint presentations. Back when I did it, it was all in C++ using COM, but similar principles apply to C#/VB .NET apps, since they do COM interop very easily.
What you're looking for is called the Office Document Model. Basically, Office applications expose their documents programmatically, as trees of objects that define their contents. These objects are accessible via an API, and you can manipulate them, add new ones, and do whatever other processing you want. It's exceedingly powerful; you can use it to manipulate pretty much all aspects of a document. But you'll need an installation of Office and Visual Studio to be able to use it.
Some links:
Intro: http://msdn.microsoft.com/en-us/library/d58327k6.aspx
Hope this helps!

Apparently new users can only include one link per posting. How lame! :)
Here's the other link I meant to include:
Example of manipulating PowerPoint documents programmatically: http://msdn.microsoft.com/en-us/library/cc668192.aspx

Related

Web service for converting MS Office file formats (doc, docx, ppt, etc) into plain text?

Larger context: we're working on an intranet portal's search engine, which needs to be able to search within all Office file types: doc, docx, xls, xlsx, ppt, and pptx. With the search algorithm already in place, we've implemented the indexer using Office automation; however, the client is concerned that this is (1) error-prone and (2) not recommended by Microsoft (and also not covered by their license).
I've read the previous answers on this topic on SO; however, they would require us to integrate a very large number of distinct libraries to cover all the edge cases, which we don't have the resources to do.
Hence, we're looking for a simple web service to which we can submit any of these documents and which would return plain text (or HTML, or even PDF; we've got parsers for both) output.
Are there any such services (free or paid) that cover all of the file formats above?
Many thanks.
I would suggest trying Apache Tika: it's free and open source. It can extract text content from MS Office file formats (and from other popular formats, too). It includes a server application which you can run on your own server.
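As a rough sketch of what calling such a server might look like from the indexer: the Tika server accepts a document via HTTP PUT on its `/tika` endpoint and returns the extracted text. The host, port (9998 is the server's usual default) and the helper names below are assumptions for illustration; adjust them to wherever you run the server.

```python
import urllib.request

# Assumption: a Tika server you started yourself, listening on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(document: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking the Tika server for a plain-text rendition."""
    return urllib.request.Request(
        url,
        data=document,
        method="PUT",
        headers={
            "Accept": "text/plain",
            # Generic type so the server detects the real format itself.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(document: bytes, url: str = TIKA_URL) -> str:
    """Send the document to the server and return the extracted text."""
    with urllib.request.urlopen(build_tika_request(document, url), timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `extract_text(open("report.docx", "rb").read())` would return the text to feed into the search index (the file name is just an example). The same call is used for doc, xls, ppt and the newer formats, since format detection happens on the server side.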
I'm not sure about a service; however, if you can manage and deploy three .NET assemblies for DOC/DOCX, XLS/XLSX, and PPT/PPTX, then you may try the Aspose components: Aspose.Words, Aspose.Cells, and Aspose.Slides respectively. These DLLs don't require MS Office to be installed on your server, and they work fine on any Windows OS and in 32-bit/64-bit environments. You may also see the documentation. These components provide many advanced features to deal with document elements as well. Please see if this might help in your scenario.
Disclosure: I work as a developer evangelist at Aspose.

File preview component (C++/MFC)

Is anyone aware of a good, general purpose file preview component for MFC/C++ desktop applications?
Specifically, I'm looking for a component that I could embed in my application that would allow a broad range of file types (text files, multimedia, etc.) to be previewed without the need for original applications (such as MS Word, etc.) to be installed.
I could only find one, via Google:
http://www.file-viewer-sdk.com/
Unfortunately, these folks want $60k for unlimited redistribution, which is outside of our budget.
Anyone have any recommendations? If not a component, is anyone using another general-purpose strategy that works well for them?
You can write your own shell preview host once you know the interfaces.
You might want to check out Autovue, originally made by Cimmetry and since acquired by Oracle.
Our product makes limited use of their SDK to do some document conversions (Mostly RTF->PS) and that works well enough for us.

best way to programmatically modify excel spreadsheets

I'm looking for a library that will allow me to programmatically modify Excel files to add data to certain cells. My current idea is to use named ranges to determine where to insert the new data (essentially a range of 1x1), then update the named ranges to point at the data. The existing application this is going to integrate with is written entirely in C++, so I'm ideally looking for a C++ solution (hence why this thread is of limited usefulness). If all else fails, I'll go with a .NET solution if there is some way of linking it against our C++ app.
An ideal solution would be open source, but none of the ones I've seen so far (MyXls and XLSSTREAM) seem up to the challenge. I like the looks of Aspose.Cells, but it's for .NET or Java, not C++ (and costs money). I need to support all Excel formats from 97 through the present, including the XLSX and XLSB formats. Ideally, it would also support formats such as OpenOffice, and (for output) PDF and HTML.
Some use-cases I need to support:
reading and modifying any cell in the spreadsheet, including formulas
creating, reading, modifying named ranges (the ranges themselves, not just the cells)
copying formatting from a cell to a bunch of others (including conditional formatting) -- we'll use one cell as a template for all the others we fill in with data.
Any help you can give me finding an appropriate library would be great. I'd also like to hear some testimonials about the various suggestions (including the ones in my post) so I can make more informed decisions -- what's easy to use, bug-free, cheap, etc?
The safest suggestion is to just use OLE. It uses COM, which does not require .NET at all.
http://en.wikipedia.org/wiki/OLE_Automation <--about halfway down is a C++ example.
You may have to wrap a few functionalities into functions for usability, but it's really not ugly to work with.
EDIT: Just be aware that you need a copy of Excel for it to work. Also, there are some first-party .h files specific to Excel. (It's all explained in the Wikipedia article.)
I don't know if this is an option for you, but the new Office 2007 formats are zipped XML, which makes it quite feasible to do your own modifications. See here for the specifications.
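To give an idea of what "doing your own modifications" on the zipped XML involves, here is a minimal sketch that lists the defined names (named ranges) of an .xlsx workbook and overwrites one existing cell with a number. It uses only the Python standard library; the function names are made up, and the sheet part path (for example `xl/worksheets/sheet1.xml`) is an assumption, since the real mapping from sheet names to parts lives in the workbook's relationship files. It covers .xlsx only, not .xls or .xlsb.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, used by workbook.xml and the worksheet parts.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ET.register_namespace("", NS)  # keep it as the default namespace on output


def defined_names(xlsx: bytes) -> dict:
    """Map each defined name in the workbook to its reference, e.g. 'Sheet1!$B$2'."""
    with zipfile.ZipFile(io.BytesIO(xlsx)) as z:
        root = ET.fromstring(z.read("xl/workbook.xml"))
    return {dn.get("name"): dn.text for dn in root.iter(f"{{{NS}}}definedName")}


def set_numeric_cell(xlsx: bytes, sheet_part: str, ref: str, value: float) -> bytes:
    """Return a copy of the workbook with the existing cell `ref` set to a number."""
    out = io.BytesIO()
    found = False
    with zipfile.ZipFile(io.BytesIO(xlsx)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            payload = src.read(name)
            if name == sheet_part:
                root = ET.fromstring(payload)
                for cell in root.iter(f"{{{NS}}}c"):
                    if cell.get("r") == ref:
                        cell.attrib.pop("t", None)  # no type attribute means "number"
                        for child in list(cell):    # drop old value, formula, inline string
                            cell.remove(child)
                        ET.SubElement(cell, f"{{{NS}}}v").text = str(value)
                        found = True
                payload = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            dst.writestr(name, payload)
    if not found:
        raise KeyError(f"cell {ref} not present in {sheet_part}")
    return out.getvalue()
```

A caller would look up the target with `defined_names(data)`, strip the sheet name and `$` signs from the reference, and pass the resulting cell address to `set_numeric_cell`. Cells that were never written do not exist in the sheet XML, so inserting new cells, writing strings (which go through the shared strings table), and copying formatting all need more work than shown here. Cached results of formulas that depend on the changed cell are likely to stay stale until the workbook is recalculated.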
SpreadsheetGear for .NET will handle your requirements and has an API which is very similar to Excel.
When you insert cells, your defined names (and any other formulas / charts / etc...) will automatically be fixed up to reference the new range (just as they would in Excel). So you would not need to update your defined names (although there is complete support for creating and updating defined names if that is what you want to do).
SpreadsheetGear is a .NET component, but you can build your own wrapper which is callable from C++.
You can see what our customers say and download the free, fully functional evaluation here.
Have you already tried using the Excel COM interfaces? Obviously Excel needs to be installed on the machine, and it's a pain to deal with...
I would argue that a .NET solution with COM interop for linking into your C++ application is the best solution. In more than ten years of working with them, I've never seen a COM automation of Excel that didn't leak memory somewhere.
If you need to automate Excel, I recommend Visual Studio Tools for Office. If you don't need to automate, only modify files and those files can be in Office 2007 format, you're better off finding a library that manipulates the files directly instead of opening Excel to do it.
I ended up using Aspose.Cells as I mentioned in my original post, since it seemed like the easiest path. I'm very happy with the way it turned out, and their support is very good. I had to create a wrapper around it in C# that exported a COM interface to my C++ application.

Enterprise-grade template printing system

I'm looking for an enterprise-grade template printing system. I'm interested in any software I can get my hands on to evaluate, commercial or not.
What I need is a separate system ready to receive tags in order to print (digitally or on paper) a template (like a contract, invoice, etc.). Templates should be managed by the same software. It should operate via web services or via an enterprise bus (preferably JMS or MQSeries connectors).
Can I ask for some names and possibly some URLs? Anything will be helpful even if it does not fit the requirements exactly.
Thanks.
This is an old question, but for the Googlers out there, we use a couple of products to render documents in XSL-FO (a W3C standard paper specification that we generate using XSL) either to PDF, PostScript, etc. We use it to show documents online as well as bulk print a few hundred thousand of them monthly.
RenderX (.NET, Java, whatever) provides a very powerful solution for our bulk printing needs
IBEX PDF Creator (.NET only) for online rendering to PDF
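For readers who haven't seen XSL-FO, a minimal document can be sketched as below. The answer above generates its FO with XSL; this sketch swaps in plain Python (`xml.etree.ElementTree`) just to show the shape of the output, and the function name, page size and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The XSL-FO namespace defined by the W3C specification.
FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Qualify a local element name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def build_fo_document(paragraphs: list) -> bytes:
    """Build a one-flow XSL-FO document with one fo:block per paragraph."""
    root = ET.Element(fo("root"))

    # Page geometry: a single A4 page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-height": "297mm",
        "page-width": "210mm",
    })
    ET.SubElement(page, fo("region-body"))

    # Content: one page sequence flowing into the body region.
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        ET.SubElement(flow, fo("block"), {"space-after": "6pt"}).text = text

    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Calling `build_fo_document(["Invoice 42", "Total: 10 EUR"])` yields the FO bytes, which would then be handed to an FO processor such as the ones listed above to produce PDF or PostScript. In a template system, the incoming tags would be substituted into the paragraph text before this step.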
Calligo is a commercial package from InSystems. Can't reach the web site right now; could be a bad sign.
Then there are these open source possibilities.

Generating Powerpoint PPT with ColdFusion?

Does anyone know if it's possible to generate PowerPoint PPTs within ColdFusion? I can't rely on the approach of installing a copy of Office and generating one through COM, and I can't use OOXML since my client is still in the Office 2003 era. Any suggestions are much appreciated.
You can try using Apache POI, specifically their PowerPoint support. Looks to be still in beta though:
http://poi.apache.org/slideshow/index.html
I've used POI to extract text from Word docs before and it was rather easy in ColdFusion.
ColdFusion doesn't have built-in PPT creation, but you may be able to make something work with OpenOffice.
Look into CFPresentation (CF8), it allows you to create web-based presentations - not actually PPT format, but displayed in the same way via Flash player.
Have you considered using PDF instead? For all intents and purposes except perhaps some animation, PDFs do well replacing PPTs. And CF has tons of PDF creation and manipulation features!
I know it's not a good answer, but ColdFusion 9 can turn a cfpresentation into a PowerPoint file, and creating a cfpresentation is pretty damned trivial...
However, this of course requires a server that's still in beta, and a large cash outlay once it's released if you're running your own server.
Dan