Apache POI - word file to msxml - xslt

I have done a bunch of searching and have not found a simple answer to my question.
MS Word lets you use Save As and then select from a variety of formats. What I want to do is have POI open a Word file, save it as MSXML (a truly hideous-looking format), and then, in a subsequent step, run an XSLT transformation on the MSXML file.
I saw posts about reading a Word file, looping over all the text elements, and building an XML document from scratch that way, but I would prefer to do the XML transformation using XSLT.
Perl OLE allows you to do this. Is there a list of basic commands like that which can be run from POI?
Thanks.
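POI reads and writes Office files directly and does not drive Word itself, so as far as I know it has no equivalent of Word's Save As. Assuming the input is a .docx (Office Open XML) rather than a legacy .doc, one hedged alternative is to skip the MSXML step entirely: a .docx is a zip package whose body text lives in the word/document.xml part, and that part can be handed straight to XSLT. The sketch below uses only the JDK (java.util.zip plus javax.xml.transform, which is XSLT 1.0); the class name DocxXslt and the command-line arguments are hypothetical.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

// Hypothetical helper: extract word/document.xml from a .docx and run XSLT 1.0 over it.
class DocxXslt {

    /** Returns the raw bytes of the main document part of a .docx package. */
    static byte[] readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return zip.readAllBytes(); // stops at the end of this entry
                }
            }
        }
        throw new FileNotFoundException("word/document.xml not found in package");
    }

    /** Applies an XSLT 1.0 stylesheet using the JDK's built-in processor. */
    static String transform(byte[] xml, String xslt) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
        return out.toString();
    }

    // Usage (hypothetical file names): java DocxXslt input.docx stylesheet.xsl
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            String xslt = Files.readString(Path.of(args[1]));
            System.out.println(transform(readDocumentXml(in), xslt));
        }
    }
}
```

A stylesheet whose templates match w:p and w:t in the WordprocessingML namespace (http://schemas.openxmlformats.org/wordprocessingml/2006/main) is enough to pull out paragraphs and their text.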

Related

XSLT Reports and Internet Explorer

With IE at its end of life, and with allowing file access from files in Chrome not being a viable option for us, what is the future of XSLT reports?
I am fairly new to this and have just been "thrown" into finding a solution. Everything I'm finding online is years old; it's strange that no one is talking about this since the "death" of IE.
Our data is in XML format, and we use XSL templates to display formatted reports in the browser via ScriptX (smsx.cab), with page breaks, headers, etc. The user then "prints to PDF".
I am hoping to see what other organizations are doing to ensure existing XSLT reports continue to work. Converting to something else? Making them work with other, currently supported, browsers?
Thank you all; any tips, links, and comments are much appreciated.
You could try executing your XSLT transformations using a local script.
Take note that these solutions only support XSLT 1.0.
MSXML (related question: successor of msxsl.exe?)
PowerShell (related question: Applying XSL to XML with PowerShell: Exception calling "Transform")
If you want to use XSLT 2.0+, you can use Saxon and call the jar file from a batch file:
https://www.saxonica.com/
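As one more local option in the same spirit (again XSLT 1.0 only), the JDK's javax.xml.transform can render a report without a browser. This sketch assumes each report XML names its stylesheet with an <?xml-stylesheet type="text/xsl" href="..."?> processing instruction, which is how a browser normally picks it up; ReportRenderer and the file names are hypothetical. Page breaks and print headers that came from ScriptX are not reproduced here.

```java
import java.io.*;
import java.nio.file.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

// Hypothetical offline replacement for "open the XML in IE": render report XML to an HTML file.
class ReportRenderer {

    static void render(Path xml, Path html) throws IOException, TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Look up the stylesheet named in <?xml-stylesheet type="text/xsl" href="..."?>;
        // a relative href is resolved against the XML file's location.
        Source stylesheet = factory.getAssociatedStylesheet(
                new StreamSource(xml.toFile()), null, null, null);
        if (stylesheet == null) {
            throw new TransformerException("No xml-stylesheet instruction in " + xml);
        }
        Transformer t = factory.newTransformer(stylesheet);
        try (Writer out = Files.newBufferedWriter(html)) { // writes UTF-8
            t.transform(new StreamSource(xml.toFile()), new StreamResult(out));
        }
    }

    // Usage (hypothetical file names): java ReportRenderer report.xml report.html
    public static void main(String[] args) throws Exception {
        render(Path.of(args[0]), Path.of(args[1]));
    }
}
```

The resulting HTML file can then be opened in any current browser and printed to PDF from there.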

Any decent open-source XSLT designers for XSL-FO output [WYSIWYG style]

We are planning to render millions of PDFs using Apache FOP, with XSL-FO as input.
Is there a decent WYSIWYG XSLT designer that makes it easy to design an XSLT that transforms the XML input data into the XSL-FO that FOP requires for processing?
I see a lot of commercial ones (Ecrion, Antenna House). Are there any open-source ones?
The only somewhat decent editor that I have found is MiniScribus Scribe, but I got stuck with it when I wanted to insert a horizontal line, and the ODT file I opened lost its table formatting in Scribe. It says that it doesn't support headers/footers and table borders yet... not so decent.
There are some converters that could be useful, such as HTML-to-FO and ODT-to-FO converters, but the FO code they generated caused a lot of exceptions in Apache FOP. The ODT/HTML file I was testing with had only a table, two horizontal lines, and some unformatted text, all on a single page.
These tools, both the converters and the editor, are still in beta, so maybe there will be a decent solution eventually, but so far I have not been able to find one.
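Lacking a designer, the stylesheet can also be written by hand. A minimal sketch, assuming a hypothetical <invoice> input with <title> and <line> children: it produces a single A4 page flow and draws the horizontal line mentioned above with fo:leader leader-pattern="rule". The transform runs on the JDK's XSLT 1.0 processor, and the resulting FO text is what would then be handed to FOP; the FOP call itself is left out so the example has no external dependencies.

```java
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

// Hypothetical hand-written alternative to a designer: a minimal XML-to-XSL-FO stylesheet.
class FoStylesheetDemo {

    // Assumed input shape: <invoice><title>...</title><line>...</line>...</invoice>
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>",
        "  <xsl:template match='/invoice'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='A4' page-width='210mm' page-height='297mm' margin='20mm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='A4'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <fo:block font-size='16pt'><xsl:value-of select='title'/></fo:block>",
        "          <!-- horizontal rule: a leader drawn as a solid line across the block -->",
        "          <fo:block><fo:leader leader-pattern='rule' leader-length='100%'/></fo:block>",
        "          <xsl:for-each select='line'>",
        "            <fo:block><xsl:value-of select='.'/></fo:block>",
        "          </xsl:for-each>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Transforms the input XML into XSL-FO text, ready to pass to FOP or save as a .fo file. */
    static String toFo(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter fo = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(fo));
        return fo.toString();
    }
}
```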

XSLT to convert an XML element containing RTF data to HTML?

OK, so here's the background:
We have a third-party piece of software that does a lot of complicated stuff to generate an XML file from a lot of tables based on a wide array of business rules. The software allows you to apply an XSL transformation by supplying an XSLT file as part of its workflow, before continuing on in the process, which is usually an upload to one or more servers, based on more business rules.
Here's the problem:
One of the elements (with more on the way) that this application is processing contains RTF text that needs to be converted into formatted HTML before being uploaded. There are no means of transforming the XML inside the application other than through an XSLT file, and once we output the file, we cannot resume the workflow. My original thought was, "Easy! Someone must have written a few XSL transforms for converting RTF to formatted HTML!" Hours of searching later, I must conclude I either suck at searching or it's awfully obscure.
Disclaimers:
I know the software is pretty darned limited; I'm stuck with it.
I know there are a lot of third-party tools to do this; they are not available to me because I would need to run them externally.
I know that this is not a pretty or efficient thing to do with XSLT. Changing that is not an option for me at this point.
If I cannot find a means to do this through pure XSL transforms, I will need to output the files locally, run the extra process, and take the destination routing on through a custom process. I really don't want to do that.
Does anyone have access to an XSL transformation function/scheme that will allow me to do this natively in the application? Perhaps a series of regular expressions I could use, or something similar?
So it turns out that external scripts can be invoked from the XSLT. It seems I will be using another scripting language to get this to work. I'm a little bummed there was no other answer available.
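If the external step can be a Java program, the JDK itself ships a basic RTF reader in javax.swing.text.rtf. The sketch below (class name and usage are hypothetical) parses the RTF string into a styled document and writes it back out with HTMLEditorKit, which emits minimal HTML for non-HTML documents. The RTF reader is limited, so anything beyond simple character and paragraph formatting may be lost; treat this as a starting point rather than a full converter.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical external converter for the RTF element's text content.
// Relies on the JDK's basic RTF reader, which handles simple formatting only.
class RtfToHtml {

    static String convert(String rtf) throws IOException, BadLocationException {
        RTFEditorKit rtfKit = new RTFEditorKit();
        Document doc = rtfKit.createDefaultDocument(); // a DefaultStyledDocument
        // RTF is normally plain 7-bit text, with other characters written as escapes
        rtfKit.read(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.ISO_8859_1)), doc, 0);

        StringWriter html = new StringWriter();
        // For a styled (non-HTML) document, HTMLEditorKit.write emits minimal HTML
        new HTMLEditorKit().write(html, doc, 0, doc.getLength());
        return html.toString();
    }

    // Usage (hypothetical): java RtfToHtml < input.rtf
    public static void main(String[] args) throws Exception {
        String rtf = new String(System.in.readAllBytes(), StandardCharsets.ISO_8859_1);
        System.out.println(convert(rtf));
    }
}
```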

Storing UTF-8 XML using Word's CustomXMLPart or any other supported way

I am writing a Word add-in that is supposed to store some of its own XML data per document, using the Word object model and its CustomXMLPart. The problem I am now facing is the lack of IStream-like functionality for reading/writing XML to/from a CustomXMLPart. It only provides a BSTR interface, and I am puzzled about how to handle UTF-8 XML documents with BSTRs. To my understanding, a UTF-8 XML file should never have to undergo this sort of Unicode conversion. I am not sure what to expect as a result here.
Is there another way of using Word automation interfaces to store arbitrary custom information inside a DOCX file?
The "package" is an OPC document (Open Packaging Conventions), which is basically a structured zip folder with a different extension (e.g. .pptx, .docx, .xps, etc.). You can get that file as a stream and manipulate it any which way you like, but not arbitrarily: it will not be recognized as a valid .docx if you put things in the wrong places (not just XML elements, but also files in the folders inside the zip file). But if by "arbitrary" you mean a CustomXMLPart, then that's okay.
This is a good starting page for learning more about the Open XML SDK, which allows somewhat easier access to the file formats than using (.NET) System.IO.Packaging or a third-party zip library. If you're up to it and want to go deeper, grab the free eBook Open XML Explained.
With the Open XML SDK in .NET (again, this can all be done without the SDK), this is what you'll want to do: How to: Insert Custom XML to an Office Open XML Package by Using the Open XML API.
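To illustrate the point that the package is just a zip, here is a Java sketch (not the Word object model or the Open XML SDK) that reads the custom XML parts of a .docx. It assumes the conventional part names customXml/item1.xml, customXml/item2.xml, and so on, which is where Word usually stores them; a robust reader would follow the package relationships instead. Parsing the raw bytes lets the XML parser honour the encoding declaration, so UTF-8 content never goes through an intermediate string conversion. CustomXmlReader is a hypothetical name.

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

// Hypothetical reader for custom XML parts, assuming the customXml/itemN.xml naming convention.
class CustomXmlReader {

    /** Returns the root element of every customXml/item*.xml part, keyed by part name. */
    static Map<String, Element> readCustomParts(InputStream docx) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Map<String, Element> parts = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().matches("customXml/item\\d+\\.xml")) {
                    // Buffer the entry first: the parser closes the stream it reads from,
                    // which would otherwise close the whole zip stream.
                    byte[] bytes = zip.readAllBytes();
                    Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
                    parts.put(entry.getName(), doc.getDocumentElement());
                }
            }
        }
        return parts;
    }
}
```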

C++ Logger: Should I use an ordinary XML parser?

I'm working on a logging system for my 2D engine, and I'm confused on how I should go about creating/editing the file, and how I should output that file.
I've learned that XML is more of a data carrier than a data displayer like HTML. I've read that I can use XML-to-HTML converters. One method I've thought about is writing HTML directly to the file.
Clarity on these matters is what I ask of you, Stack Overflow.
Creating an XML (or HTML) file doesn't need any special library. Straightforward string concatenation is usually good enough; you may have to encode some special characters (e.g. > into &gt;).
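A minimal Java sketch of that string-concatenation approach (the same logic carries over to C++); XmlLogEntry and the entry/time/severity names are hypothetical. Most control characters are not allowed in XML 1.0 at all, so they would need filtering, which this sketch does not do.

```java
// Hypothetical helper: build one XML log entry by plain string concatenation.
// Assumption: input contains no XML-illegal control characters.
class XmlLogEntry {

    /** Escapes the five characters that are special in XML text and attributes. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    static String entry(String timestamp, String severity, String message) {
        return "<entry time=\"" + escape(timestamp) + "\" severity=\"" + escape(severity) + "\">"
                + escape(message) + "</entry>";
    }
}
```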
But as Owen says, plain text is a lot more common for log files. One reasonable compromise is comma-separated values in a text file; this gives you a little bit of structure without much overhead. For example, the Windows web server (IIS) uses this format by default, and if you have some fields that are output on each line, such as a timestamp or a source filename and line number, this makes it easy to separate them out again.
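A sketch of such a CSV log line in Java (hypothetical names; the field order is up to you). Fields are quoted only when they contain a comma, a double quote, or a line break, and embedded quotes are doubled, following RFC 4180 conventions, so messages containing commas don't break the column layout.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical helper: format one log record as a CSV line (RFC 4180-style quoting).
class CsvLog {

    /** Quotes a field only if it contains a comma, a double quote, or a line break. */
    static String field(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    /** Joins the already-quoted fields with commas; the caller appends the newline. */
    static String line(String... fields) {
        return Arrays.stream(fields).map(CsvLog::field).collect(Collectors.joining(","));
    }
}
```

Each line would then be written, with a trailing newline, to a file opened in append mode.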
Just about every log I've ever worked with has been pure text delimited by newlines. If you're going to depart from that, you may want to ask yourself what it is about your logging needs that you want to accomplish with markup.
If you must go the way of markup, I would suggest an XML format that contains a minimal set of markup that would be useful in your situation. You could use XML to capture structure in your log entries (timestamp, severity, and operational code, for example) that would be inconvenient to code for in HTML.
Note that you could also go hybrid and embed some XHTML tags in an XML element whose purpose is to capture displayable text, if you want.
The problem with XML or HTML files is that you cannot simply append to them at any time: the final (document) closing tag has to be written properly at the end.
Therefore, it's not a popular format for logging.
For logging, I suggest using one of the existing log engines, such as Apache logger or John Torjo's Boost log candidate. They support log levels, runtime configuration, etc.
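The Java counterpart of such a log engine is the built-in java.util.logging package. The sketch below (hypothetical class and logger names) keeps the output plain text, one entry per line, and opens the file in append mode, which sidesteps the closing-tag problem described above.

```java
import java.io.IOException;
import java.util.logging.*;

// Hypothetical setup: plain-text, one-entry-per-line log using the JDK's built-in logging.
class PlainTextLog {

    /** Formats each record as "timestamp LEVEL logger: message" on a single line. */
    static final class OneLineFormatter extends Formatter {
        @Override
        public String format(LogRecord r) {
            return r.getInstant() + " " + r.getLevel() + " "
                    + r.getLoggerName() + ": " + formatMessage(r) + System.lineSeparator();
        }
    }

    static Logger open(String file) throws IOException {
        Logger logger = Logger.getLogger("engine"); // hypothetical logger name
        FileHandler handler = new FileHandler(file, true); // true = append to an existing file
        handler.setFormatter(new OneLineFormatter());
        logger.addHandler(handler);
        logger.setUseParentHandlers(false); // don't also print to the console
        return logger;
    }
}
```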
If you are considering writing logs in XML files, please stop.
Log files should be simple plain-text files; XML-izing them introduces needless complexity. They are not structured data; they are meant to be read by people, not automated tools.
It all starts with XML logs, and then it goes downhill from there.