Our users are asking for a simple way to convert DocBook XML to OOXML (DocBook to Word DOCX) and back. In-house editing is done in Oxygen XML Editor, but sometimes files have to be sent to other people for editing, and Word is still the de facto standard.
I imagine this will be a major undertaking, so any help is appreciated.
EDIT:
I found the docbook2wordml XSL stylesheets, which could be useful.
Try OxGarage: http://www.tei-c.org/oxgarage/. It converts between DocBook and OOXML (and other formats) via TEI.
You can use it as a web service, or you can get the XSLT 2.0 scripts here: https://github.com/TEIC/Stylesheets/
Edit: updated links!
The OxGarage service at http://oxgarage.oucs.ox.ac.uk:8080/ege-webclient/ seems to be down.
Alternatively, you can get your own OxGarage service up and running using the OxGarage Docker image, which is probably the preferable approach.
You could also build an OxGarage Docker image yourself using the oxgarage-docker GitHub repo. Once it's up and running, you can use the OxGarage web client in your browser at:
http://<your host name e.g., localhost>:8080/ege-webclient
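Before opening the browser, a quick HTTP check can confirm that the self-hosted web client actually answers. This is a minimal sketch using only the Python standard library; the default host and port simply mirror the URL above, and the helper names (webclient_url, is_up) are hypothetical, not part of OxGarage.

```python
import urllib.error
import urllib.request


def webclient_url(host: str = "localhost", port: int = 8080) -> str:
    """Build the OxGarage web client URL for a self-hosted instance."""
    return f"http://{host}:{port}/ege-webclient"


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout or an HTTP error status.
        return False


if __name__ == "__main__":
    url = webclient_url()  # assumes the container publishes port 8080 on this machine
    print(url, "is up" if is_up(url) else "is not reachable")
```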
With IE at its end of life, and allowing file access from files in Chrome not being a viable option for us, what is the future of XSLT reports?
I am fairly new to this and have just been "thrown" into finding a solution. Everything I'm finding online is years old, and it's strange that no one seems to be talking about this since the "death" of IE.
Our data is in XML format, and we use XSL templates to display formatted reports (with page breaks, headers, etc.) in the browser via ScriptX (smsx.cab). The user then "prints to PDF".
I am hoping to see what other organizations are doing to ensure existing XSLT reports continue to work. Converting to something else? Making them work with other, currently supported, browsers?
Thank you; any tips, links and comments are much appreciated.
You could try executing your XSLT transformations using a local script.
Take note that these solutions only support XSLT 1.0.
MSXML (related question: successor of msxsl.exe?)
PowerShell (related question: Applying XSL to XML with PowerShell : Exception calling "Transform")
If you want to use XSLT 2.0 or later, you can use Saxon and call its JAR file from a batch file.
https://www.saxonica.com/
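The same call can be wrapped in a small script instead of a batch file. This sketch assumes Java is on the PATH and that the Saxon JAR runs with java -jar, which is how Saxonica's documentation shows invoking its command-line transformer with the -s:, -xsl: and -o: options; the file names and helper functions are hypothetical.

```python
import subprocess
from pathlib import Path


def saxon_command(jar: Path, source: Path, stylesheet: Path, output: Path) -> list[str]:
    """Build a Saxon command line using its -s:, -xsl: and -o: options."""
    return [
        "java", "-jar", str(jar),
        f"-s:{source}",
        f"-xsl:{stylesheet}",
        f"-o:{output}",
    ]


def run_saxon(jar: Path, source: Path, stylesheet: Path, output: Path) -> None:
    """Run the transformation; raises CalledProcessError if Saxon exits non-zero."""
    subprocess.run(saxon_command(jar, source, stylesheet, output), check=True)


if __name__ == "__main__":
    # Hypothetical file names: point these at your own JAR, data and stylesheet.
    run_saxon(Path("saxon-he.jar"), Path("report.xml"), Path("report.xsl"), Path("report.html"))
```

The resulting HTML (or XSL-FO, if you later move to a PDF pipeline) no longer depends on the browser running the XSLT.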
Can anybody tell me whether it is possible to export MicroStrategy grid data in text format to an FTP server (the required access will be provided)? If not directly, can we use some kind of Java code or web services to achieve this? I don't need the full process; I just want to understand whether this can be achieved or not.
Thanks in advance!
You can retrieve report results (and even build a new report from scratch) via the SDK, and from there process the data as you like, e.g. transform it and upload it to an FTP server.
A possibly easier approach is to create a file subscription that stores the file in a specific directory, from which you automatically pick it up and deliver it to your FTP server.
There might be other solutions as well, but Yes is the answer to the "Yes/No" part of your question.
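With the file-subscription route, only the pickup-and-deliver step needs scripting. A minimal sketch with Python's standard ftplib, assuming the subscription writes its files into a local directory the script can read and that you keep track of already-uploaded names yourself; the directory, the .csv suffix, the server details and the function names are all hypothetical, not MicroStrategy settings.

```python
import ftplib
from pathlib import Path


def pending_files(directory: Path, already_sent: set[str], suffix: str = ".csv") -> list[Path]:
    """List export files in `directory` (matching a lowercase suffix) not yet uploaded."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == suffix and p.name not in already_sent
    )


def upload(files: list[Path], host: str, user: str, password: str, remote_dir: str = ".") -> None:
    """Upload each file in binary mode into `remote_dir` on the FTP server."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for path in files:
            with path.open("rb") as fh:
                ftp.storbinary(f"STOR {path.name}", fh)


if __name__ == "__main__":
    # Hypothetical values: the subscription's output folder and your FTP account.
    export_dir = Path("/data/mstr_exports")
    sent: set[str] = set()  # persist this (e.g. in a file) between scheduled runs
    new_files = pending_files(export_dir, sent)
    upload(new_files, "ftp.example.com", "user", "secret")
    sent.update(p.name for p in new_files)
```

If the server requires FTPS, ftplib.FTP_TLS is a subclass with the same interface; call prot_p() after login to encrypt the data channel.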
I thought I had a simple question, but somehow I can't find a source for the answer: which document formats can be indexed by the Lucene version that is packaged with Railo 4.0?
.doc and .pdf files seem to work fine, but .docx and .rtf files just don't seem to get indexed. Is there a list available somewhere? And for formats that aren't supported, what would be the best way to get them indexed by cfindex as well?
<cfindex
collection = "#collection#"
action = "update"
type = "file"
key="#ABSfilepath#"
title="#ABSfilepath#"
>
Thanks!
This question was also posted to the Railo mailing list.
Railo 4 uses Lucene 2.4.1. How can you tell? The same way you tell the version of any other third-party library that Railo uses: locate the JAR file (in the lib/ext directory), open the archive (using 7-Zip or equivalent), and look at META-INF/MANIFEST.MF, where you will find content like this:
Specification-Title: Lucene Search Engine: core
Specification-Version: 2.4.1
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.lucene
Implementation-Version: 2.4.1 750176 - 2009-03-04 21:56:52
Implementation-Vendor: The Apache Software Foundation
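Since a JAR is just a ZIP archive, the manifest can also be read with a short script, which is handy when checking several libraries at once. A minimal sketch using Python's zipfile; it reads only the main section of the manifest and joins wrapped continuation lines (which begin with a single space), and the function name is hypothetical.

```python
import sys
import zipfile


def read_manifest(jar) -> dict[str, str]:
    """Return the main-section attributes of META-INF/MANIFEST.MF in a JAR.

    `jar` may be a path or a binary file object.
    """
    with zipfile.ZipFile(jar) as archive:
        text = archive.read("META-INF/MANIFEST.MF").decode("utf-8")
    attributes: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section; per-entry sections follow
        if line.startswith(" ") and last_key is not None:
            attributes[last_key] += line[1:]  # continuation of a wrapped value
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        last_key = key.strip()
        attributes[last_key] = value[1:] if value.startswith(" ") else value
    return attributes


if __name__ == "__main__":
    # e.g. python read_manifest.py lib/ext/*.jar
    for path in sys.argv[1:]:
        manifest = read_manifest(path)
        print(path, manifest.get("Implementation-Version") or manifest.get("Specification-Version"))
```

The same helper works on the POI JAR, which is how you would confirm the POI version as well.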
This seems to be a pretty old version, and it doesn't appear to have any documentation on the Apache Lucene website. (It might be possible to upgrade Lucene by replacing the relevant JARs, but this might also cause dependency issues; do so at your own risk.)
Since the Lucene website doesn't help, a search for "lucene 2.4.1 indexable documents" turns up a pertinent question about v2.3.2, which asks:
Does Lucene java supports parsing of extensions *.docx, *.pptx, *.mpp i.e. Microsoft Windows 2007 documents?
With the response:
Lucene doesn't actually support any of the document types. What happens is that some program is used to parse the files into an indexable stream and that stream is indexed. That used to be POI in the old days.
OK, so assuming that is still accurate, Lucene doesn't control which file types are supported; Apache POI does.
Checking the JARs tells us that Railo 4.0 uses Apache POI v3.8, and the POI changelog reveals that .docx support arrived in v3.5.
So your .docx files should be supported, along with the other MS Office formats. If they are definitely not being indexed, you probably need to identify whether it's a POI issue, a Lucene issue or a Railo issue; creating a simple reproducible test case with both .doc and .docx documents is probably a good first step.
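When building that test case, it helps to confirm that each test file really is in the format its extension claims (a renamed file, for instance, is not), because POI handles the two very differently: .doc is an OLE2 compound file, while .docx is a ZIP-based OOXML package. A minimal sketch that checks the leading magic bytes; the function names are hypothetical, and it only inspects the container, not whether the content is a valid Word document.

```python
import sys
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary Office (.doc, .xls, .ppt)
ZIP_MAGIC = b"PK\x03\x04"  # OOXML (.docx, .xlsx, .pptx) is a ZIP package


def container_type(path: Path) -> str:
    """Classify a file by its first bytes as 'ole2', 'zip' or 'unknown'."""
    with path.open("rb") as fh:
        header = fh.read(8)
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def extension_matches(path: Path) -> bool:
    """True if a .doc/.docx file's container agrees with its extension."""
    expected = {".doc": "ole2", ".docx": "zip"}.get(path.suffix.lower())
    return expected is not None and container_type(path) == expected


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        p = Path(arg)
        print(p, container_type(p), "ok" if extension_matches(p) else "CHECK")
```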
Beyond that, you'll need someone familiar with Lucene/POI to advise. There may or may not be log files containing details of possible indexing/retrieval errors, or ways to interact with Lucene directly (not via Railo/cfindex) that can help identify where the issue lies.
As part of my application, my client has requested that I include an automated e-mailing system. Within this system, I generate HTML code and use automation to send it via Outlook.
However, they also require a PDF copy of the HTML document to be sent as an attachment. My initial attempts used libHaru, which proved difficult to use efficiently: I had to create the PDF document from scratch, which meant computing the position of each line in a table, positioning all the text, etc.
I was wondering whether there is a way to programmatically convert HTML code (or an HTML file, if need be) into a PDF document, either by using Win32/MFC itself or an external library.
Thanks in advance!
EDIT: Just to clarify, I am looking for solutions which minimize external dependencies.
You should evaluate the wkhtmltopdf utility:
http://code.google.com/p/wkhtmltopdf/
You can call it from the command line without having to run an installer.
I generate my output documents as HTML, then call ShellExecute(...) to convert them to PDF with it. It's great!
Internally it uses WebKit + Qt, so compatibility with modern HTML is OK.
Hope it helps.
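The pattern (write the HTML to a file, then hand it to the wkhtmltopdf executable) is easy to script. This sketch is in Python rather than the Win32 ShellExecute call described above; it assumes the executable is on the PATH, uses only the basic "wkhtmltopdf input output" form of the command line, and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def html_to_pdf_command(html: Path, pdf: Path, exe: str = "wkhtmltopdf") -> list[str]:
    """Build the basic `wkhtmltopdf <input> <output>` command line."""
    return [exe, str(html), str(pdf)]


def html_to_pdf(html: Path, pdf: Path, exe: str = "wkhtmltopdf") -> None:
    """Convert an HTML file to PDF and block until wkhtmltopdf has finished."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    subprocess.run(html_to_pdf_command(html, pdf, exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names for the e-mail body and its PDF attachment.
    html_to_pdf(Path("mail_body.html"), Path("attachment.pdf"))
```

Waiting for the process matters here because the PDF must exist before it is attached to the e-mail. Plain ShellExecute returns without waiting; in a Win32 program, ShellExecuteEx with SEE_MASK_NOCLOSEPROCESS followed by WaitForSingleObject on the returned process handle gives the same blocking behaviour.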
I'd take a look at PDF Creator, which can be used as a COM object (that acts pretty much like a printer). I haven't used it to print HTML, so I'm not sure, but my guess is that you'll probably end up having to instantiate a web browser control to render the HTML, and then feed it from there to the PDF control.
Some possible answers are in this thread:
C++ Library to Convert HTML to PDF?
Not sure if they will satisfy your particular requirements, but these might at least get you started.
Edit:
Some other possible options here.
It's not MFC, but you could try QtWebKit. It can render HTML and export it to PDF, PNG or JPEG.
I want to create an application where you drop an XML document onto it and an XSLT transformation runs, using Saxon as the transformation engine. The result is a text file.
I am doing this on a Mac. Does anyone know where I could find a beginner tutorial for this?
Thanks,
Ian
On SourceForge, take a look at the Saxon resources zip file; it contains example applications for Saxon.
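The core of such a droplet is small: take each dropped XML file, run Saxon on it, and write the result beside it. A minimal sketch, assuming Java is installed, that the Saxon JAR runs with java -jar, and that your stylesheet declares xsl:output method="text" so the result is plain text; the JAR and stylesheet paths and the function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical locations: adjust to wherever you keep Saxon and your stylesheet.
SAXON_JAR = Path.home() / "saxon" / "saxon-he.jar"
STYLESHEET = Path.home() / "saxon" / "to-text.xsl"


def output_path(source: Path) -> Path:
    """Write the text result next to the dropped file, e.g. data.xml -> data.txt."""
    return source.with_suffix(".txt")


def transform(source: Path) -> Path:
    """Run Saxon on one XML file and return the path of the text output."""
    target = output_path(source)
    subprocess.run(
        ["java", "-jar", str(SAXON_JAR), f"-s:{source}", f"-xsl:{STYLESHEET}", f"-o:{target}"],
        check=True,
    )
    return target


if __name__ == "__main__":
    # Dropped files arrive as command-line arguments, for example from an
    # Automator "Run Shell Script" action set to pass input as arguments.
    for arg in sys.argv[1:]:
        src = Path(arg)
        if src.suffix.lower() == ".xml":
            print("wrote", transform(src))
```

Once this works from the command line, wrapping it in a drag-and-drop front end is a separate, mostly cosmetic step.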