Slow Apache FOP Transformation after Saxon XSLT Transformation

In a Java application I am using Saxon HE (9.9) for the XML-to-FO transformation. Afterwards I am using Apache FOP (2.3) to create the PDF file. The FOP transformation is slow compared with running both transformations one after another on the command line (approx. 12 s vs. 2 s for the FOP part alone).
// XML->FO
Processor proc = new Processor(false);
ExtensionFunction highlightingImage = new OverlayImage();
proc.registerExtensionFunction(highlightingImage);
ExtensionFunction mergeImage = new PlanForLandRegisterMainPageImage();
proc.registerExtensionFunction(mergeImage);
ExtensionFunction rolImage = new RestrictionOnLandownershipImage();
proc.registerExtensionFunction(rolImage);
ExtensionFunction fixImage = new FixImage();
proc.registerExtensionFunction(fixImage);
ExtensionFunction decodeUrl = new URLDecoder();
proc.registerExtensionFunction(decodeUrl);
XsltCompiler comp = proc.newXsltCompiler();
XsltExecutable exp = comp.compile(new StreamSource(new File(xsltFileName)));
XdmNode source = proc.newDocumentBuilder().build(new StreamSource(new File(xmlFileName)));
Serializer outFo = proc.newSerializer(foFile);
XsltTransformer trans = exp.load();
trans.setInitialContextNode(source);
trans.setDestination(outFo);
trans.transform();
// FO->PDF
FopFactory fopFactory = FopFactory.newInstance(fopxconfFile);
OutputStream outPdf = new BufferedOutputStream(new FileOutputStream(pdfFile));
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, outPdf);
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
Source src = new StreamSource(foFile);
Result res = new SAXResult(fop.getDefaultHandler());
transformer.transform(src, res);
So far I'm fairly sure that it is not caused by file-handling issues with the generated FO file. The FO transformation is slow even if I transform a completely different FO file from the one produced with Saxon. Even the console output differs when the XML-to-FO transformation is not executed:
Dec 25, 2018 1:54:47 AM org.apache.fop.apps.FOUserAgent processEvent
INFO: Rendered page #1.
Dec 25, 2018 1:54:47 AM org.apache.fop.apps.FOUserAgent processEvent
INFO: Rendered page #2.
This output is not printed to the console when the XML-to-FO transformation is executed first.
Is there anything in the XML-to-FO transformation step that has to be closed?
What is the reason for this behaviour?

If you use Saxon's own API to set up the Processor and your extension functions, but then want to pipe the XSL-FO result of the transformation directly to Apache FOP, I think you can set up a SAXDestination directly:
XsltTransformer trans = exp.load();
trans.setInitialContextNode(source);
FopFactory fopFactory = FopFactory.newInstance(fopxconfFile);
// try-with-resources closes the PDF stream even if the transformation fails
try (OutputStream outPdf = new BufferedOutputStream(new FileOutputStream(pdfFile))) {
    Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, outPdf);
    // Saxon sends the FO result as SAX events straight into FOP; no intermediate FO file
    trans.setDestination(new SAXDestination(fop.getDefaultHandler()));
    trans.transform();
}
See the FOP embedding example http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/fop/examples/embedding/java/embedding/ExampleXML2PDF.java?view=markup together with the Saxon documentation for XsltTransformer.setDestination: http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XsltTransformer.html#setDestination-net.sf.saxon.s9api.Destination-.
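The same "pipe the result as SAX events" idea can be tried with nothing but the JDK. This is an illustration only, not the Saxon/FOP setup from the answer. It uses the JDK's built-in JAXP XSLT 1.0 processor with a SAXResult in place of Saxon's SAXDestination. A hypothetical RecordingHandler, which only records element names, stands in for fop.getDefaultHandler(). The point it shows is that the transformation result goes straight to a ContentHandler, so no intermediate FO file is written and re-parsed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PipeToSaxHandler {

    // Tiny XSLT 1.0 stylesheet: wraps one <block/> per <p> of the input in a <root>.
    static final String SAMPLE_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/'>"
            + "<root><xsl:for-each select='doc/p'><block/></xsl:for-each></root>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Hypothetical stand-in for fop.getDefaultHandler(): records the element names it receives.
    static class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(localName == null || localName.isEmpty() ? qName : localName);
        }
    }

    static List<String> transformToHandler(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        RecordingHandler handler = new RecordingHandler();
        // The result is delivered to the handler as SAX events; nothing is written to disk.
        transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformToHandler("<doc><p>Hello</p></doc>", SAMPLE_XSL)); // [root, block]
    }
}
```

In the real setup the handler would be fop.getDefaultHandler(), and the PDF output stream behind the Fop instance is closed once transform() returns.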

Related

Exception handling in Saxonica URIResolver

I am using Saxonica EE for XSLT transformation and throw an exception from a custom URIResolver class (given below). It works fine for xsl:include but not for document().
Is there any way to stop the transformation by throwing the exception while resolving document()?
Is it possible to apply the URIResolver to document() during compilation itself (while generating the SEF)?
public class CustomURIResolver implements URIResolver {
@Override
public Source resolve(String href, String base) {
String formatterOrlookUpKey = getKey(href);
if (formatterMap.containsKey(formatterOrlookUpKey)) {
return new StreamSource(new StringReader(formatterMap.get(formatterOrlookUpKey)));
} else {
throw new RuntimeException("did not find the lookup/formatter xsl " + href+" key:"+formatterOrlookUpKey);
}
}
}
XSLT compilation:
Processor processor = new Processor(true);
XsltCompiler compiler = processor.newXsltCompiler();
compiler.setJustInTimeCompilation(false);
compiler.setURIResolver(new CigURIResolver(formatterMap));
XsltExecutable stylesheet = compiler.compile(new StreamSource(new StringReader(xsl)));
stylesheet.export(destination);
Transformation:
Processor processor = new Processor(true);
XsltCompiler compiler = processor.newXsltCompiler();
compiler.setJustInTimeCompilation(true);
XsltExecutable stylesheet = compiler.compile(new StreamSource(new StringReader(sef)));
final StringWriter writer = new StringWriter();
Serializer out = processor.newSerializer(writer);
out.setOutputProperty(Serializer.Property.METHOD, "xml");
out.setOutputProperty(Serializer.Property.INDENT, "yes");
Xslt30Transformer trans = stylesheet.load30();
trans.setURIResolver(new CigURIResolver(formatterMap));
trans.setErrorListener(errorHandler);
trans.transform(new StreamSource(new StringReader(xml)), out);
Object obj = out.getOutputDestination();
I'm a little surprised by the observed effect and would need a repro to investigate it. But I'm also a bit surprised that you're choosing to throw a RuntimeException rather than a TransformerException, which is what the URIResolver interface declares. If you want to explore this further, please raise a support request with runnable code.
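A minimal sketch of a resolver that reports a missing entry with the checked TransformerException declared by URIResolver.resolve, instead of a RuntimeException. It uses only the standard JAXP interface. The class name LookupURIResolver and its getKey rule (the last path segment of href) are hypothetical stand-ins for the question's CigURIResolver and getKey, whose code is not shown. Whether Saxon then aborts a document() call is exactly the behaviour the answer says would need a repro to confirm.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

public class LookupURIResolver implements URIResolver {
    private final Map<String, String> formatterMap;

    public LookupURIResolver(Map<String, String> formatterMap) {
        this.formatterMap = formatterMap;
    }

    // Hypothetical key rule: the part of href after the last '/'.
    static String getKey(String href) {
        int slash = href.lastIndexOf('/');
        return slash >= 0 ? href.substring(slash + 1) : href;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        String key = getKey(href);
        String content = formatterMap.get(key);
        if (content == null) {
            // Use the checked exception that the URIResolver interface declares.
            throw new TransformerException(
                    "did not find the lookup/formatter xsl " + href + " key:" + key);
        }
        return new StreamSource(new StringReader(content));
    }
}
```

It would be registered in the same places as in the question, i.e. compiler.setURIResolver(...) and trans.setURIResolver(...).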
The rules for document() are a bit complex because of the XSLT 1.0 legacy of "recoverable errors": you might find that doc() behaves more predictably.
As regards compile-time resolution of doc() calls, Saxon does have an option to enable that, but it doesn't play well with SEF files: generally having external documents in a SEF file gets very messy, especially if for example you have several global variables bound to different parts of the same document.

Saxonica Generate SEF file from xslt and apply the same for transformation

I am trying to find the correct approach to save a SEF in memory and use it for transformation.
I found the following two approaches to generate a SEF file:
1. XsltPackage.save(File): it works fine, but it has to save the content to a File, which doesn't suit our requirement, as we need to store it in memory or in a database.
2. XsltExecutable.export(): it generates the file, but if I use that .sef file for transformation, I get empty content as output.
I use xsl:include and document() in the XSLT and resolve them using a URIResolver.
I am using the logic below to generate and transform.
Note: I am using Saxon EE (trial version).
1. XsltExecutable.export():
public static String getCompiledXslt(String xsl, Map<String, String> formatterMap) throws SaxonApiException, IOException {
try(ByteArrayOutputStream destination = new ByteArrayOutputStream()){
Processor processor = new Processor(true);
XsltCompiler compiler = processor.newXsltCompiler();
compiler.setURIResolver(new CigURIResolver(formatterMap));
XsltExecutable stylesheet = compiler.compile(new StreamSource(new StringReader(xsl)));
stylesheet.export(destination);
return destination.toString();
}catch(RuntimeException ex) {
throw ex;
}
}
2. Using the same SEF for transformation:
Processor processor = new Processor(true);
XsltCompiler compiler = processor.newXsltCompiler();
if (formatterMap != null) {
compiler.setURIResolver(new CigURIResolver(formatterMap));
}
XsltExecutable stylesheet = compiler.compile(new StreamSource(new StringReader(standardXsl)));
Serializer out = processor.newSerializer(new File("out4.xml"));
out.setOutputProperty(Serializer.Property.METHOD, "xml");
out.setOutputProperty(Serializer.Property.INDENT, "yes");
Xslt30Transformer trans = stylesheet.load30();
if (formatterMap != null) {
trans.setURIResolver(new CigURIResolver(formatterMap));
}
trans.transform(new StreamSource(new StringReader(sourceXMl)), out);
System.out.println("Output written to out.xml");
}
When I use the SEF generated by the export method above to transform, I get empty content. The same code works fine with a SEF generated by XsltPackage.save().
UPDATE: I solved the issue by setting compiler.setJustInTimeCompilation(false) (by default it is true).
There's very little point (in fact, I would say there is no point) in saving a SEF file in memory. It's much better to keep and reuse the XsltExecutable or XsltPackage object rather than exporting it to a SEF structure and then reimporting it. The only reason for doing an export/import is if the exporter and importer don't share memory.
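For comparison, the JDK's JAXP API has the same compile-once split. A Templates object is the compiled, reusable form of a stylesheet, playing roughly the role XsltExecutable plays in s9api. Each run gets its own Transformer. The sketch below keeps the compiled stylesheet in memory rather than round-tripping it through an exported file. The class name CompiledStylesheetCache is hypothetical, and this is plain JAXP, not Saxon's API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledStylesheetCache {
    private final Templates templates;

    public CompiledStylesheetCache(String xsl) throws TransformerException {
        // Compile once; the Templates object is thread-safe and can be kept and reused.
        this.templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    public String transform(String xml) throws TransformerException {
        // A Transformer is not thread-safe, so create a fresh one per transformation.
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The s9api counterpart the answer recommends is to hold on to the XsltExecutable and call load30() (or load()) for each run.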
You can do it, however: I think the only thing you need to change is to close the destination stream after writing to it. Saxon tries to stick to the policy "anyone who creates a stream is responsible for closing it".
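Why closing matters can be shown with plain java.io. This assumes something between the exporter and your byte array buffers its output; the answer does not say so. Here a BufferedOutputStream plays that role, and a hypothetical export method stands in for XsltExecutable.export(). Reading the ByteArrayOutputStream before the wrapper is closed yields an empty string. Closing it first (here via try-with-resources) flushes the bytes through.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CloseBeforeReading {
    // Hypothetical stand-in for an exporter that writes into the stream it is given.
    static void export(OutputStream out, String content) throws IOException {
        out.write(content.getBytes(StandardCharsets.UTF_8));
    }

    static String readWithoutClosing(String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Deliberately never closed, to show the problem.
        OutputStream buffered = new BufferedOutputStream(bytes);
        export(buffered, content);
        // Small writes are still sitting in the BufferedOutputStream's buffer here.
        return bytes.toString("UTF-8");
    }

    static String readAfterClosing(String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream buffered = new BufferedOutputStream(bytes)) {
            export(buffered, content);
        } // close() flushes the buffer into bytes
        return bytes.toString("UTF-8");
    }
}
```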

Aspose.Words convert to html (only body content)

I can create a Word file and convert it to HTML with the Aspose.Words API. How do I get only the BODY content of the HTML from the API (without the html, head and body tags)? I will use it to show the output in a WYSIWYG editor (Summernote).
Note: I am developing the application with .NET Framework (C#).
Document doc = new Document(MyDir + "inputdocx.docx");
var options = new Aspose.Words.Saving.HtmlSaveOptions(SaveFormat.Html)
{
ImageSavingCallback = new HandleImageSaving(),
};
String html = doc.FirstSection.Body.ToString(options);
By default, Aspose.Words saves HTML in XHTML format, so you can safely load it into an XmlDocument and get the body tag's content. See the following code for an example.
// Create a simple document for testing.
DocumentBuilder builder = new DocumentBuilder();
builder.Writeln("Hello world!!!");
// For testing purposes insert an image.
builder.InsertImage(@"https://cms.admin.containerize.com/templates/aspose/App_Themes/V3/images/aspose-logo.png");
// Additional options can be specified in the corresponding save options.
HtmlSaveOptions opt = new HtmlSaveOptions(SaveFormat.Html);
// For example, output images in the HTML as base64 string (summernote supports base64)
opt.ExportImagesAsBase64 = true;
// Save the document to MemoryStream.
using (MemoryStream ms = new MemoryStream())
{
builder.Document.Save(ms, opt);
// Move the stream position to the beginning and load the resulting HTML into an XML document.
ms.Position = 0;
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(ms);
// Find body tag.
XmlNode body = xmlDoc.SelectSingleNode("//body");
// Get inner xml of the body.
Console.WriteLine(body.InnerXml);
}
Hope this helps.
Disclosure: I work on the Aspose.Words team.

Saxon XSLT .Net Transformation: What to give in BaseURI when xml and xsl both are passed as strings

This is the code I have for a Saxon XSLT transformation: it accepts XML and XSLT as strings and returns the transformed string. Either XSLT 1.0 or 2.0 can be processed through this function.
DocumentBuilder requires a BaseURI, even though I don't have any file. I have provided "c:\\" as the BaseURI, even though I have nothing to do with this directory.
Is there any better way to achieve this thing or write this function?
public static string SaxonTransform(string xmlContent, string xsltContent)
{
// Create a Processor instance.
Processor processor = new Processor();
// Load the source document into a DocumentBuilder
DocumentBuilder builder = processor.NewDocumentBuilder();
Uri sUri = new Uri("c:\\");
// Now set the baseUri for the builder we created.
builder.BaseUri = sUri;
// Instantiating the Build method of the DocumentBuilder class will then
// provide the proper XdmNode type for processing.
XdmNode input = builder.Build(new StringReader(xmlContent));
// Create a transformer for the stylesheet.
XsltTransformer transformer = processor.NewXsltCompiler().Compile(new StringReader(xsltContent)).Load();
// Set the root node of the source document to be the initial context node.
transformer.InitialContextNode = input;
StringWriter results = new StringWriter();
// Create a serializer.
Serializer serializer = new Serializer();
serializer.SetOutputWriter(results);
transformer.Run(serializer);
return results.ToString();
}
If you think that the base URI will never be used (because you never do anything that depends on the base URI) then the best strategy is to set a base URI that will be instantly recognizable if your assumption turns out to be wrong, for example "file:///dummy/base/uri".
Choose something that is a legal URI (C:\ is not).
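The "legal URI" point can be checked with java.net.URI, which enforces RFC 2396 syntax. A backslash is not a permitted URI character, so "c:\\" is rejected. The dummy file: URI parses fine, and if a relative reference is ever resolved against it, the result is easy to recognize. This is a Java illustration, not the .NET Saxon API from the question; the class and helper names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BaseUriCheck {
    // True if java.net.URI accepts the string as syntactically valid.
    static boolean isLegalUri(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Path a relative reference would resolve to if the dummy base were ever used.
    static String resolvePath(String base, String relative) {
        return URI.create(base).resolve(relative).getPath();
    }

    public static void main(String[] args) {
        System.out.println(isLegalUri("c:\\"));                                    // false
        System.out.println(isLegalUri("file:///dummy/base/uri"));                  // true
        System.out.println(resolvePath("file:///dummy/base/uri", "lookup.xml"));   // /dummy/base/lookup.xml
    }
}
```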

Internet Explorer 9 and XSLT

I have some JavaScript code that, based on the browser you're using, applies an XSL transformation to some XML received. This works in all browsers except IE9. Although there's a provision in the logic for IE (to use transformNode instead of new XSLTProcessor()), it would seem that IE9 does not define transformNode anymore.
I've been searching for some time to see whether this is a problem for others, without any luck, which is puzzling and makes me think I'm doing something terribly wrong.
Here's the code that works with IE7/8 (from jsTree, slightly modified for clarity):
xm = document.createElement('xml');
xs = document.createElement('xml');
xm.innerHTML = xml;
xs.innerHTML = xsl;
xm.transformNode(xs.XMLDocument);
All I could find regarding IE9 and XSLT is that "it has been changed to be more standards compliant". I think it was referring to the way that the transformations were done, not so much the API.
From the author of jsTree (which uses XSLT transformations to render XML source data to the tree):
if(window.ActiveXObject) {
var xslt = new ActiveXObject("Msxml2.XSLTemplate");
var xmlDoc = new ActiveXObject("Msxml2.DOMDocument");
var xslDoc = new ActiveXObject("Msxml2.FreeThreadedDOMDocument");
xmlDoc.loadXML(xml);
xslDoc.loadXML(xsl);
xslt.stylesheet = xslDoc;
var xslProc = xslt.createProcessor();
xslProc.input = xmlDoc;
xslProc.transform();
callback.call(null, xslProc.output);
return true;
}
http://code.google.com/p/jstree/issues/detail?id=907&q=IE9&colspec=ID%20Type%20Status%20Priority%20Owner%20Summary