How to insert a blank line in XSL-FO properly?

I am trying to figure out how to do this properly. I tried to use processing instructions in the source, but they seem to be ignored entirely.
In the text:
end of a paragraph.<?linebreak?></p>
As for templating, I tried:
<xsl:template match="processing-instruction('linebreak')">
<fo:block>
<xsl:apply-templates/>
<fo:leader/>
</fo:block>
</xsl:template>
Or simply for testing purposes:
<xsl:template match="processing-instruction('linebreak')">
<fo:block>XXXX</fo:block>
</xsl:template>
No matter what I do, the template is never used.
I use it inside an eXist-db app (3.0RC1) but I think this is not associated with eXist-db. There is FOP 1.1. I am not sure about the Saxon version.

Traditionally, you don't insert a line break at the end of a paragraph. Instead, you specify e.g. space-after="12pt" on the fo:block that contains the paragraph.
An explicit line break is always inserted, even where you don't want it (e.g. when the paragraph ends at the bottom of a page and the blank line would spill onto the next page). The space-after can be made conditional, so the space is discarded if it falls at the bottom of a page. This results in a better-looking layout.
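A minimal sketch of this approach, assuming the source paragraphs are p elements as in the question and that 12pt is an acceptable gap (both are illustrative choices, not taken from the asker's stylesheet). Note that XSL-FO names the property space-after; its conditionality component defaults to "discard", which is what allows the formatter to drop the space at a page boundary.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumption: paragraphs are <p> elements. Each one becomes a block
       followed by 12pt of space; no explicit blank line is emitted. -->
  <xsl:template match="p">
    <!-- space-after.conditionality defaults to "discard", so the gap
         is dropped when the block ends at the bottom of a page. -->
    <fo:block space-after="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```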

No matter what I do, the template is never used.
Concerning this part of the problem, a possible explanation is that the template matching the parent element (<p> in your examples) silently ignores processing instructions when applying templates.
For example, this quasi-identity stylesheet ignores processing instructions when elements are processed, so their matching template is never executed:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="* | @*">
<xsl:copy>
<!-- this only processes elements, attributes and text nodes! -->
<xsl:apply-templates select="* | @* | text()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="processing-instruction('linebreak')">
XXXXX
</xsl:template>
</xsl:stylesheet>
In order for the processing instructions to be taken into account, the template matching elements must explicitly apply templates to them too:
<xsl:template match="* | @*">
<xsl:copy>
<xsl:apply-templates select="* | @* | text() | processing-instruction()"/>
</xsl:copy>
</xsl:template>
Note that a plain <xsl:apply-templates/> is not a drop-in replacement here: it selects all child nodes (elements, text, comments and processing instructions) but not attributes, so the attributes would be lost from the copy.
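Applied to the question, this could look like the sketch below. The asker's template for p is not shown in the question, so the p template here is an assumption; the processing-instruction template reuses the fo:leader block from the question.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumed paragraph template (not shown in the question). -->
  <xsl:template match="p">
    <fo:block>
      <!-- node() selects every child: elements, text, comments
           and processing instructions. -->
      <xsl:apply-templates select="node()"/>
    </fo:block>
  </xsl:template>

  <!-- Fires only if the parent's template applied templates to the PI. -->
  <xsl:template match="processing-instruction('linebreak')">
    <fo:block><fo:leader/></fo:block>
  </xsl:template>

</xsl:stylesheet>
```

If the PI template still does not fire, the template that handles the parent of the processing instruction is the first place to check.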

XSLT style - pattern matching multiple templates

This is a question about XSLT 1.0 (but I've included the general xslt tag as it may apply more widely).
Let's say we want to take this XML
<root>
<vegetable>
<carrot colour="orange"/>
<leek colour="green"/>
</vegetable>
</root>
and transform it to cook the vegetables if they are root vegetables, like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="carrot">
<xsl:copy>
<xsl:attribute name="cooked">true</xsl:attribute>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="leek">
</xsl:template>
</xsl:stylesheet>
So the XSLT recursively processes the data, and when it finds multiple matching templates (e.g. for leek and carrot) it takes the last one, effectively overriding the earlier ones.
Sometimes accepted answers in this site have this style,
e.g. XSLT copy-of but change values
other answers specifically about multiple matching templates
e.g.
XSLT Multiple templates with same match
state
Having two templates of the same priority that match the same node is an error according to the XSLT specification
It is an error if [the algorithm in section 5.5] leaves more than one
matching template rule. An XSLT processor may signal the error; if it
does not signal the error, it must recover by choosing, from amongst
the matching template rules that are left, the one that occurs last in
the stylesheet.
So we can avoid this either by using priority or by explicitly excluding the overlap in the match pattern, something like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()[not(self::carrot or self::leek)]">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="carrot">
<xsl:copy>
<xsl:attribute name="cooked">true</xsl:attribute>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="leek">
</xsl:template>
</xsl:stylesheet>
I get the feeling that lots of devs simply rely on the default fallback behaviour and let the processor use the last match. This is similar in style to pattern matching in most functional languages, where the first match is used.
I'm also personally not a fan of priority; it feels a bit like magic numbers, where I have to scan and remember the priority of each pattern match to work out what's going on.
The approach of explicitly excluding overlaps seems sensible, but in practice it requires complex logic and creates coupling between templates: if I extend or add a new match, I potentially have to amend or constrain another.
Is the above a correct summary?
Is there an accepted style (even if it contradicts the spec)?
I think you may be missing the fact that there is no error in the given example, therefore the rule of applying the template that occurs last in the stylesheet is not invoked. You can verify this by switching the order of the templates and observing that the result remains unchanged.
There is no error because the identity transform has a priority of -0.5 while the specific templates have a priority of 0.
Read the entire specification for conflict resolution:
https://www.w3.org/TR/1999/REC-xslt-19991116/#conflict
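To make this concrete, here is the asker's stylesheet again with the templates deliberately reordered and the XSLT 1.0 default priorities written as comments. This is an illustrative sketch; the unused msxsl namespace declaration has been dropped.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Default priority 0: the pattern is a plain element name. -->
  <xsl:template match="leek"/>

  <!-- Default priority 0 as well; carrot and leek never match the same node. -->
  <xsl:template match="carrot">
    <xsl:copy>
      <xsl:attribute name="cooked">true</xsl:attribute>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Default priority -0.5 for both alternatives (@* and node()), so this
       rule loses to the two above even though it now comes last. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, the carrot should still gain its cooked attribute and the leek should still be dropped, because the choice is made on priority and not on position in the stylesheet.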

Can't get the "s" flag to work in regex in Saxon 9.5

I have an XML envelope/payload structure like this:
<RootEnvelopeTag>
<EnvelopeTag />
<EnvelopeTag />
<EnvelopeTagContainingPayload>
&lt;WantedPayloadTag&gt;Some text and nested tags.&lt;/WantedPayloadTag&gt;&lt;UnwantedPayloadTag&gt;Lots of text and nested tags.&lt;/UnwantedPayloadTag&gt;
</EnvelopeTagContainingPayload>
</RootEnvelopeTag>
To extract the payload, by removing all envelope elements, I use the following XSLT:
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="text" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:value-of select="."/>
</xsl:template>
</xsl:transform>
The result is a new text file that, once parsed as XML, allows me to work only with the payload XML.
This works fine in both Saxon HE 9.5, and AltovaXML 2013. However, I am now in the need to also remove part of the payload, specifically, one element, including the tags and all of its content (the <UnwantedPayloadTag>ALL TEXT IN BETWEEN</UnwantedPayloadTag>).
Since, in the original XML file, the payload is just a string, I use replace() with a regular expression that matches the unwanted element and the empty string as replacement string. I include the "s" flag, to get the "." in the regex to match newlines present within the unwanted element. So, the template for the container envelope element changes to:
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag.*UnwantedPayloadTag>', '', 's')" />
<xsl:value-of select="$removeUnwanted"/>
</xsl:template>
In AltovaXML, this works seamlessly. The result is exactly as expected. But in Saxon, it wreaks havoc. No output is generated; instead, I get in the command line an endless repetition of the following error message that clutters the whole DOS command line window:
at net.sf.saxon.regex.Operation$OpStar.exec(Operation.java:235)
at net.sf.saxon.regex.REMatcher.matchNodes(REMatcher.java:413)
The problem appears only when I use the "s" flag. But if I drop it, I won't get the match. I tried an alternative that doesn't require the flag and does the same:
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag[\s\S]*UnwantedPayloadTag>', '')" />
But I get the same error on Saxon. And again, Altova gets it right. I'm unsure of whether the problem is on my code, since it works fine in Altova. But I would really like to get this to work in Saxon, as well. So, what's wrong?
Now that Saxon 9.6 is available, and even the Home Edition (HE) supports XPath 3.0 functions like parse-xml-fragment, the right approach to your problem is:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:apply-templates select="parse-xml-fragment(.)"/>
</xsl:template>
<xsl:template match="UnwantedPayloadTag"/>
</xsl:transform>
as that way you simply parse the markup as XML and then use templates to filter out any elements you don't want.
You're getting a stack overflow in the Saxon regular expression engine because there's too much backtracking. We've got a fix for that in the future 9.6 release, but in the meantime you need to be careful about regular expressions that do too much backtracking.
Really, your approach is wrong. Regular expressions should not be used to parse XML. Your expression is wrong, because it can match things that it shouldn't match, e.g. something in a comment that looks like an end tag. You can't get it right by tweaking the regex, because XML has a recursive grammar and regular expressions can't handle recursive grammars. Saxon provides parse-xml() for this purpose.

xml:space='preserve' doesn't seem to get on with xsl:apply-templates select="node()"

Doing some work with xsl - first time I've done anything serious, and I've hit something which I can't explain. Easiest way to show it is with the identity transform:
This works:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
This doesn't (says "Unable to apply transformation on current source"):
<xsl:template match="@*|node()" xml:space='preserve'>
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
This does:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="node()" xml:space='preserve'/>
</xsl:copy>
</xsl:template>
OK, I can see what's happening. But I don't understand why. Why does xml:space not want to play nicely with attributes? Just curious.
BTW, this is using the xsl translator that's built into Notepad++. Perhaps I shouldn't trust it?
What are you trying to accomplish? xml:space="preserve" tells XML-consuming applications that you want to preserve whitespace-only text nodes that are descendants of the element that xml:space is an attribute of. In this example, you have xml:space as an attribute of <xsl:apply-templates>, but <xsl:apply-templates> has no whitespace-only text node descendants, so xml:space has no possible effect.
I think you wanted to preserve whitespace-only text nodes from the input XML document (not from the XSLT stylesheet). In that case, you need xml:space to be in the input XML document, not in the XSLT stylesheet. The stylesheet can have <xsl:preserve-space elements="*"/>, but that's already the default, unless you have <xsl:strip-space> set.
Yes, I would be inclined to wonder whether the XSLT processor used by Notepad++ (libxml) is doing something illegit. As a good diagnostic, try a respected processor like Saxon and see if you get any errors.
Either that, or just remove xml:space from your stylesheet, since it won't do you any good even if the processor doesn't throw an error.
Suggestion:
Just use
<xsl:output method="html" indent="yes"/>
as the first child of <xsl:stylesheet>.
The indent="yes" will prevent all the output elements from being crammed together on one line, so you can read the results.
Whitespace is not preserved for attributes according to the specification; this is discussed in the posting "Preserving attribute whitespace in XSLT".
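One plausible explanation for the failure of the second variant in the question, offered as a sketch and not verified against the processor bundled with Notepad++: in XSLT 1.0, xml:space="preserve" on a stylesheet element stops the whitespace-only text nodes inside it from being stripped, so they become part of the template body and are written to the output.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xml:space="preserve" applies to the stylesheet's own text nodes:
       the line breaks and indentation inside this template are no
       longer stripped and are copied to the output as text. -->
  <xsl:template match="@*|node()" xml:space="preserve">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- For an element, the whitespace after <xsl:copy> is written as a
       text child before any attribute is processed. Adding an attribute
       after child content is an error in XSLT 1.0; a processor may
       report it or silently drop the attribute. -->

</xsl:stylesheet>
```

This would also account for the third variant working: there, xml:space sits on an xsl:apply-templates element that has no text-node descendants, and the whitespace in front of the select="@*" instruction is still stripped, so attributes are written before any child content.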

Having trouble selecting properties with XSLT

I need to select Property1 and SubProperty2 and strip out any other properties. I need to make this future-proof so that any new properties added to the XML won't break validation; in other words, new fields have to be stripped by default.
<Root>
<Property1/>
<Property2/>
<Thing>
<SubProperty1/>
<SubProperty2/>
</Thing>
<VariousProperties/>
</Root>
so in my xslt I did this:
<xsl:template match="Property1">
<Property1>
<xsl:apply-templates/>
</Property1>
</xsl:template>
<xsl:template match="/Thing">
<SubProperty1>
<xsl:apply-templates select="SubProperty1" />
</SubProperty1>
</xsl:template>
<xsl:template match="*" />
The last line should strip anything I haven't defined to be selected.
This works to select my Property1, but it always selects an empty node for SubProperty. The match on * seems to strip out the deeper elements before my match on them can work.
I removed the match on * and it then selects my SubProperty with a value. So, how can I select the sub-properties and still strip away everything that I am not using?
Thanks for any advice.
There are two problems:
<xsl:template match="*"/>
This ignores any element for which there isn't an overriding, more specific template.
Because there isn't a specific template for the top element Root it is ignored together with all of its subtree -- which is the complete document -- no output at all is produced.
The second problem is here:
<xsl:template match="/Thing">
This template matches the top element named Thing.
However in the provided document the top element is named Root. Therefore the above template doesn't match any node from the provided XML document and is never selected for execution. As the code inside its body is supposed to generate SubProperty1, no such output is generated.
Solution:
Change
<xsl:template match="*"/>
to:
<xsl:template match="text()"/>
And change
<xsl:template match="/Thing">
to
<xsl:template match="Thing">
The whole transformation becomes:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="Property1">
<Property1>
<xsl:apply-templates/>
</Property1>
</xsl:template>
<xsl:template match="Thing">
<SubProperty1>
<xsl:apply-templates select="SubProperty1" />
</SubProperty1>
</xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
And when applied on the following XML document (as the provided is severely malformed it had to be fixed):
<Root>
<Property1/>
<Property2/>
<Thing>
<SubProperty1/>
<SubProperty2/>
</Thing>
<VariousProperties/>
</Root>
the result now is what is wanted:
<Property1/>
<SubProperty1/>

XSLT template overriding

I have a small question regarding XSLT template overriding.
For this segment of my XML:
<record>
<medication>
<medicine>
<name>penicillin G</name>
<strength>500 mg</strength>
</medicine>
</medication>
</record>
In my XSLT sheet, I have two templates in the following order:
<xsl:template match="medication">
<xsl:copy-of select="." />
</xsl:template>
<xsl:template match="medicine/name">
<text>!unauthorized information!</text>
</xsl:template>
What I want to do is to copy everything under the medication element to the output other than the "name" element (or any other element that I explicitly define). The final xml will be shown to the user in RAW XML form. In other words, the result I want is:
<record>
<medication>
<medicine>
<text>! unauthorized information!</text>
<strength>500 mg</strength>
</medicine>
</medication>
</record>
Whereas I am getting the same XML as input, i.e. without the element replaced by text. Any ideas why the second template match is not overriding the name element in the first one? Thanks in advance
--
Ali
Template order does not matter. The only case it possibly becomes considered (and this is processor-dependent) is when you have an un-resolvable conflict, i.e. an error condition. In that case, it's legal for the XSLT processor to recover from the error by picking the one that comes last. However, you should never write code that depends on this behavior.
In your case, template priority isn't even the issue. You have two different template rules, one matching <medication> elements and one matching <name> elements. These will never collide, so it's not a question of template priority or overriding. The issue is that your code never actually applies templates to the <name> element. When you say <xsl:copy-of select="."/> on <medication>, you're saying: "perform a deep copy of <medication>". The only way any of the template rules will fire for descendant nodes is if you explicitly apply templates (using <xsl:apply-templates/>).
The solution I have for you is basically the same as alamar's, except that it uses a separate processing "mode", which isolates the rules from all other rules in your stylesheet. The generic match="@* | node()" template causes template rules to be recursively applied to children (and attributes), which gives you the opportunity to override the behavior for certain nodes.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- ...placeholder for the rest of your code... -->
<xsl:template match="/record">
<record>
<xsl:apply-templates/>
</record>
</xsl:template>
<!-- end of placeholder -->
<xsl:template match="medication">
<!-- Instead of copy-of, whose behavior is to always perform
a deep copy and cannot be customized, define your own
processing mode. Rules with this mode name are isolated
from the rest of your code. -->
<xsl:apply-templates mode="copy-medication" select="."/>
</xsl:template>
<!-- By default, copy all nodes and their descendants -->
<xsl:template mode="copy-medication" match="@* | node()">
<xsl:copy>
<xsl:apply-templates mode="copy-medication" select="@* | node()"/>
</xsl:copy>
</xsl:template>
<!-- But replace <name> -->
<xsl:template mode="copy-medication" match="medicine/name">
<text>!unauthorized information!</text>
</xsl:template>
</xsl:stylesheet>
The rule for "medicine/name" overrides the rule for "@* | node()", because the format of the pattern (which contains a "/") makes its default priority (0.5) higher than the default priority of "node()" (-0.5).
A complete but concise description of how template priority works can be found in "How XSLT Works" on my website.
Finally, I noticed you mentioned you want to display "RAW XML" to the user. Does that mean you want to display, for example, the XML, with all the start and end tags, in a browser? In that case, you'd need to escape all markup (e.g., "&lt;" for "<"). Check out the XML-to-string utility on my website. Let me know if you need an example of how to use it.
Add
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
to your <xsl:template match="medicine/name">
And remove <xsl:template match="medication"> altogether!
<?xml version="1.0" encoding="windows-1251"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="medicine/name">
<text>!unauthorized information!</text>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>