I’m trying to flatten an element’s text nodes and nested inlined elements
<e>something <inline>rather</inline> else</e>
into
<text>something </text>
<text-inline>rather</text-inline>
<text> else</text>
Using e/text() would return both text nodes but how do I flatten all nodes in order for arbitrarily inlined elements (even nested)?
I am not sure "flatten" is the right term for this. It seems all you want to do is change some text nodes into elements containing the same text. This can be done by a template matching these text nodes:
<xsl:template match="e/text()">
<text>
<xsl:copy/>
</text>
</xsl:template>
Demo: https://xsltfiddle.liberty-development.net/ncdD7n4
Of course, if you also want to rename inline to text-inline, you will need another template for that:
<xsl:template match="inline">
<text-inline>
<xsl:apply-templates />
</text-inline>
</xsl:template>
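Put together, the two templates could sit in a complete stylesheet along the lines of the sketch below. The identity template is an assumption added here so that the e element itself and anything else are copied through unchanged; it is not shown in the answer above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (assumed, not shown in the answer): copies everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Wrap each text node that is a direct child of e -->
  <xsl:template match="e/text()">
    <text>
      <xsl:copy/>
    </text>
  </xsl:template>

  <!-- Rename inline to text-inline, keeping its content -->
  <xsl:template match="inline">
    <text-inline>
      <xsl:apply-templates/>
    </text-inline>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input this should give an e element containing the three wanted children in document order. Text inside inline is not wrapped in text, because the first pattern only matches direct children of e.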
I have several hundred XML files to which I need to make a slight change. I'm aware that I really should be using XSLT to make batch changes to XML structure, but I thought some quick and dirty regex would do what I need much faster than working out the XSLT. At least I thought that before spending hours trying to get the regex right!
Take the example below. I have various lists (<seqlist>) which contain an <item> element for each item in the list. Each <item> element contains a <para> element with various id attribute values. I want to remove those <para> tags completely so that the <item> contains the actual text.
So from: <seqlist><item><para id="1.1">Some text here.</para></item></seqlist>
To: <seqlist><item>Some text here.</item></seqlist>
This is fairly straightforward in itself; I can simply do:
Regex: <item><para id="([^\"]*)">
Replace: <item>
Then remove the redundant closing tags with a simple find and replace:
Find: </para></item>
Replace: </item>
However, as the example below shows, some <item> elements in the list contain another <seqlist> nested within them, which contains further nested <item> and <para> tags. This means the find and replace above, meant to remove the closing </para> tag, will also replace the closing </para> in the very last line of the example.
Basically, what I need to say is: find </para></item> and replace it with </item> UNLESS there is an opening <para> element to the left of it.
The very last line of the example below explains it better. If I do the find and replace above, the last </para> will be removed and the file will not parse.
Any ideas how to achieve this, please?
<seqlist>
<item><para id="p7.1"><emphasis>JRK Type 1</emphasis>: (NSP XX-XX-XXX-XXXX)
outputs:
<seqlist>
<item><para id="p7.1.1">12 V or 15 V,0-5A</para></item>
<item><para id="p7.1.2">12 V or 15 V,0-5A</para></item>
</seqlist></para>
<para>Both at 120 W maximum output power.</para><para>The outputs are isolated, permitting parallel or serial connection to provide power as required.</para></item>
<item><para id="p7.2"><emphasis>JRK Type 2:</emphasis> (NSN 6130-99-788-6945) outputs:</para>
<seqlist>
<item><para id="p7.2.1">5 V, 0 - 30 A</para></item>
<item><para id="p7.2.2">12 V, 0 - 0.5 A</para></item>
</seqlist><para>Both at 120 W maximum output power.</para>
<para>The 12 V outputs are measured with respect to a common 0 V line but these are isolated from the 5 V output.</para></item>
</seqlist>
Here is the trivial XSLT way:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="seqlist/item/para">
<xsl:apply-templates/>
</xsl:template>
</xsl:transform>
Online at http://xsltransform.net/3NSSEw6.
If only those para elements with an id attribute are to be removed then use
<xsl:template match="seqlist/item/para[@id]">
<xsl:apply-templates/>
</xsl:template>
for that template instead, http://xsltransform.net/3NSSEw6/1.
I have the XML below.
<?xml version="1.0" encoding="UTF-8"?>
<para align="center">
<content-style font-style="bold">A.1 This is the first text</content-style> (This is second text)
</para>
Below are my two questions.
Here I've declared a regex to match the content-style, but when I run it the second case is caught: the output should be <div class="para">, but I get <div class="para align-center">. Please let me know where I am going wrong.
Is there a way I can use apply-templates within the match? When I tried, it threw an error. I want something like the below.
if (para)
xsl:apply-templates select="child::node()[not(self::text())]"
else
xsl:apply-templates
Thanks
If you want to use apply-templates inside the analyze-string, you need to store the context node in a variable outside of analyze-string, <xsl:variable name="context-node" select="."/>, because inside analyze-string the context item is the matched (or non-matched) substring rather than a node. You can then use, for instance, <xsl:apply-templates select="$context-node/node()"/> to process the child nodes.
I am not sure whether you need that approach; I wonder whether you could simply use the matches function in a pattern, e.g. <xsl:template match="para[content-style[matches(., '(\w+)\.(\w+)')]]">...</xsl:template>.
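A minimal sketch of the variable approach, assuming an XSLT 2.0 processor, might look as follows. The ^ anchor on the regex, the use of only the first content-style child, and the div wrapper are assumptions made for illustration and are not taken from the question's stylesheet.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <!-- Inside xsl:analyze-string the context item is a string, so keep the para node here -->
    <xsl:variable name="context-node" select="."/>
    <!-- The ^ anchor (an assumption) limits the regex to at most one match per para -->
    <xsl:analyze-string select="string(content-style[1])" regex="^(\w+)\.(\w+)">
      <xsl:matching-substring>
        <div class="para">
          <!-- Process the non-text children of the saved para node -->
          <xsl:apply-templates select="$context-node/node()[not(self::text())]"/>
        </div>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch a para whose content-style does not start with the pattern produces no output at all, so a real stylesheet would need a fallback for that case.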
Background
Converting a document from OpenOffice to DocBook format.
Problem
Parts of the document include the following:
<ul><li><ul><li><ul><li><p>...</p></li></ul></li></ul></li></ul>
While other parts of the document include:
<ul><li><p>...</p></li></ul>
I tried to match just the inner-most ul tag using:
<xsl:template match="xhtml:ul[not(child::xhtml:li/child::xhtml:ul)]">
...
</xsl:template>
But this does not match. The following expression:
<xsl:template match="xhtml:ul">
Will create, as expected:
<itemizedlist>
<itemizedlist>
<itemizedlist>
...
</itemizedlist>
</itemizedlist>
</itemizedlist>
Desired Output
The desired output format, regardless of ul nesting, is:
<itemizedlist>
...
</itemizedlist>
Question
What's the correct syntax for matching the innermost child ul node?
Ideas
There are a few ways to resolve this:
Search/replace the original document (there are only ~20 instances).
Use xsl:if with a test attribute within the li node to see if the child node is ul.
What XPath expression would work? For example:
<xsl:template match="xhtml:ul[not(grandchild::xhtml:ul)]">
Thank you!
<xsl:template match="xhtml:ul[not(descendant::xhtml:ul)]">
<itemizedlist><xsl:apply-templates /></itemizedlist>
</xsl:template>
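For that pattern to match, the xhtml prefix has to be bound to the XHTML namespace in the stylesheet. The sketch below shows one way the pieces might fit together; the handling of the outer ul and li wrappers and the listitem and para mappings are assumptions for illustration, and it assumes the outer levels are pure wrappers as in the question's example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <!-- Innermost ul: it has no ul descendants, so it becomes the single itemizedlist -->
  <xsl:template match="xhtml:ul[not(descendant::xhtml:ul)]">
    <itemizedlist><xsl:apply-templates/></itemizedlist>
  </xsl:template>

  <!-- Assumed: outer ul and li wrappers emit nothing themselves and only descend -->
  <xsl:template match="xhtml:ul[descendant::xhtml:ul] | xhtml:li[xhtml:ul]">
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- Assumed DocBook mappings for the remaining elements -->
  <xsl:template match="xhtml:li">
    <listitem><xsl:apply-templates/></listitem>
  </xsl:template>

  <xsl:template match="xhtml:p">
    <para><xsl:apply-templates/></para>
  </xsl:template>

</xsl:stylesheet>
```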
I have a structured XML with this structure:
<root>
<item/>
<item/>
<something/>
<item/>
</root>
If I use something like this:
<xsl:for-each select="/root/item">
it will pick all the item elements inside the list. I want to interrupt the loop after the second item, because between the 2nd and the 3rd there is a something element.
How can I get this?
You can't actually break out of an xsl:for-each loop. You need to construct your loop so that it selects only the elements you want in the first place.
In this case, you want to select all item elements which don't have a preceding sibling that isn't also an item element.
<xsl:for-each select="/root/item[not(preceding-sibling::*[not(self::item)])]">
<xsl:value-of select="position()" />
</xsl:for-each>
When this is used, it should only select the first two item elements.
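As a self-contained check, the loop can be wrapped in a small stylesheet like the sketch below. The root template and the text output method are assumptions added so the fragment can be run on its own.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Keep only item elements with no non-item element before them -->
    <xsl:for-each select="/root/item[not(preceding-sibling::*[not(self::item)])]">
      <xsl:value-of select="position()"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample document in the question, this should output 12, that is, positions 1 and 2 for the two item elements before something.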
In XSLT there isn't any possibility for a "break" out of an <xsl:for-each> or out of <xsl:apply-templates>, except using <xsl:message terminate="yes"/> which you probably don't want. This is due to the fact that XSLT is a functional language and as in any functional language there isn't any concept of "order of execution" -- for example the code can be executing in parallel on all the nodes that are selected.
The solution is to specify in the select attribute an expression selecting exactly the wanted nodes.
Use:
<xsl:for-each select="/*/*[not(self::item)][1]/preceding-sibling::*">
<!-- Processing here -->
</xsl:for-each>
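This selects for processing all preceding sibling elements of the first child of the top element that isn't an item -- that is, the starting group of adjacent item elements that are the first children of the top element. Note that if every child of the top element is an item, there is no such non-item child and the expression selects nothing.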
Look at the following two examples:
<foo>some text <bar/> and maybe some more</foo>
and
<foo>some text <bar/> and a last <bar/></foo>
Both have mixed text nodes and bar elements within the foo element. Now I am in foo, and want to find out if the last child is a bar. The first example should prove false, as there is text after the bar, but the second example should be true.
How can I accomplish this with XSLT?
Just select the last node of the <foo> element and then use the self axis to test the node type.
/foo/node()[position()=last()]/self::bar
This XPath expression returns an empty node-set (which equates to boolean false) if the last node is not a bar element. If you specifically want the value true or false, wrap the expression in the XPath function boolean(). Use self::* instead of self::bar to match any element as the last node.
Input XML document:
<root>
<foo>some text <bar/> and maybe some more</foo>
<foo>some text <bar/> and a last <bar/></foo>
</root>
XSLT document example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="foo">
<xsl:choose>
<xsl:when test="node()[position()=last()]/self::bar">
<xsl:text>bar element at the end
</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:text>text at the end
</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Output of the stylesheet:
text at the end
bar element at the end
"Now I am in foo, and want to find out if the last child is a bar"
Use:
node()[last()][self::bar]
The boolean value of any non-empty node-set is true(), and it is false() otherwise. You can use the above expression directly (unmodified) as the value of the test attribute of any <xsl:if> or <xsl:when>.
Better, use:
foo/node()[last()][self::bar]
as the match attribute of an <xsl:template> -- thus you write in pure "push" style.
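A minimal push-style sketch using that pattern might look like this. The text output method, the message text, and the empty text() template are assumptions added so that only the match itself produces output.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Fires only for a bar that is the last child node of a foo -->
  <xsl:template match="foo/node()[last()][self::bar]">
    <xsl:text>bar element at the end&#10;</xsl:text>
  </xsl:template>

  <!-- Assumed helper: suppress text nodes so only the message above is printed -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

No xsl:choose is needed here: the built-in templates walk the tree, and the pattern decides which bar elements get the message.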
Update: This answer addresses the requirement stated in the original question title, "finding out if last child node is a text node." But the question body suggests a different requirement, and it seems that the latter requirement was the one intended by the OP.
The previous two answers explicitly test whether the last child is a bar element, rather than directly testing whether it is a text node. This is correct if foo contains only "mixed text nodes and bar elements" and never has zero children.
But you may want to test directly whether the last child is a text node:
For readability of stylesheet logic
In case the element contains other children besides elements and text: e.g. comments or processing instructions
In case the element has no children
Maybe you know the latter two will never occur in your case (though from your question I would guess that the third, an element with no children, could). Or maybe you think so but aren't sure, or maybe you hadn't thought about it. In either case, it's safer to test directly for what you actually want to know:
test="node()[last()]/self::text()"
Thus, building on @Dimitre's example code and input, the following XML input:
<root>
<foo>some text <bar/> and maybe some more</foo>
<foo>some text <bar/> and a pi: <?foopi param=yes?></foo>
<foo>some text <bar/> and a comment: <!-- baz --></foo>
<foo>some text and an element: <bar /></foo>
<foo noChildren="true" />
</root>
With this XSLT template:
<xsl:template match="foo">
<xsl:choose>
<xsl:when test="node()[last()]/self::text()">
<xsl:text>text at the end;
</xsl:text>
</xsl:when>
<xsl:when test="node()[last()]/self::*">
<xsl:text>element at the end;
</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:text>neither text nor element child at the end;
</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
yields:
text at the end;
neither text nor element child at the end;
neither text nor element child at the end;
element at the end;
neither text nor element child at the end;