I want to sort all the <text> elements by the value of the attribute top.
However an element should only be sorted if its previous sibling has a value of top that exceeds its own by 2 or more units.
For example, the following elements
<text top="100">text 1</text>
<text top="99">text 2</text>
<text top="100">text 3</text>
<text top="99">text 4</text>
<text top="35">text 5</text>
<text top="40">text 6</text>
should be transformed to:
<text top="35">text 5</text>
<text top="40">text 6</text>
<text top="100">text 1</text>
<text top="99">text 2</text>
<text top="100">text 3</text>
<text top="99">text 4</text>
So that the group:
<text top="100">text 1</text>
<text top="99">text 2</text>
<text top="100">text 3</text>
<text top="99">text 4</text>
remains as is after sorting.
I only use XSLT from time to time and only know the usual sorting approach:
<xsl:for-each select="text">
<xsl:sort select="@top" />
<xsl:copy>
<xsl:copy-of select="./node()|./@*" />
</xsl:copy>
</xsl:for-each>
But the result I want to achieve would require some kind of bubble sort.
Not sure whether it's doable with pure XSLT.
I have an XSLT 2.0 processor.
I wonder whether in XSLT 2/3 it can just be done with an adequate group-ending-with pattern:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:param name="limit" as="xs:integer" select="1"/>
<xsl:output indent="yes"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="root">
<xsl:for-each-group select="text" group-ending-with="text[abs(xs:decimal(following-sibling::text[1]/@top) - xs:decimal(@top)) > $limit]">
<xsl:sort select="min(current-group()/@top/xs:decimal(.))"/>
<xsl:sequence select="current-group()"/>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
Based on the much simplified XQuery code
for tumbling window $group in root/text
start when true()
end $e next $ne when abs(xs:decimal($ne/@top) - xs:decimal($e/@top)) > 1
order by min($group/@top/xs:decimal(.))
return
$group
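Since the question mentions an XSLT 2.0 processor, here is a sketch of how the group-ending-with stylesheet might be adapted to XSLT 2.0. As far as I can tell, the only XSLT 3.0 feature it relies on is xsl:mode, which is replaced here by an explicit identity template. The wrapper element name root is an assumption carried over from the stylesheet above, and copying the wrapper into the output is my own addition.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all">

  <xsl:param name="limit" as="xs:integer" select="1"/>
  <xsl:output indent="yes"/>

  <!-- Identity template: the XSLT 2.0 counterpart of
       xsl:mode on-no-match="shallow-copy" -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- "root" is the assumed name of the element wrapping the text elements -->
  <xsl:template match="root">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- A group ends at a text element whose next sibling
           differs from it by more than $limit -->
      <xsl:for-each-group select="text"
          group-ending-with="text[abs(xs:decimal(following-sibling::text[1]/@top) - xs:decimal(@top)) &gt; $limit]">
        <!-- Order the groups by their smallest top value;
             the members of each group keep document order -->
        <xsl:sort select="min(current-group()/@top/xs:decimal(.))"/>
        <xsl:copy-of select="current-group()"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the sample from the question this should form the groups (text 1 to text 4), (text 5) and (text 6), and emit them ordered by their minimum top value.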
As I understand it, the requirements are grouping and then sorting. Note the assumption: a group whose elements differ by less than 2 units is sorted among the other groups by its minimum value only (meaning that groups do not overlap).
This stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:template match="*[text]">
<xsl:for-each-group
select="text"
group-adjacent="boolean(
(preceding-sibling::text[1]
|following-sibling::text[1])
[abs(@top - current()/@top) &lt; 2])">
<xsl:sort select="min(@top)"/>
<xsl:choose>
<xsl:when test="current-grouping-key()">
<xsl:copy-of select="current-group()"/>
</xsl:when>
<xsl:otherwise>
<xsl:perform-sort select="current-group()">
<xsl:sort select="@top" data-type="number"/>
</xsl:perform-sort>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
Output:
<text top="35">text 5</text>
<text top="40">text 6</text>
<text top="100">text 1</text>
<text top="99">text 2</text>
<text top="100">text 3</text>
<text top="99">text 4</text>
EDIT: The stylesheet now uses the abs() function, so it no longer assumes an increasing sequence.
This XSLT transformation works, but I am repeating the same code multiple times, which makes it very redundant!
How can I optimize this?
<xsl:for-each select="RVWT">
<xsl:variable name="rvwt" select="tokenize(., '\|')"/>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">
<xsl:value-of select="$rvwt[1]"/>
</Text>
<TextSourceTitle>
<xsl:value-of select="normalize-space(substring($rvwt[2], 3))"/>
</TextSourceTitle>
</xsl:for-each>
<xsl:if test="not(RVWT)">
<xsl:for-each select="RVW">
<xsl:variable name="rvwt" select="tokenize(., '\|')"/>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">
<xsl:value-of select="$rvwt[1]"/>
</Text>
<TextSourceTitle>
<xsl:value-of select="normalize-space(substring($rvwt[2], 3))"/>
</TextSourceTitle>
</xsl:for-each>
</xsl:if>
Thanks!
I. I would use the best that XSLT 2.0 can offer: creating a function:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my" exclude-result-prefixes="my">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<xsl:sequence select=
"my:Extract(RVW[not(../RVWT)] | RVWT)"/>
</xsl:template>
<xsl:function name="my:Extract">
<xsl:param name="pItems" as="item()+"/>
<xsl:for-each select="$pItems">
<xsl:variable name="vItemTokens" select="tokenize(., '\|')"/>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">
<xsl:value-of select="$vItemTokens[1]"/>
</Text>
<TextSourceTitle>
<xsl:value-of select="normalize-space(substring($vItemTokens[2], 3))"/>
</TextSourceTitle>
</xsl:for-each>
</xsl:function>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<t>
<RVWT>a|bbbTail|c</RVWT>
<RVWT>d|eeeTail|f</RVWT>
<RVWT>g|hhhTail|i</RVWT>
<RVW>p|qqqTail|r</RVW>
</t>
the wanted, correct result is produced:
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">a</Text>
<TextSourceTitle>bTail</TextSourceTitle>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">d</Text>
<TextSourceTitle>eTail</TextSourceTitle>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">g</Text>
<TextSourceTitle>hTail</TextSourceTitle>
When applied to this document:
<t>
<RVWTX>a|bbbTail|c</RVWTX>
<RVWTX>d|eeeTail|f</RVWTX>
<RVWTX>g|hhhTail|i</RVWTX>
<RVW>p|qqqTail|r</RVW>
</t>
again the wanted, correct result is produced:
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">p</Text>
<TextSourceTitle>qTail</TextSourceTitle>
II. Do note:
With this approach we have the added benefit that the function accepts any sequence of items, not only elements.
For example, one could call the function like this:
my:Extract(('a|bbbTail|c', 'd|eeeTail|f'))
and still get the wanted result:
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">a</Text>
<TextSourceTitle>bTail</TextSourceTitle>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">d</Text>
<TextSourceTitle>eTail</TextSourceTitle>
You can write a template
<xsl:template match="RVWT | RVW">
<xsl:variable name="rvwt" select="tokenize(., '\|')"/>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">
<xsl:value-of select="$rvwt[1]"/>
</Text>
<TextSourceTitle>
<xsl:value-of select="normalize-space(substring($rvwt[2], 3))"/>
</TextSourceTitle>
</xsl:template>
and then in the parent you process <xsl:apply-templates select="if (RVWT) then RVWT else RVW"/>.
I am guessing you could do:
<xsl:for-each select="RVWT | RVW[not(../RVWT)]">
<xsl:variable name="rvwt" select="tokenize(., '\|')"/>
<TextTypeCode>08</TextTypeCode>
<Text textformat="05">
<xsl:value-of select="$rvwt[1]"/>
</Text>
<TextSourceTitle>
<xsl:value-of select="normalize-space(substring($rvwt[2], 3))"/>
</TextSourceTitle>
</xsl:for-each>
I'm trying to transform an XML file that was converted from a PDF into another XML format. First I want to group some text nodes together based on the geometry of the text, but I have failed to do so. The following is my input and the output I want:
Input XML:
<Pages>
<Page>
<PAGENUMBER>1</PAGENUMBER>
<Box llx="59.40" lly="560.64" urx="68.58" ury="571.68">
<Text>5.</Text>
</Box>
<Box llx="81.84" lly="560.64" urx="194.39" ury="571.68">
<Text>Equipment list</Text>
</Box>
<Box llx="257.40" lly="560.64" urx="265.36" ury="571.68">
<Text>C</Text>
</Box>
<Box llx="315.84" lly="535.32" urx="325.63" ury="546.36">
<Text>a)</Text>
</Box>
</Page>
<Page>
same structure as above...
</Page>
</Pages>
Output XML:
<Pages>
<Page>
<PAGENUMBER>1</PAGENUMBER>
<Box llx="59.40" lly="560.64" urx="68.58" ury="571.68">
<Text>5. Equipment list C</Text>
</Box>
<Box llx="315.84" lly="535.32" urx="325.63" ury="546.36">
<Text>a)</Text>
</Box>
</Page>
<Page>
same structure as above...
</Page>
</Pages>
What I have:
<xsl:template match="@*|node()" name = "identity">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Box">
<xsl:choose>
<xsl:when test="@ury = following-sibling::Box/@ury">
<xsl:call-template name="identity"/>
<xsl:apply-templates select ="@*"/>
<xsl:copy-of select="following-sibling::Box/Text"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
There are two problems: (1) it doesn't copy the wanted nodes, and (2) I don't know how to exclude the following nodes. I hope someone can help me with this. Many thanks in advance.
I tried the following to exclude the duplicates, but it doesn't copy what I want anyway:
<xsl:template match="Box[@ury != preceding-sibling::Box/@ury]/Text">
<xsl:copy><xsl:apply-templates/></xsl:copy>
</xsl:template>
This is a case for Muenchian grouping, in which you group the nodes by a common criterion and process each group to produce the output.
The solution depends on the XSLT version, so both XSLT 1.0 and XSLT 2.0 are covered below.
XSLT 1.0
Version 1.0 uses an <xsl:key> to group the elements based on common criteria. In this case, the grouping is based on the value of the attribute @ury, so we define a key:
<xsl:key name="groupingKey" match="Box" use="@ury" />
Using this key, a template matches only the first Box of each group:
<xsl:template match="Box[generate-id() = generate-id(key('groupingKey', @ury)[1])]">
Finally, within each group, a loop runs over the <Text> elements to concatenate their values.
<Text>
<xsl:variable name="fullText">
<xsl:for-each select="key('groupingKey', @ury)/Text">
<xsl:value-of select="concat(., ' ')" />
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="normalize-space($fullText)" />
</Text>
Below is the complete XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:strip-space elements="*" />
<xsl:key name="groupingKey" match="Box" use="@ury" />
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Box[generate-id() = generate-id(key('groupingKey', @ury)[1])]">
<xsl:copy>
<xsl:apply-templates select="@*" />
<Text>
<xsl:variable name="fullText">
<xsl:for-each select="key('groupingKey', @ury)/Text">
<xsl:value-of select="concat(., ' ')" />
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="normalize-space($fullText)" />
</Text>
</xsl:copy>
</xsl:template>
<xsl:template match="Box" />
</xsl:stylesheet>
XSLT 2.0
XSLT 2.0 provides a simpler approach than XSLT 1.0: the <xsl:for-each-group> instruction with the group-by attribute groups the elements directly.
<xsl:for-each-group select="Box" group-by="@ury">
Below is the complete XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" />
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Page">
<xsl:copy>
<xsl:apply-templates select="PAGENUMBER" />
<xsl:for-each-group select="Box" group-by="@ury">
<xsl:copy>
<xsl:apply-templates select="@*" />
<Text>
<xsl:variable name="fullText">
<xsl:for-each select="current-group()/Text">
<xsl:value-of select="concat(., ' ')" />
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="normalize-space($fullText)" />
</Text>
</xsl:copy>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Both stylesheets produce the required output:
<Pages>
<Page>
<PAGENUMBER>1</PAGENUMBER>
<Box llx="59.40" lly="560.64" urx="68.58" ury="571.68">
<Text>5. Equipment list C</Text>
</Box>
<Box llx="315.84" lly="535.32" urx="325.63" ury="546.36">
<Text>a)</Text>
</Box>
</Page>
</Pages>
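One thing to watch with the XSLT 1.0 version: an <xsl:key> indexes the whole document, so Box elements on different pages that happen to share a ury value would land in the same group. A minimal sketch of one way around this, assuming every Box is a direct child of its Page, is to make the key composite. The key name boxByPageAndUry and the '|' separator are my own choices, not part of the answer above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Composite key: identity of the parent Page plus the ury value -->
  <xsl:key name="boxByPageAndUry" match="Box"
           use="concat(generate-id(..), '|', @ury)"/>

  <!-- Identity template -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Box">
    <!-- All Box elements on the same Page with the same ury -->
    <xsl:variable name="group"
        select="key('boxByPageAndUry', concat(generate-id(..), '|', @ury))"/>
    <!-- Only the first Box of each group produces output -->
    <xsl:if test="generate-id() = generate-id($group[1])">
      <xsl:copy>
        <xsl:apply-templates select="@*"/>
        <Text>
          <xsl:variable name="fullText">
            <xsl:for-each select="$group/Text">
              <xsl:value-of select="concat(., ' ')"/>
            </xsl:for-each>
          </xsl:variable>
          <xsl:value-of select="normalize-space($fullText)"/>
        </Text>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The XSLT 2.0 version does not need this change, because its for-each-group runs inside the Page template and therefore only sees the Box children of the current Page.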
I have a scenario where I need to convert the input XML to a CSV file. The output should have values for every attribute with their respective XPath.
For example: If my input is
<School>
<Class>
<Student name="" class="" rollno="" />
<Teacher name="" qualification="" Employeeno="" />
</Class>
</School>
The expected output would be:
School/Class/Student/name, School/Class/Student/class, School/Class/Student/rollno,
School/Class/Teacher/name, School/Class/Teacher/qualification, School/Class/Teacher/Employeeno
An example does not always embody a rule. Assuming you want a row for each element that has any attributes, no matter where in the document it is, and a column for each attribute of an element, try:
Edit:
This is an improved version, corrected to work properly with nested elements.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="*">
<xsl:param name="path" />
<xsl:variable name="newpath" select="concat($path, '/', name())" />
<xsl:apply-templates select="@*">
<xsl:with-param name="path" select="$newpath"/>
</xsl:apply-templates>
<xsl:if test="@*">
<xsl:text>&#10;</xsl:text>
</xsl:if>
<xsl:apply-templates select="*">
<xsl:with-param name="path" select="$newpath"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="@*">
<xsl:param name="path" />
<xsl:value-of select="substring(concat($path, '/', name()), 2)"/>
<xsl:if test="position()!=last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
When applied to the following test input:
<Root>
<Parent parent="1" parent2="1b">
<Son son="11" son2="11b"/>
<Daughter daughter="12" daughter2="12b">
<Grandson grandson="121" grandson2="121b"/>
<Granddaughter granddaughter="122" granddaughter2="122b"/>
</Daughter>
<Sibling/>
</Parent>
</Root>
the result is:
Root/Parent/parent, Root/Parent/parent2
Root/Parent/Son/son, Root/Parent/Son/son2
Root/Parent/Daughter/daughter, Root/Parent/Daughter/daughter2
Root/Parent/Daughter/Grandson/grandson, Root/Parent/Daughter/Grandson/grandson2
Root/Parent/Daughter/Granddaughter/granddaughter, Root/Parent/Daughter/Granddaughter/granddaughter2
Note that the number of columns in each row can vary - this is often unacceptable in a CSV document.
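If a fixed number of columns is required, one option is to write one row per attribute instead of one row per element. The following is only a sketch of that idea and departs from the layout above: each row holds the attribute's path and its value. The two-column layout is my assumption, and values containing commas or quotes are not escaped.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Visit every attribute in the document, in document order -->
    <xsl:for-each select="//@*">
      <!-- Column 1: the names of all ancestor elements, then the attribute name -->
      <xsl:for-each select="ancestor::*">
        <xsl:value-of select="name()"/>
        <xsl:text>/</xsl:text>
      </xsl:for-each>
      <xsl:value-of select="name()"/>
      <!-- Column 2: the attribute value (no CSV escaping in this sketch) -->
      <xsl:text>,</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the test input above, the first row would be Root/Parent/parent,1.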
I am trying to wrap every text node in a <text> element, but I run into trouble when I encounter inline elements (i, b, emphasis), which should stay inside the same <text> node (in other words, they should be treated as text). Please see the input and desired output below:
(Note: I have to do this for specific inline elements only, so I keep them in a param (they could be anything); for all other elements the standard <text> rule should apply. Please see my XSLT for details.)
Input XML:
<?xml version="1.0" encoding="utf-8"?>
<root>
<para>XML Translation is a format that's used to <emphasis>exchange <i>localisation</i></emphasis>data</para>
<para>The process can now be reformulated with more detail as follows:<ul>
<li>Text extraction <note>Separation of translatable text from layout data</note></li>
<li>Pre-translation</li>
<li>Translation</li>
<li>Reverse conversion</li>
<li>Translation memory improvement</li>
</ul>above mentioned steps should <b>executed</b> sequentially</para>
</root>
Output should be:
<?xml version="1.0" encoding="utf-8"?>
<root>
<para>
<text xid="d0t3">XML Translation is a format that's used to <g ctype="emphasis">exchange <g ctype="i">localisation</g></g>data </text>
</para>
<para>
<text xid="d0t10">The process can now be reformulated with more detail as follows:</text>
<ul>
<li><text xid="d0t13">Text extraction <g ctype="note">Separation of translatable text from layout data</g></text></li>
<li><text xid="d0t17">Pre-translation</text></li>
<li><text xid="d0t19">Translation</text></li>
<li><text xid="d0t21">Reverse conversion</text></li>
<li><text xid="d0t23">Translation memory improvement</text></li>
</ul>
<text xid="d0t24">above mentioned steps should <g ctype="b">executed</g> sequentially</text>
</para>
</root>
I am trying something like this, but I am not able to achieve the correct result:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:strip-space elements="*"/>
<xsl:param name="inlineElement" select="('emphasis', 'i', 'note', 'b')"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<text>
<xsl:attribute name="xid">
<xsl:value-of select="generate-id()"/>
</xsl:attribute>
<xsl:value-of select="."/>
<xsl:if test="following-sibling::node()[local-name()=$inlineElement]">
<g>
<xsl:apply-templates select="following-sibling::node()[local-name()=$inlineElement]/text()"/>
</g>
</xsl:if>
</text>
</xsl:template>
</xsl:stylesheet>
I would use for-each-group group-adjacent:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:param name="inlineElement" select="('emphasis', 'i', 'note', 'b')"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(local-name() = $inlineElement)]">
<xsl:copy>
<xsl:for-each-group select="node()" group-adjacent="boolean(self::text() | self::*[local-name() = $inlineElement])">
<xsl:choose>
<xsl:when test="current-grouping-key()">
<text xid="{generate-id(current-group()[self::text()][1])}">
<xsl:apply-templates select="current-group()"/>
</text>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
<xsl:template match="*[local-name() = $inlineElement]">
<g ctype="{local-name()}">
<xsl:apply-templates/>
</g>
</xsl:template>
</xsl:stylesheet>
That way, with Saxon 9.5, I get
<?xml version="1.0" encoding="UTF-8"?>
<root>
<para>
<text xid="d1t3">XML Translation is a format that's used to <g ctype="emphasis">exchange <g ctype="i">localisation</g>
</g>data</text>
</para>
<para>
<text xid="d1t10">The process can now be reformulated with more detail as follows:</text>
<ul>
<li>
<text xid="d1t13">Text extraction <g ctype="note">Separation of translatable text from layout data</g>
</text>
</li>
<li>
<text xid="d1t17">Pre-translation</text>
</li>
<li>
<text xid="d1t19">Translation</text>
</li>
<li>
<text xid="d1t21">Reverse conversion</text>
</li>
<li>
<text xid="d1t23">Translation memory improvement</text>
</li>
</ul>
<text xid="d1t24">above mentioned steps should <g ctype="b">executed</g> sequentially</text>
</para>
</root>
I have some XSLT that replaces linebreaks with <Break/> tags, and it works fine as long as there aren't multiple consecutive linebreaks. I think it's the indent="yes" that's causing problems.
Can it be disabled for some nodes?
Basically, nodes with mixed content (text and elements) cannot contain any linebreaks.
The input xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Account xmlns="http://example.com/account">
<Owner>
<ID>012345789</ID>
<Name>Peter Johnson</Name>
</Owner>
<Notes>
<NoteID>012345789</NoteID>
<Text>This is the description:
Line 1
Line 2
Line 3
Line 4, after double linebreak
Line 5</Text>
</Notes>
</Account>
The XSL:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://example.com/account" version="1.0">
<xsl:output method="xml" version="1.0" encoding="ISO-8859-1" indent="yes"/>
<xsl:template name="replace_sab">
<!-- with string s, replace substring a by string b -->
<!-- s, a and b are parameters determined upon calling -->
<xsl:param name="s" />
<xsl:param name="a" />
<xsl:param name="b" />
<xsl:choose>
<xsl:when test="contains($s,$a)">
<xsl:value-of select="substring-before($s,$a)" />
<xsl:copy-of select="$b" />
<xsl:call-template name="replace_sab">
<xsl:with-param name="s" select="substring-after($s,$a)" />
<xsl:with-param name="a" select="$a" />
<xsl:with-param name="b" select="$b" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$s" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()[not(normalize-space())]"/>
<xsl:template match="text()[boolean(normalize-space())]">
<xsl:call-template name="replace_sab">
<xsl:with-param name="s" select="." />
<xsl:with-param name="a" select="'&#10;'" />
<xsl:with-param name="b"><Break/></xsl:with-param>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
The output that I get:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Account xmlns="http://example.com/account">
<Owner>
<ID>012345789</ID>
<Name>Peter Johnson</Name>
</Owner>
<Notes>
<NoteID>012345789</NoteID>
<Text>This is the description:<Break/>Line 1<Break/>Line 2<Break/>Line 3<Break/>
<Break/>Line 4, after double linebreak<Break/>Line 5</Text>
</Notes>
</Account>
The output I would like:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Account xmlns="http://example.com/account">
<Owner>
<ID>012345789</ID>
<Name>Peter Johnson</Name>
</Owner>
<Notes>
<NoteID>012345789</NoteID>
<Text>This is the description:<Break/>Line 1<Break/>Line 2<Break/>Line 3<Break/><Break/>Line 4, after double linebreak<Break/>Line 5</Text>
</Notes>
</Account>
I am using "TIBCO XSLT 1.0" XSLT engine in a Tibco BusinessWorks process.
There's no standard way of doing this.
If you were using Saxon you could use the saxon:suppress-indentation output parameter, which becomes a standard option in XSLT 3.0.
Perhaps you could find a way of inserting the Saxon serializer into your processing pipeline even if you stick with the Tibco XSLT engine.
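To illustrate the XSLT 3.0 option, here is a minimal sketch, assuming a 3.0 processor such as a recent Saxon is available. The prefix acc is my own choice for the account namespace, and the tokenize-based template is a simplified stand-in for the recursive replace_sab template in the question, not the original code.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:acc="http://example.com/account">

  <!-- Indent the output in general, but never add whitespace inside acc:Text.
       With Saxon and XSLT 2.0, the counterpart is the saxon:suppress-indentation
       attribute (namespace http://saxon.sf.net/). -->
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"
              suppress-indentation="acc:Text"/>

  <xsl:strip-space elements="*"/>

  <!-- Copy everything that has no more specific template -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Split the text at line feeds and put a Break element between the pieces -->
  <xsl:template match="acc:Text/text()">
    <xsl:for-each select="tokenize(., '\n')">
      <xsl:if test="position() gt 1">
        <xsl:element name="Break" namespace="http://example.com/account"/>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

An empty line in the input yields an empty token, so two consecutive linebreaks become two adjacent Break elements, and the serializer is told not to insert whitespace between them.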