xslt deduplicate values by substring - xslt

I have the following categories:
<categories>
<category>anotherparent</category>
<category>parent</category>
<category>parent/child1</category>
<category>parent/child1/subchild1</category>
<category>parent/child2</category>
<category>parent/child3/</category>
<category>parent/child3/subchild3</category>
</categories>
The problem is that the category paths are "duplicated": every parent path is listed alongside its children. I'd like to remove all parent category paths and keep only the most specific level.
So the result should be something like this:
<categories>
<category>anotherparent</category>
<category>parent/child1/subchild1</category>
<category>parent/child2</category>
<category>parent/child3/subchild3</category>
</categories>
I could write a Java extension, but I can't find a suitable function for doing this in XSLT, and I'm fairly sure it should be easy.
The solution can use XSLT 2 or 3.

Perhaps
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
expand-text="yes"
exclude-result-prefixes="#all"
xmlns:mf="http://example.com/mf"
version="3.0">
<xsl:function name="mf:group" as="element(category)*">
<xsl:param name="cats"/>
<xsl:param name="level"/>
<xsl:choose>
<xsl:when test="$cats?2[$level]">
<xsl:for-each-group select="$cats[?2[$level]]" group-by="?2[$level]">
<xsl:sequence select="mf:group(current-group(), $level + 1)"/>
</xsl:for-each-group>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$cats?1"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:template match="categories">
<xsl:copy>
<xsl:sequence select="mf:group(category ! [., tokenize(., '/')], 1)"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
helps. It assumes, as a comment on the question asks, that the trailing / in <category>parent/child3/</category> is a typo for <category>parent/child3</category>. If parent/child3/ can occur but should be treated as parent/child3, use tokenize(., '/')[normalize-space()] instead of tokenize(., '/').
It might be cleaner to have the function take a sequence of maps with two entries each instead of a sequence of size-2 arrays:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
expand-text="yes"
exclude-result-prefixes="#all"
xmlns:mf="http://example.com/mf"
version="3.0">
<xsl:function name="mf:group" as="element(category)*">
<xsl:param name="cats" as="map(xs:string, item()*)*"/>
<xsl:param name="level" as="xs:integer"/>
<xsl:choose>
<xsl:when test="$cats?tokens[$level]">
<xsl:for-each-group select="$cats[?tokens[$level]]" group-by="?tokens[$level]">
<xsl:sequence select="mf:group(current-group(), $level + 1)"/>
</xsl:for-each-group>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$cats?cat"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:template match="categories">
<xsl:copy>
<xsl:sequence select="mf:group(category ! map { 'cat' : ., 'tokens' : tokenize(., '/') }, 1)"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Again, it might be necessary to use tokenize(., '/')[normalize-space()] instead of tokenize(., '/') if trailing or leading or in between slashes can occur but should be ignored.
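To make the recursion easier to follow outside an XSLT processor, here is a rough Python sketch of the same grouping idea. It is not the answer's code: the function names leaf_categories and _group are made up for illustration, and the sketch assumes that empty path steps (for example from a trailing slash) should be ignored.

```python
def leaf_categories(paths):
    """Return only the paths that are not a parent of another path."""
    # Pair each original string with its non-empty path steps, so that
    # 'parent/child3/' is treated like 'parent/child3' (an assumption).
    cats = [(p, [t for t in p.split("/") if t.strip()]) for p in paths]
    return _group(cats, 0)


def _group(cats, level):
    # Keep only the categories that still have a step at this level;
    # the others are parents of something deeper and are dropped.
    deeper = [c for c in cats if len(c[1]) > level]
    if not deeper:
        # Nobody goes deeper: these are the most specific paths.
        return [original for original, _ in cats]
    # Group by the step at this level, in order of first appearance.
    groups = {}
    for cat in deeper:
        groups.setdefault(cat[1][level], []).append(cat)
    result = []
    for members in groups.values():
        result.extend(_group(members, level + 1))
    return result
```

Called with the seven category strings from the question, this returns the four leaf paths in document order.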

If your input XML is always in the format you posted (sorted so that each parent path immediately precedes its descendants), this works:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="category[starts-with(following-sibling::category[1],.)]"/>
</xsl:stylesheet>
See it working here: https://xsltfiddle.liberty-development.net/gVrvcxY
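The match pattern above only compares each category with its immediately following sibling, so it depends on the input order. A small Python sketch of that comparison (the function name drop_if_next_starts_with is hypothetical) shows both the intended result and two cases where plain prefix matching on neighbours gives a different answer:

```python
def drop_if_next_starts_with(paths):
    """Drop every path that is a string prefix of the next path in the list."""
    kept = []
    for i, path in enumerate(paths):
        # The last item has no following sibling; compare against ''.
        nxt = paths[i + 1] if i + 1 < len(paths) else ""
        if not nxt.startswith(path):
            kept.append(path)
    return kept


sample = [
    "anotherparent",
    "parent",
    "parent/child1",
    "parent/child1/subchild1",
    "parent/child2",
    "parent/child3/",
    "parent/child3/subchild3",
]
print(drop_if_next_starts_with(sample))
# A sibling whose name merely starts with the same characters is also affected:
print(drop_if_next_starts_with(["parent", "parent2"]))
# A parent that is not directly followed by its child is kept:
print(drop_if_next_starts_with(["parent", "other", "parent/child1"]))
```

If either of the last two situations can occur in real data, the grouping approach from the other answer is the safer choice.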

Related

Process all files in a directory and create a file according to file-name

I have a working XSLT-transformation that I need to apply to all files in the specified directory:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:tei ="http://www.tei-c.org/ns/1.0"
exclude-result-prefixes="xs"
version="3.0">
<xsl:strip-space elements="*"/>
<xsl:output method="text" indent="yes"/>
<xsl:output omit-xml-declaration="yes"/>
<xsl:variable name="files" select="collection('C:\Users\KW\Desktop\Interim_56_zerlegt')"/>
<xsl:template match="/">
<xsl:result-document href="Interim_56_zerlegt/test.txt" method="text">
Kuerzel; AT/NT; Stelle
<xsl:apply-templates select="//note"></xsl:apply-templates>
</xsl:result-document>
</xsl:template>
<xsl:template match="//note">
<xsl:choose>
<xsl:when test="child::*[1][self::ref[@type='biblical']]">
<xsl:for-each select="child::*[@type='biblical']/@cRef">
<xsl:value-of select="."/>
<xsl:text>;</xsl:text>
</xsl:for-each>
<xsl:text></xsl:text>
<xsl:text>test;</xsl:text>
<xsl:value-of select="."/>
<xsl:text>
</xsl:text>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
I have a folder of 56 files that I want to run this transformation on.
So far I have managed to create an output file for one input file (<xsl:result-document href='Interim_56_zerlegt'), and now I need to do this for every file in the folder (technically only the XML files, but there is nothing else in there). Is there a way to do this?
Use an initial template
<xsl:param name="files" select="uri-collection('file:///C:/Users/KW/Desktop/Interim_56_zerlegt/?select=*.xml')"/>
<xsl:template name="xsl:initial-template">
<xsl:apply-templates select="$files ! doc(.)"/>
</xsl:template>
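and start the processor (Saxon, presumably) with the -it option so that it begins at the initial template instead of a source document.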
You will also need to adjust the href="Interim_56_zerlegt/test.txt" to e.g. href="Interim_56_zerlegt/result-{position()}.txt" or perhaps href="{document-uri() => replace('\.xml$', '.txt')}".
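The second href suggestion derives each output name from the input document's URI. As a rough illustration of what that replace() call computes, here is the same substitution in Python; the function name result_uri and the file name a.xml are assumptions made up for the example.

```python
import re


def result_uri(source_uri):
    # Mirrors replace(document-uri(), '\.xml$', '.txt'):
    # only a '.xml' at the very end of the URI is rewritten.
    return re.sub(r"\.xml$", ".txt", source_uri)


print(result_uri("file:///C:/Users/KW/Desktop/Interim_56_zerlegt/a.xml"))
```

Because the pattern is anchored with $, a name such as notes.xml.bak would be left unchanged.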

XSLT mapping to remove double quotes which has PIPE delimited symbol inside

Experts, I need to write XSLT 1.0 code that removes the pipe delimiter symbol wherever it appears inside double quotes, and also removes those double quotes.
Input:
<?xml version="1.0" encoding="utf-8"?>
<ns:MT_FILE>
<LN>
<LD>EXTRACT|"28|53"|1308026.7500|1176</LD>
</LN>
<LN>
<LD>DETAIL|1176|"LOS LE|OS PARRILLA"|Y|R||||</LD>
</LN>
</ns:MT_FILE>
Desired output:
<?xml version="1.0" encoding="utf-8"?>
<ns:MT_FILE>
<LN>
<LD>EXTRACT|2853|1308026.7500|1176</LD>
</LN>
<LN>
<LD>DETAIL|1176|LOS LE OS PARRILLA|Y|R||||</LD>
</LN>
</ns:MT_FILE>
The XSLT I used is below:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*/text()">
<xsl:value-of select="translate(., '\&quot;', '')"/>
</xsl:template>
</xsl:stylesheet>
This XSLT removes all the double quotes from my input but leaves the pipe symbols inside them. Please assist.
If it can be assumed that quotes will always come in pairs, you could do:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:call-template name="process">
<xsl:with-param name="text" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="process">
<xsl:param name="text"/>
<xsl:choose>
<xsl:when test="contains($text, '&quot;')">
<xsl:value-of select="substring-before($text, '&quot;')"/>
<xsl:value-of select="translate(substring-before(substring-after($text, '&quot;'), '&quot;'), '|', '')"/>
<xsl:call-template name="process">
<xsl:with-param name="text" select="substring-after(substring-after($text, '&quot;'), '&quot;')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
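The recursive template above walks the string one quoted section at a time. Under the same assumption that quotes always come in pairs, the effect can be sketched in Python by splitting on the quote character: the odd-numbered pieces (counting from 0) are the quoted sections. The function name strip_quoted_pipes is made up for this illustration.

```python
def strip_quoted_pipes(text):
    """Remove double quotes and any '|' found between a pair of quotes.

    Assumes quotes are balanced, as the answer above does.
    """
    pieces = text.split('"')
    cleaned = []
    for i, piece in enumerate(pieces):
        # Pieces at odd indexes were inside quotes: drop their pipes.
        cleaned.append(piece.replace("|", "") if i % 2 == 1 else piece)
    return "".join(cleaned)


print(strip_quoted_pipes('EXTRACT|"28|53"|1308026.7500|1176'))
print(strip_quoted_pipes('DETAIL|1176|"LOS LE|OS PARRILLA"|Y|R||||'))
```

Note that the pipe inside the quotes is deleted, not replaced by a space, so the second line yields LOS LEOS PARRILLA; the desired output in the question shows a space there, which would need translate(..., '|', ' ') in the XSLT or replace("|", " ") in this sketch.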
As you tagged as EXSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:regexp="http://exslt.org/regular-expressions"
exclude-result-prefixes="regexp">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="LD/text()">
<xsl:value-of select="regexp:replace(., '(&quot;)([^|]+)\|([^&quot;]+)(&quot;)', 'g', '$2$3')"/>
</xsl:template>
</xsl:stylesheet>

XSLT regular expression to remove sequences text

I have an XML, something like this:
<?xml version="1.0" encoding="UTF-8"?>
<earth>
<computer>
<parts>;;remove;;This should stay;;remove too;;This stay;;yeah also remove;;this stay </parts>
</computer>
</earth>
I want to create an XSLT 2.0 transform to remove all text which starts and ends with ;;
<?xml version="1.0" encoding="utf-8"?>
<earth>
<computer>
<parts>This should stay This stay this stay </parts>
</computer>
</earth>
I tried something like this, but with no luck:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="fn">
<xsl:output encoding="utf-8" method="xml" indent="yes" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="parts">
<xsl:element name="parts" >
<xsl:value-of select="replace(., ';;.*;;','')" />
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Wow, what a dumb way to mark up text. You have XML at your disposal, why not use it? And even if you mark it up this way, why not use different symbols for opening and closing the marked parts?
Anyway, I believe this returns the expected result:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="parts">
<xsl:copy>
<xsl:value-of select="replace(., ';;.+?;;', '')" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Another approach would be to tokenize on ";;" as the separator and then drop all even-numbered tokens:
<xsl:template match="parts">
<parts>
<xsl:value-of select="tokenize(.,';;')[position() mod 2 = 1]"
separator=""/>
</parts>
</xsl:template>
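To see why the asker's greedy pattern failed and how the two XSLT 2.0 suggestions behave, the same expressions can be tried in Python, whose regex syntax agrees with XPath for these simple patterns. This is only an illustrative sketch; the variable names are invented.

```python
import re

SAMPLE = ";;remove;;This should stay;;remove too;;This stay;;yeah also remove;;this stay "

# Greedy '.*' runs from the first ';;' to the LAST ';;', removing almost everything.
greedy = re.sub(r";;.*;;", "", SAMPLE)

# Lazy '.+?' stops at the next ';;', so each marked section is removed separately.
lazy = re.sub(r";;.+?;;", "", SAMPLE)

# Splitting on ';;' and keeping the 1st, 3rd, 5th ... token gives the same result.
by_tokens = "".join(SAMPLE.split(";;")[0::2])

print(repr(greedy))
print(repr(lazy))
print(repr(by_tokens))
```

Neither variant inserts a space between the kept parts, so the result reads "This should stayThis staythis stay " and not the space-separated text shown in the question; with the tokenize approach, separator=" " would add the spaces.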
XSLT 1.0
For this kind of thing I'd use recursion. With substring-before and substring-after you can get the text before and after a given delimiter. All you need to do is keep recursing over the string until there are no more occurrences of the delimiter, as follows:
<xsl:template name="string-remove-between">
<xsl:param name="text" />
<xsl:param name="remove" />
<xsl:choose>
<xsl:when test="contains($text, $remove)">
<xsl:value-of select="substring-before($text,$remove)" />
<xsl:call-template name="string-remove-between">
<xsl:with-param name="text" select="substring-after(substring-after($text,$remove), $remove)" />
<xsl:with-param name="remove" select="$remove" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Then you'd just call the template with your text and the section you want to remove:
<xsl:call-template name="string-remove-between">
<xsl:with-param name="text" select="parts"/>
<xsl:with-param name="remove">;;</xsl:with-param>
</xsl:call-template>
Note that there are two substring-after calls: the inner one skips past the opening ';;' and the outer one skips past the closing ';;', so the text between them is not copied to the output.

Using substring in xsl

Following on from an earlier question, and this is more about xsl syntax. I want to split part of a URL variable into a new variable in xsl.
This code works when the variable is sitting part way along a URL. EG:
http://www.mysite.com/test.aspx?aVar=something&bVar=somethingMore&cVar=yetMoreStill
<xsl:variable name="testVar" select="substring-after($url, 'bVar=')"/>
<xsl:value-of select="substring-before($testVar, '&amp;')" />
The problem is the variable can sometime sit at the end of the URL (I have no control over this) EG:
http://www.mysite.com/test.aspx?aVar=something&bVar=somethingMore
So the above code fails. Is there a way I can allow for both cases? The end goal is just to get the value of bVar no matter where it sits within the URL. Thanks.
How about the following workaround?
<xsl:variable name="testVar" select="substring-after($url, 'bVar=')"/>
<xsl:value-of select="substring-before(concat($testVar, '&amp;'), '&amp;')" />
Try this. This is XSLT 1.0:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:call-template name="urlResolver">
<xsl:with-param name="input" select="'http://www.mysite.com/test.aspx?aVar=something&amp;bVar=somethingMore'" />
</xsl:call-template>
<xsl:call-template name="urlResolver">
<xsl:with-param name="input" select="'http://www.mysite.com/test.aspx?aVar=something&amp;bVar=somethingMore&amp;cVar=yetMoreStill'" />
</xsl:call-template>
</xsl:template>
<xsl:template name="urlResolver">
<xsl:param name="input" />
<xsl:variable name="testVar" select="substring-after($input, 'bVar=')"/>
<xsl:choose>
<xsl:when test="contains($testVar, '&amp;')"><xsl:value-of select="substring-before($testVar, '&amp;')" /></xsl:when>
<xsl:otherwise><xsl:value-of select="$testVar" /></xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Try to make use of tokenize (available in XSLT 2.0) like the following:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" omit-xml-declaration="yes" method="xml" version="1.0"/>
<xsl:template match="/">
<xsl:variable name="test"><![CDATA[http://www.mysite.com/test.aspx?aVar=something&bVar=somethingMore&cVar=yetMoreStill]]></xsl:variable>
<xsl:variable name="splitURL" select="tokenize($test,'&amp;')"/>
<xsl:variable name="bvar" select="$splitURL[starts-with(.,'bVar')]"/>
<out><xsl:value-of select="substring-after($bvar, 'bVar=')"/></out>
</xsl:template>
</xsl:stylesheet>
The currently accepted answer is generally wrong.
Try it with this URL:
http://www.mysite.com/test.aspx?subVar=something&bVar=somethingMore
and you get the wrong result: something
This question was already answered. If you had read that answer, you could simply reuse it and pick your query-string parameter out of the produced result:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common" exclude-result-prefixes="ext">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pUrl" select=
"'http://www.mysite.com/test.aspx?subVar=something&amp;bVar=somethingMore'"/>
<xsl:template match="/">
<xsl:variable name="vrtfQStrings">
<xsl:call-template name="GetQueryStringParams"/>
</xsl:variable>
bVar = "<xsl:value-of select="ext:node-set($vrtfQStrings)/bVar"/>"
</xsl:template>
<xsl:template name="GetQueryStringParams">
<xsl:param name="pUrl" select="$pUrl"/>
<xsl:variable name="vQueryPart" select=
"substring-before(substring-after(concat($pUrl,'?'),
'?'),
'?')"/>
<xsl:variable name="vHeadVar" select=
"substring-before(concat($vQueryPart,'&amp;'), '&amp;')"/>
<xsl:element name="{substring-before($vHeadVar, '=')}">
<xsl:value-of select="substring-after($vHeadVar, '=')"/>
</xsl:element>
<xsl:variable name="vRest" select="substring-after($vQueryPart, '&amp;')"/>
<xsl:if test="string-length($vRest) > 0">
<xsl:call-template name="GetQueryStringParams">
<xsl:with-param name="pUrl" select=
"concat('?', substring(substring-after($vQueryPart, $vHeadVar), 2))"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
When applied on any XML document (not used), this transformation produces the wanted, correct result:
bVar = "somethingMore"
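The difference between the substring-based answers and the parsing answer can be reproduced in a few lines of Python. This is only a sketch for comparison: naive_bvar imitates the substring-after/concat trick, parsed_bvar uses the standard library's query-string parser, and both function names are made up.

```python
from urllib.parse import urlsplit, parse_qs


def naive_bvar(url):
    # Like substring-after($url, 'bVar='): text after the FIRST 'bVar=',
    # which may be the tail of another parameter name such as 'subVar='.
    after = url.partition("bVar=")[2]
    # Like substring-before(concat($testVar, '&'), '&').
    return (after + "&").partition("&")[0]


def parsed_bvar(url):
    # Split the query string into named parameters first, then look up bVar.
    # Note: parse_qs also decodes percent-escapes, which the XSLT does not.
    values = parse_qs(urlsplit(url).query).get("bVar", [""])
    return values[0]


tricky = "http://www.mysite.com/test.aspx?subVar=something&bVar=somethingMore"
print(naive_bvar(tricky))
print(parsed_bvar(tricky))
```

For the two URLs in the question both functions agree; they differ only when another parameter name ends in bVar, as in the counterexample above.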

XSLT find word near another word

How can I find, with XSLT, the word before and the word after another known word in a text node?
I. In XSLT 2.x / XPath 2.x one can use the functions tokenize() and index-of() to produce the desired results with one-liner XPath expressions:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:param name="pWord" select="'three'"/>
<xsl:template match="text()">
<xsl:sequence select=
"tokenize(., ',\s*')
[index-of(tokenize(current(), ',\s*'), $pWord) -1]"/>
<xsl:sequence select=
"tokenize(., ',\s*')
[index-of(tokenize(current(), ',\s*'), $pWord) +1]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<t>One, two, three, four</t>
the wanted, correct result is produced:
two four
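The two XPath expressions in part I boil down to: split the text into words, find the position of the known word, and take the items at position - 1 and position + 1. A minimal Python sketch of that idea follows; the function name neighbours is hypothetical, and unlike index-of(), which returns every matching position, this version only looks at the first occurrence.

```python
import re


def neighbours(text, word):
    """Return (word before, word after) for the first occurrence of `word`."""
    # Same separator as tokenize(., ',\s*') in the stylesheet above.
    tokens = re.split(r",\s*", text)
    if word not in tokens:
        return (None, None)
    i = tokens.index(word)
    before = tokens[i - 1] if i > 0 else None
    after = tokens[i + 1] if i + 1 < len(tokens) else None
    return (before, after)


print(neighbours("One, two, three, four", "three"))
```

At the edges of the list the missing neighbour is reported as None, which corresponds to the XPath predicate simply selecting nothing.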
II. XSLT 1.0 solution
It is possible to solve the same task in XSLT 1.0 using the strSplit-to-Words template of FXSL.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common"
>
<xsl:import href="strSplit-to-Words.xsl"/>
<xsl:output method="text"/>
<xsl:param name="pWord" select="'three'"/>
<xsl:template match="/">
<xsl:variable name="vrtfwordNodes">
<xsl:call-template name="str-split-to-words">
<xsl:with-param name="pStr" select="/"/>
<xsl:with-param name="pDelimiters"
select="',
'"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="vwordNodes"
select="ext:node-set($vrtfwordNodes)/*"/>
<xsl:variable name="vserchWordPos" select=
"count($vwordNodes
[. = $pWord]/preceding-sibling::*
) +1"/>
<xsl:value-of select=
"concat($vwordNodes[$vserchWordPos -1],
' ',
$vwordNodes[$vserchWordPos +1]
)
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the same XML document:
<t>One, two, three, four</t>
the wanted, correct result is produced:
two four