Remove all white space from text - xslt

I have an TEI-xml file that consists of the more or less typical elements.
I want to output the normalized version of a text without any white spaces etc.
my XSL for doing so consists of these lines:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:tei ="http://www.tei-c.org/ns/1.0"
exclude-result-prefixes="xs tei"
version="2.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="tei:note" />
<xsl:template match="tei:orig" >
<xsl:text> </xsl:text>
</xsl:template>
<xsl:template match="tei:text/tei:body//text()">
<xsl:value-of select="." />
</xsl:template>
<xsl:template match="text()">
</xsl:template>
So it takes the text of the body and creates a copy of it, while it applies an empty template for <note> and <orig> - elements. The output looks just fine, but I have the TEI-immanent indentions and further white spaces in my text.
I thought the lien <xsl:strip-space elements="*"/> would take care of this, apparently not.
How can I remove all the white spaces in a TEI-XML?
(google and stackoverflow solutions didn't work so far :-( )
all the best,
Kevin

Related

XSLT 1.0, copy all child nodes except some

I am trying to copy all child nodes to a specific node, except a few. Haven't been able to get this to work? Any pointers of what I am doing wrong?
Using this XML:
<ns0:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<ns0:Header>
<wsse:Sec xmlns:wsse="http://docs.x.org/wsse/">
<saml:Ass xmlns:saml="http://docs.x.org/saml/">
<ds:Sign xmlns:ds="http://docs.x.org/ds/">
<ds:SignVal>SignatureValue</ds:SignVal>
</ds:Sign>
<saml:subj>SubjectValue</saml:subj>
</saml:Ass>
</wsse:Sec>
<To>http://localhost:8080/Test/</To>
<Action>SendTest</Action>
</ns0:Header>
<ns0:Body>...</ns0:Body>
</ns0:Envelope>
The wanted result is to just get the Sec tag and all children:
<wsse:Sec xmlns:wsse="http://docs.x.org/wsse/">
<saml:Ass xmlns:saml="http://docs.x.org/saml/">
<ds:Sign xmlns:ds="http://docs.x.org/ds/">
<ds:SignVal>SignatureValue</ds:SignVal>
</ds:Sign>
<saml:subj>SubjectValue</saml:subj>
</saml:Ass>
</wsse:Sec>
I have tried numerous XSL including this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:template match="Header">
<xsl:copy-of select="*"/>
</xsl:template>
<!-- Exclude these -->
<xsl:template match="To" />
<xsl:template match="Action" />
</xsl:stylesheet>
The result is I get values but no tags...
You have not accounted for namespaces in your XSLT. In your XML, Header is in namespace http://schemas.xmlsoap.org/soap/envelope/, but your XSLT is trying to match a Header in no namespace.
You need to declare the namespaces in your XSLT, and use them in the template matches
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:wsse="http://docs.x.org/wsse/">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:template match="ns0:Header">
<xsl:copy-of select="wsse:Sec"/>
</xsl:template>
<xsl:template match="ns0:Body" />
</xsl:stylesheet>
Note this XSLT doesn't need templates matching "To" and "Action" because of the explicit copy of wsse:Sec using this approach. However, you do need to template to ensure any test within ns0:Body isn't picked up.
Another approach is to use the identity template, and then you would have the templates to exclude To and Action (and Body)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:wsse="http://docs.x.org/wsse/">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ns0:Envelope|ns0:Header">
<xsl:apply-templates />
</xsl:template>
<!-- Exclude these -->
<xsl:template match="ns0:Body|To|Action" />
</xsl:stylesheet>
Note there is a template matching ns0:Envelope and ns0:Header as although you don't want these elements themselves, you do need to process the child nodes.
You would need to use XSLT 2 or 3 with
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:wsse="http://docs.x.org/wsse/"
exclude-result-prefixes="#all"
version="3.0">
<xsl:template match="/">
<xsl:copy-of select="//wsse:Sec" copy-namespaces="no"/>
</xsl:template>
</xsl:stylesheet>
to get the posted result with a simple copy instruction: https://xsltfiddle.liberty-development.net/bnnZVw
In XSLT 1 the copy will always copy the in-scope namespace xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/" so to remove it from the result you would need to run your code through some kind of transformation stripping in-scope namespaces (other than the one of the element itself):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wsse="http://docs.x.org/wsse/"
exclude-result-prefixes="wsse"
version="1.0">
<xsl:template match="#*">
<xsl:attribute name="{name()}" namespace="{namespace-uri()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}" namespace="{namespace-uri()}">
<xsl:apply-templates select="#* | node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="//wsse:Sec"/>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/bnnZVw/1

Removing specific values XML but keeping tag names using XSLT 1.0

I want to remove specific values from my XML but keep the tag names. I've seen examples that do the opposite (remove tags but keep values). Here is my XML:
<Result>
<Max>100</Max>
<Min>10</Min>
<Range>90</Range>
<ResultPoints>
<ResultP1>.</ResultP1>
<ResultP2>.</ResultP2>
<ResultP3>.</ResultP3>
<ResultP4>.</ResultP4>
<ResultP5>.</ResultP5>
</ResultPoints>
</Result>
I want to remove the '.' but keep the tag names so my XML will look like this:
<Result>
<Max>100</Max>
<Min>10</Min>
<Range>90</Range>
<ResultPoints>
<ResultP1/>
<ResultP2/>
<ResultP3/>
<ResultP4/>
<ResultP5/>
</ResultPoints>
</Result>
Here is my XLT. This completely removes the ResultPn tags.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[. = '.']">
<xsl:value-of select="''"/>
</xsl:template>
</xsl:stylesheet>
Any Help will be appreciated!
You just need to do an xsl:copy in your template, to copy across the element you have matched. Note you don't really need to output an empty string here either.
Try this XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[. = '.']">
<xsl:copy>
<xsl:apply-templates select="#*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note, I added an xsl:apply-templates to copy across any existing attributes.
Alternatively, you could replace the second template with this one instead (which matches the text node directly, rather than the parent element)
<xsl:template match="text()[. = '.']" />

get node and value if it's not null using Xslt

I have a requierement as below:
if i give input as :
<?xml version="1.0"?>
<new:NewAddressData xmlns:new="http://www.example.org/NewAddress">
<new:NewStreet></new:NewStreet>
<new:NewArea>Area_1</new:NewArea>
<new:NewState></new:NewState>
</new:NewAddressData>
Output should be:
<new:NewArea>Area_1</new:NewArea>
Actually Iam new bee to XSLT but I read some basics and tried below code :
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:choose>
<xsl:when test="#*|node() != ''">
<xsl:value-of select="." disable-output-escaping="yes" />
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="#*|node()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
for this I am getting output as :
<new:NewAddressData xmlns:new="http://www.example.org/NewAddress">Area_1</new:NewAddressData>
where expected value should be like :
<new:NewArea>Area_1</new:NewArea>
So how can I achieve this using XSLT 1.0
Thanks in advance
You could do something like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*[text()]">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
Depending on input, like if there was more than one element that contained text, this might result in output that is not well formed.
It looks like you have read about the XSLT Identity Template, which is good!
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
On its own, this will copy across all nodes unchanged (such as your NewArea element), so you need to then write templates for the things you do want to change. In this case, it looks like you want to remove elements that don't have non-empty text nodes as children.
<xsl:template match="*[not(text()[normalize-space()])]">
Try this XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(text()[normalize-space()])]">
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
This would output the following
<new:NewArea xmlns:new="http://www.example.org/NewAddress">Area_1</new:NewArea>
The namespace is necessary here. You can not output an element with a prefix without also declaring the namespace associated with it.

Using xslt to change an attribute value using arithmetic operations

I'm trying to bound the value of an xml attribute using xslt/xpath 1.0. In this example, it would be the id attribute on the m_m element.
<blart>
<m_data>
<m_m name="arggg" id="99999999" subs="asas"/>
</m_data>
<m_data>
<m_m name="arggg" id="99" subs="asas"/>
</m_data>
</blart>
If the id is greater then 20000 it gets set to 20000. I have the following xslt. I know it selects the correct node and attribute. It obviously is just outputing 20000. I realize I should have some sort of xpath logic in there but I'm having a hard time developing it. I have some big holes in my knowledge of xpath and xslt. If you can point me in the right direction in helping me understand on what I should be doing I would really appreciate it.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match ="m_data/m_m/#id[.> 20000]">20000 </xsl:template>
</xsl:stylesheet>
The expected output would be
<blart>
<m_data>
<m_m name="arggg" id="20000" subs="asas"/>
</m_data>
<m_data>
<m_m name="arggg" id="99" subs="asas"/>
</m_data>
</blart>
You can use the following XSLT that gives flexibility to the attribute you want to change, and keeps everything else as it is:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match ="m_data/m_m/#id[. > 20000]">
<xsl:attribute name="id">20000</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Why don't you try:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match ="m_m/#id[. > 20000]">
<xsl:attribute name="id">20000</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
NOTE: Since I posted this, much better answers were contributed (see here and here). SO won't let me delete this one because it was accepted, but in all fairness and for the sake of quality, I should encourage you to upvote the two aforementioned answers, so that they stand out over this one.
How about this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="m_m">
<m_m>
<xsl:copy-of select="#*" />
<xsl:if test="#id > 20000">
<xsl:attribute name="id">20000</xsl:attribute>
</xsl:if>
</m_m>
</xsl:template>
<xsl:template match="m_data">
<m_data>
<xsl:apply-templates select="m_m" />
</m_data>
</xsl:template>
<xsl:template match="/blart">
<blart>
<xsl:apply-templates select="m_data" />
</blart>
</xsl:template>
</xsl:stylesheet>

Removing Nodes for XML using XSLT - Don't need CRLF against removed nodes/fields

I am using following XSLT to remove empty nodes from the XML. It is working fine by removing the empty nodes but it is also adding CRLF in place of removed nodes.
<xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
<xsl:template match="*[not(child::node())]"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#* | node()"/>
</xsl:copy>
</xsl:template>
I don't need these CRLFs. Please suggest!
It's not adding anything, but it is also not removing the left over whitespace text nodes between the elements in the original XML. If your XML has no mixed content then the simplest approach is to remove all the whitespace-only text nodes and then re-indent the result tree:
<xsl:strip-space elements="*" />
<xsl:output omit-xml-declaration="yes" method="xml" version="1.0" indent="yes" />
If you need to preserve the indentation from the original XML then you need to be slightly more creative
<xsl:template match="*[not(child::node())]"/>
<xsl:template match="text()[not(normalize-space())]
[preceding-sibling::node()[1][self::*][not(child::node())]]" />
The second template will squash whitespace-only text nodes that immediately follow an element that has been squashed by the first template.
It might simply be that e.g.
<root>
<foo></foo>
<foo>2</foo>
</root>
results in
<root>
<foo>2</foo>
</root>
as your code removes the element but not any preceding or trailing white space text node.
In many cases simply doing
<xsl:output omit-xml-declaration="yes" method="xml" version="1.0" indent="yes"/>
<xsl:strip-space elements="*"/>
avoids the problem and gives you nice and consistent indentation.
Or you need to write templates ensuring that a white space text node before or after an empty element is removed as well.
Other solution which worked for me was by using translate. Below is the XSLT:
<xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
<xsl:template match="*[not(child::node())]"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="translate(., '
', '')"/>
</xsl:template>