XSLT - Select content between two special characters - regex

I have an XML document like this:
<doc>
<p>text1 &lt;xml version="1.0" encoding="UTF-16"
standalone="yes"?&gt; text2</p>
</doc>
I need to remove the text content between < and > from the text above using XSLT. So the expected output is:
<doc>
<p>text1 text2</p>
</doc>
I tried to use a regex, but I'm not sure how to capture the text between < and > with one.
Any idea how I can do this using XSLT?

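This should work:
(<(?:.?\n?)*>)
Then replace each match with "" (the empty string).
The pattern is greedy, so run it against the text content of <p> only; against the whole document it would match from the first < to the last >.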
Input:
<doc>
<p>text1 <xml version="1.0" encoding="UTF-16"
standalone="yes"?> text2</p>
</doc>
Output:
<doc>
<p>text1 text2</p>
</doc>
See: https://regex101.com/r/0o9hol/1
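For readers on XSLT 2.0, the same regex idea can be expressed with the XPath replace() function. The following is a minimal sketch, not the answerer's own stylesheet. It assumes that the angle brackets arrive as literal characters inside the text node (that is, they are escaped as &lt; and &gt; in the source), and it swaps in the simpler pattern <[^>]*>, which stops at the first > and also spans line breaks.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- identity template: copy everything unchanged by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- remove every "<...>" run from text inside <p>, then collapse whitespace -->
  <xsl:template match="p/text()">
    <xsl:value-of select="normalize-space(replace(., '&lt;[^&gt;]*&gt;', ''))"/>
  </xsl:template>
</xsl:stylesheet>
```

normalize-space() is there because removing the bracketed run leaves two adjacent spaces. It also trims leading and trailing whitespace of each text node, which may not be wanted if <p> contains inline child elements.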

Using just XSLT-1.0 you can achieve this by applying the following template:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" />

  <!-- special treatment for p: keep the element, drop the text between < and > -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:value-of select="concat(normalize-space(substring-before(text(), '&lt;')), ' ', normalize-space(substring-after(text(), '&gt;')))" />
    </xsl:copy>
  </xsl:template>

  <!-- identity template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
This stylesheet copies all nodes with the identity template and applies special treatment to <p> elements only.
For each <p>, it takes the text before < and the text after >, normalizes the whitespace in each part (collapsing runs to a single space and trimming the ends), and concatenates the two parts with a single space. Only the first < ... > pair in the first text node is handled.
That's all.

Related

How to check contain only characters + space and `p` using regex

I want to select only those <used> elements that contain nothing but letters, digits and spaces, either directly or inside a single <p> child.
Input:
<root>
<used><p>String 1</p></used>
<used>string 2<p>string 3</p></used>
<used>string 4</used>
<used><image>aaa.jpg</image>para</used>
</root>
The output should be:
<ans>
<abc>String 1</abc>
<abc>string 4</abc>
</ans>
Tried code:
<ans>
<abc>
<xsl:template match="root">
<xsl:choose>
<xsl: when test="getCode/matches(text(),'^[a-zA-Z0-9]+$')">
<xsl:text>text()</xsl:text>
</xsl:when>
</xsl:choose>
</xsl:template>
</abc>
</ans>
The code I tried does not work as I expect. How can I fix this? Thank you. I am using XSLT 2.0.
You can use the following XSLT-2.0 stylesheet to get the desired result:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Handle the <root> element -->
  <xsl:template match="/root">
    <ans>
      <xsl:apply-templates select="used" />
    </ans>
  </xsl:template>

  <!-- Create <abc> elements for every matching element -->
  <xsl:template match="used[not(*) and matches(text(),'^[\sa-zA-Z0-9]+$')] | used[not(text()) and matches(p/text(),'^[\sa-zA-Z0-9]+$')]/p">
    <abc><xsl:copy-of select="text()" /></abc>
  </xsl:template>

  <!-- Remove all spurious text nodes -->
  <xsl:template match="text()" />
</xsl:stylesheet>
Its result is:
<?xml version="1.0" encoding="UTF-8"?>
<ans>
<abc>String 1</abc>
<abc>string 4</abc>
</ans>

Replace xsi:nil="true" with open and close tags

I need to do the following transformation in order to get a message to pass through an integration broker which does not understand xsi:nil="true". I know that for strings something like <abc></abc> is not the same as <abc xsi:nil="true"/>, but I have no option.
My input XML:
<PART>
<LENGTH_UOM xsi:nil="1"/>
<WIDTH xsi:nil="1"/>
</PART>
Expected outcome:
<PART>
<LENGTH_UOM></LENGTH_UOM>
<WIDTH></WIDTH>
</PART>
Please let me know your suggestions.
To remove all xsi:nil attributes combine the identity template with an empty template matching xsi:nil.
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xsi prefix must be bound to the same namespace URI as in the input;
     the standard XML Schema instance URI is assumed here. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- identity template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- empty template: drops every xsi:nil attribute -->
  <xsl:template match="@xsi:nil" />
</xsl:stylesheet>
If you only want to remove those whose value is true use the following empty template instead.
<xsl:template match="@xsi:nil[.='1' or .='true']" />
Concerning the opening and closing tag topic I suggest reading this SO question in which Martin Honnen states that (in the comments of the answer):
I am afraid whether an empty element is marked up as <foo/> or <foo /> or <foo></foo> is not something that matters with XML and is usually not something you can control with XSLT processors.

Transforming xml with namespaces using XSLT

I have the following xml
<?xml version="1.0" encoding="UTF-8"?>
<typeNames xmlns="http://www.dsttechnologies.com/awd/rest/v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<typeName recordType="case" href="awdServer/awd/services/v1/businessareas/SAMPLEBA/types/SAMPLECASE">SAMPLECASE</typeName>
<typeName recordType="folder" href="awdServer/awd/services/v1/businessareas/SAMPLEBA/types/SAMPLEFLD">SAMPLEFLD</typeName>
<typeName recordType="source" href="awdServer/awd/services/v1/businessareas/SAMPLEBA/types/SAMPLEST">SAMPLEST</typeName>
<typeName recordType="transaction" href="awdServer/awd/services/v1/businessareas/SAMPLEBA/types/SAMPLEWT">SAMPLEWT</typeName>
</typeNames>
I want to transform above xml as below by using XSLT:
<response>
<results>
<source>
SAMPLEST
</source>
</results>
</response>
I just want to get the source from the input xml to the output xml.
I am trying with the following XSLT, but couldn't get the required output XML:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:v="http://www.dsttechnologies.com/awd/rest/v1" version="2.0" exclude-result-prefixes="v">
<xsl:output method="xml" version="1.0" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*" />
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="typeNames">
<response>
<results>
<source>
<xsl:value-of select="source" />
</source>
</results>
</response>
</xsl:template>
</xsl:stylesheet>
I. Namespace in input xml
<typeNames xmlns="http://www.dsttechnologies.com/awd/rest/v1"...
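A plain xmlns="..." declaration puts the element itself and all of its unprefixed descendant elements into that namespace (attributes are not affected). No prefix is needed in the input document.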
II. Namespace in XSLT
... xmlns:v="http://www.dsttechnologies.com/awd/rest/v1"...
You bound the namespace (same URI as in the source) to the prefix v, so you have to use this prefix in your XPath expressions and match patterns as well.
<xsl:template match="v:typeNames">
[XSLT 2.0: you can also add xpath-default-namespace="uri" to the xsl:stylesheet element to define a default namespace for unprefixed element names in all XPath expressions. Then you don't have to use a prefix.]
III. Guessing on given input xml
<xsl:value-of select="source" /> -> <typeName recordType="source"..>SAMPLEST</typeName>
If you want to select the XML node shown, you have to write one of the following:
absolute, independent of the context node:
/v:typeNames/v:typeName[@recordType = 'source']
with typeNames as the context node:
v:typeName[@recordType = 'source']
[<xsl:value-of select="..."/> will return the text of the selected node, e.g. "SAMPLEST"]
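As an illustration of the xpath-default-namespace option mentioned under II, here is a minimal sketch of a complete XSLT 2.0 stylesheet that needs no prefix. It is an assumption about how the pieces fit together, not the asker's final code, and it requires an XSLT 2.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.dsttechnologies.com/awd/rest/v1">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- unprefixed element names in patterns and XPath now refer to the input namespace -->
  <xsl:template match="/typeNames">
    <response>
      <results>
        <source>
          <!-- attribute names are never affected by the default namespace -->
          <xsl:value-of select="typeName[@recordType = 'source']"/>
        </source>
      </results>
    </response>
  </xsl:template>
</xsl:stylesheet>
```

The literal result elements (response, results, source) are still created in no namespace, because xpath-default-namespace only affects names used in XPath expressions and match patterns.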
EDIT:
What if there are two such tags?
First things first: in XSLT 1.0, <xsl:value-of> only outputs the string value of the first selected node. If the XPath expression matches more than one node, the others are ignored.
Solve it this way:
...
<results>
  <xsl:apply-templates select="v:typeName[@recordType = 'source']"/>
</results>
...
<xsl:template match="v:typeName[@recordType = 'source']">
  <source>
    <xsl:value-of select="."/>
  </source>
</xsl:template>
The apply-templates inside <results> selects every typeName whose recordType is "source". The second template matches each of those nodes and creates a <source> element for it.
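Put together, the fragments above could form a complete stylesheet along these lines. This is a sketch assembled from the answer's pieces; the xsl:output settings are taken from the question, and dropping the identity template is an assumption based on the expected output containing only the source value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:v="http://www.dsttechnologies.com/awd/rest/v1"
    exclude-result-prefixes="v">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- build the wrapper once, then process only the "source" entries -->
  <xsl:template match="/v:typeNames">
    <response>
      <results>
        <xsl:apply-templates select="v:typeName[@recordType = 'source']"/>
      </results>
    </response>
  </xsl:template>

  <!-- one <source> element per matching typeName -->
  <xsl:template match="v:typeName[@recordType = 'source']">
    <source>
      <xsl:value-of select="."/>
    </source>
  </xsl:template>
</xsl:stylesheet>
```

Because every matching typeName gets its own template invocation, this works the same whether the input has one source entry or several.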

exclude empty elements

In my sample XML file, I have this:
<AAA mandatory = "true"> good </AAA>
<BBB mandatory = "true"></BBB>
<CCC />
In the resulting XML, the result should look like this:
<AAA> good </AAA>
<BBB></BBB>
What should I put in my XSLT transformation file to produce this XML?
Currently, I have this:
<xsl:template match="node()[(@mandatory='true' or (following-sibling::*[@mandatory='true' and string-length(normalize-space(.)) > 0] or preceding-sibling::*[@mandatory='true' and string-length(normalize-space(.)) > 0])) or descendant-or-self::*[string-length(normalize-space(.)) > 0]]">
but this keeps outputting
<CCC />
When I run your XSLT on the input XML I do not get any output.
The XML you provided is not well-formed (it has no single root element), and I think the XPath in your match attribute is more complicated than it needs to be.
I came up with an XSLT 1.0 solution, but I do not know whether you can use it with XSLT 2.0; I have no experience with XSLT 2.0.
This XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- wrap the result in a new <list> root -->
  <xsl:template match="/">
    <list>
      <xsl:apply-templates/>
    </list>
  </xsl:template>

  <!-- copy only elements marked mandatory; xsl:copy does not copy their attributes -->
  <xsl:template match="*[@mandatory='true']">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
applied to this input XML:
<?xml version="1.0" encoding="UTF-8"?>
<list>
<AAA mandatory="true"> good </AAA>
<BBB mandatory="true"/>
<CCC/>
</list>
gives this output XML:
<?xml version="1.0" encoding="UTF-8"?>
<list>
<AAA> good </AAA>
<BBB/>
</list>
I am not sure if you also want to check on the text length of an element or only on the attribute mandatory. I only check on the attribute in my XSL.
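If the text length should matter as well, one possible reading of the requirement is: keep an element when it is mandatory or contains non-whitespace text, and drop it otherwise. The following XSLT 1.0 sketch implements that reading. Both the reading itself and the decision to strip the mandatory attribute from the output are assumptions, not something the question states.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- identity template: copy everything unless a rule below says otherwise -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop elements that are neither mandatory nor contain non-whitespace text -->
  <xsl:template match="*[not(@mandatory = 'true') and not(normalize-space())]"/>

  <!-- do not carry the mandatory attribute over to the output -->
  <xsl:template match="@mandatory"/>
</xsl:stylesheet>
```

Applied to the <list> input above, this keeps AAA and the empty BBB and drops CCC; the <list> root survives because its string value includes the text of AAA.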

Embedding static CDATA with its tag in XSLT

I need to output from the XSL a static CDATA construct embedded in the XSL, not from the XML that I am transforming. For example...
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="xml" indent="yes"/>
<!-- ================================================== -->
<xsl:template match="/">
<Document>
<text><![CDATA[
<b>static</b>
<br/><br/>
text
<br/><br/>
]]>
</text>
<xsl:apply-templates select="//tag"/>
</Document>
</xsl:template>
<!-- ================================================== -->
<xsl:template match="tag">
So on and so forth...
</xsl:template>
<!-- ================================================== -->
</xsl:stylesheet>
I want this to output...
<?xml version="1.0"?>
<Document>
<text><![CDATA[
<b>static</b>
<br/><br/>
text
<br/><br/>
]]>
</text>
So on and so forth...
</Document>
But what I get is...
<?xml version="1.0"?>
<Document>
<text>
&lt;b&gt;static&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;
text
&lt;br/&gt;&lt;br/&gt;
</text>
So on and so forth...
</Document>
I've tried several combinations of escaping the text and entities, but none seem to work.
Use
<xsl:output cdata-section-elements="text" />
to enforce CDATA for certain elements (spec).
In any case, what you currently get is equivalent to a CDATA section as far as any XML parser is concerned, so it should not matter. (If it bothers you for cosmetic reasons, get over it. If it bothers you for technical reasons, fix those.)
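To show where the attribute goes, here is a trimmed-down sketch of the question's stylesheet with cdata-section-elements added to the existing xsl:output. The template for tag is left out, so this is an illustration rather than the asker's full stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- serialize the text children of every <text> element as CDATA sections -->
  <xsl:output method="xml" indent="yes" cdata-section-elements="text"/>

  <xsl:template match="/">
    <Document>
      <!-- the CDATA section here is just a convenient way to write literal text
           in the stylesheet; the serializer decides how it is written out -->
      <text><![CDATA[
<b>static</b>
<br/><br/>
text
<br/><br/>
]]></text>
    </Document>
  </xsl:template>
</xsl:stylesheet>
```

The attribute takes a whitespace-separated list of element names, so further elements can be added if they need the same treatment. It applies to every <text> element in the output, not just the static one.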