XSLT whitespace elements getting trimmed

Quick question. I have some XML:
<someXML>
<someNode> </someNode>
<someNode>asdlkjf </someNode>
</someXML>
When I apply an XSLT to this, the first node is trimmed to nothing. The second is fine, and its trailing whitespace is kept because I'm preserving whitespace:
<someXML>
<someNode></someNode>
<someNode>asdlkjf </someNode>
</someXML>
My question is: why is the first node getting truncated? As absurd as this sounds, the whitespace node is important and needs to be maintained. I'm using Xalan 2.7.1. Is this just the way XSL works, or is there a way around this?
Thanks!

Use this XSLT directive:
<xsl:preserve-space elements="*"/>
If this doesn't help, the XML parser is probably stripping the whitespace-only text nodes before the stylesheet ever sees them.
To prevent this, in the XML document use the xml:space="preserve" attribute:
<someXML>
<someNode xml:space="preserve"> </someNode>
<someNode>asdlkjf </someNode>
</someXML>
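
For reference, a minimal sketch of a complete stylesheet using this directive. The question does not show its XSLT, so the identity template below is an assumption about what the rest of the stylesheet does:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Keep whitespace-only text nodes in every source element.
       Preserving is already the XSLT 1.0 default; declaring it here
       takes precedence over an xsl:strip-space that comes from an
       imported stylesheet. -->
  <xsl:preserve-space elements="*"/>

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If the single space in the first someNode still disappears with this stylesheet, the whitespace is most likely being lost before the transformation starts, which is the case the xml:space="preserve" attribute addresses.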

Related

How to avoid Open xml tag on its own line

I have an XML structure as below:
<?xml version="1.0" encoding="utf-8"?>
<cl:doc identifier="ISBN" xsi:schemaLocation="http://xml.cengage-learning.com/cendoc-core cendoc.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:cl="http://xml.cengage-learning.com/cendoc-core" xmlns:m="http://www.w3.org/1998/Math/MathML">
<cl:chapter identifier="ch01">
<cl:opener identifier="ch06_opn">
<cl:introduction identifier="ch06_int">
<cl:list identifier="tu_1" list-style="Unformatted" item-length="long">
<cl:item identifier="tu_2"><cl:para identifier="ch01_dum_2">Solubility</cl:para></cl:item>
<cl:item identifier="tu_3"><cl:para identifier="ch01_dum_3">Polarity</cl:para></cl:item>
</cl:list></cl:introduction></cl:opener></cl:chapter></cl:doc>
When I transform the above XML using XSLT, I get the output below:
<?xml version="1.0" encoding="utf-8"?>
<cl:doc identifier="ISBN" xsi:schemaLocation="http://xml.cengage-learning.com/cendoc-core cendoc.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:cl="http://xml.cengage-learning.com/cendoc-core" xmlns:m="http://www.w3.org/1998/Math/MathML"><cl:chapter identifier="ch01">
<cl:opener identifier="ch06_opn">
<cl:introduction identifier="ch06_int"><cl:list identifier="tu_1" list-style="Unformatted" item-length="long">
<cl:item identifier="tu_2"><cl:para identifier="ch01_dum_2">Solubility</cl:para></cl:item>
<cl:item identifier="tu_3"><cl:para identifier="ch01_dum_3">Polarity</cl:para></cl:item></cl:list></cl:introduction></cl:opener></cl:chapter></cl:doc>
Here, the opening tag <cl:opener identifier="ch06_opn"> ends up alone on a separate line, which leaves a blank line after the conversion.
I need the <cl:opener identifier="ch06_opn"> tag to run on with either the previous line or the next line.
Can anybody help me achieve this through XSLT?
Thanks,
Gopal

Without seeing your XSLT it's difficult to be certain, but it sounds like your XSLT is copying over the whitespace in the source into the output.
The quickest way to prevent that is to put
<xsl:strip-space elements="*"/>
or alternatively
<xsl:template match="text()[not(normalize-space())]"/>
Either of these removes all whitespace-only text nodes, but you can of course be more specific about the whitespace you're removing, such as
<xsl:template match="cl:opener/text()[1][not(normalize-space())]"/>
to remove just the whitespace after that opening element tag. This matches the first text node within cl:opener if it's whitespace-only, and outputs nothing in its place.
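
Put together, a minimal sketch of a stylesheet using the more specific template might look like this. The asker's XSLT isn't shown, so the identity template is an assumption; the cl namespace URI is taken from the sample document:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cl="http://xml.cengage-learning.com/cendoc-core">

  <!-- Identity template: copy everything as-is by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the first text node inside cl:opener when it is
       whitespace-only, so the next element follows the opening
       tag on the same line -->
  <xsl:template match="cl:opener/text()[1][not(normalize-space())]"/>

</xsl:stylesheet>
```

The empty template wins over the identity template because its match pattern has a higher default priority, so no explicit priority attribute should be needed.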

Search & replace regex - filtering files

A little bit of background:
I work at a multilingual communication company, where we’re working with a CMS system. Since its last update, all the files I export out of the system are ‘polluted’ with metadata, which I don't want to see, use or replace. To filter and change a heap of xml files, I use Powergrep, which operates with regexes.
I want my regex to find, e.g. "there is no spoon", "oracle", "I know kung-fu" and "bending method" (all straight quotation marks) and replace it with “there is no spoon”, “oracle”, “I know kung-fu” and “bending method” (all with curly quotation marks).
I don’t want it to find the metadata "concept.dtd" and "map.dtd".
The following lines are the first lines of my xml file. It's this "concept.dtd" that I would like to ignore.
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"[
]>
<?ish ishref="GUID-6B84EF92-DA99-4C54-BA91-FD0A113D4A96" version="1" lang="sv" srclng="en"?>
This is somewhere in the middle of the xml file
<row>
<entry colname="col1" valign="middle" align="left">"Bending method" </entry>
<entry colname="col2" valign="middle" align="left">another word</entry>
</row>
So, this is the original regex:
(?<!=)"\b(.+?)\b"(?! \[)
Replacement:
“\1”
Problem:
As the metadata "concept.dtd" and "map.dtd" are part of the file, I don’t want to replace their quotation marks in order not to change anything crucial. So I tried rewriting the regex:
(?<!=)"\b(.+?[\.d])\b"(?! \[)
It almost works: "concept.dtd" and "map.dtd" are skipped, and most of the terms between quotation marks are found, but not all: "Bending method" is not found, for example.
What am I missing? Any help or opinions would be greatly appreciated!

Based on your last answers, here is a regexp that can help you:
(?<=<entry)[^>]+>[^<>]*?(".+?")[^<>]*?(?=<\x2Fentry>)
Demo
http://regex101.com/r/lX2cU3
Discussion
I assume that you have one series of words between straight quotation marks and that there are no carriage returns or line feeds inside an <entry> node.

how to get text with xpath from bad xml?

I have a "bad xml structure" file:
<cars>
<car>Toyota
<country>Japan</country>
....
</car>
</cars>
How do I correctly get the right word (Toyota) using XPath?
I tried:
<xsl:value-of select = "cars/car/text()"/>.
It works, but I think there are more appropriate methods.
Thanks.

Use:
/cars/car/text()[1]
or if you want to discard most of the white space in the text node selected above, use:
normalize-space(/cars/car/text()[1])
Do note that while in XSLT 1.0 <xsl:value-of> outputs the string value of only the first node of the node-set selected by the expression in the select attribute, <xsl:copy-of> will output all the nodes in the node-set. In XSLT 2.0 even <xsl:value-of> outputs all the nodes in the node-set.
Therefore, for purposes of portability, upgradability and simply for avoiding errors, it is better to specify exactly which node from the node-set is to be output, even when using <xsl:value-of>.
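
As a rough illustration, here is a minimal stylesheet built around that expression. It assumes the document contains a single car element and that plain text output is wanted; neither is stated in the question:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- text()[1] selects only the first text node child of car,
         i.e. the text that precedes <country>; normalize-space()
         trims the line break that follows the word -->
    <xsl:value-of select="normalize-space(/cars/car/text()[1])"/>
  </xsl:template>

</xsl:stylesheet>
```

For the sample document (with a well-formed country element), this should output just Toyota.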

Trying to match more than one class in XSLT

I'm very new to XSLT and trying to format some text for PDFs, and I need to match and hide a few elements.
I am currently using:
<xsl:template match="*[@outputclass='LC ACaseName']">
to match:
<p outputclass="LC ACaseName">
and it works just fine.
What I now need to do is match 4 or 5 more
<p outputclass="<somestring>">
and apply the same style to them. I could easily just duplicate the above line substituting the different outputclass names each time but this is lazy and I know there must be a correct way of doing this which I should learn.
I hope I have provided enough info here. If I have missed anything please say.
thanks,
Hedley Phillips

You can specify multiple conditions in the predicate:
<xsl:template match="*[@outputclass='test' or @outputclass='blah']">

I couldn't find the duplicate...
In XSLT/XPath 1.0:
<xsl:template match="*[contains(
'|LC ACaseName|other class|',
concat('|',@outputclass,'|')
)
]">
<!-- Content Template -->
</xsl:template>
In XSLT/XPath 2.0:
<xsl:template match="*[@outputclass = ('LC ACaseName','other class')]">
<!-- Content Template -->
</xsl:template>
Note: the XSLT/XPath 1.0 solution needs a separator that does not occur in any of the values.
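
Since the goal in the question is to hide the matched elements, a minimal sketch of a full XSLT 1.0 stylesheet could look like the following. The identity template is an assumption about the rest of the stylesheet, and 'other class' is only a placeholder value:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One empty template hides every p whose outputclass appears in
       the '|'-separated list; 'other class' is a placeholder -->
  <xsl:template match="p[contains('|LC ACaseName|other class|',
                                  concat('|', @outputclass, '|'))]"/>

</xsl:stylesheet>
```

A p element without an outputclass attribute is not hidden, because concat() then produces '||', which does not occur in the list.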

XSL disable-output-escaping removes whitespaces

Part of the XML:
<text><b>Title</b> <b>Happy</b></text>
In my XSL I have:
<xsl:value-of select="text" disable-output-escaping="yes" />
My output becomes
**TitleHappy**
My spacing went missing - there's supposed to be a space between </b> and <b>.
I tried normalize-space(), it doesn't work.
Any suggestions? Thanks!

If you want to output whitespace from the stylesheet, use:
<xsl:text> </xsl:text>
Whitespace in the stylesheet is only kept when it is part of a text node that also has non-whitespace content (e.g. in " a " both spaces are kept) or when it is wrapped in xsl:text.
Whitespace from the original source XML has to be preserved by telling the parser, for example:
parser.setPreserveWhitespace(true);

As you're outputting HTML, you could substitute your space with a non-breaking space.

Do you have any control over the generation of the original XML? If so, you could try adding xml:space="preserve" to the text element which should tell the processor to keep the whitespace.
<text xml:space="preserve"><b>Title</b> <b>Happy</b></text>
Alternatively, try looking at the "xsl:preserve-space" element in XSLT.
<xsl:preserve-space elements="text"/>
Although I have never used this personally, it might be of some help. See W3Schools for more information.
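
A minimal sketch of the xsl:preserve-space route, assuming the <b> elements are real child elements of <text> (as in the snippet in the question) and that the XML parser hands the whitespace-only text node to the processor. It copies the children of <text> instead of taking the string value with xsl:value-of:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Keep whitespace-only text nodes that are children of <text> -->
  <xsl:preserve-space elements="text"/>

  <xsl:template match="text">
    <!-- Copy the child nodes (both <b> elements and the space
         between them) to the output unchanged -->
    <xsl:copy-of select="node()"/>
  </xsl:template>

</xsl:stylesheet>
```

If the parser has already dropped the space before the stylesheet runs, this will not bring it back; in that case the parser setting or xml:space="preserve" is the place to look.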

Thank you, everyone, for your input.
Currently I am using MattH's suggestion, which is to test for the space and substitute a non-breaking space. Another method I thought of is to test for "</b> <b>" and substitute " </b><b>". A space contained within the bold tags is actually output. Both methods worked, though I don't know what the implications are. And I still can't figure out why the space is removed when it sits between two separate bold tags.
