need to display char in xslt - xslt

Hi all
I am using XSLT 1.0. I have the char code F0A7, which has to be displayed as the corresponding character. My input is
<w:sym w:font="Wingdings" w:char="F0A7"/>
my xslt template is
<xsl:template match="w:sym">
<xsl:variable name="char" select="@w:char"/>
<span font-family="{@w:fonts}">
<xsl:value-of select="concat('&#x',$char,';')"/>
</span>
</xsl:template>
It shows the error: ERROR: 'A decimal representation must immediately follow the "&#" in a character reference.'
Please help me fix this. Thanks in advance.

This isn't possible in (reasonable) XSLT. You can work around it.
Your solution with concat is invalid: XSLT is not just a fancy string concatenator, it really transforms the conceptual tree. An encoded character such as &#xF0A7; is a single character. If you were to somehow include the letters & # x f 0 a 7 ; then the XSLT processor would be required to include these letters in the XML data as literal text, not as the character they spell. So that means it will escape them.
There's no feature in XSLT 1.0 that permits converting from a number to a character with that codepoint.
In XSLT 2.0, as Michael Kay points out, you can use codepoints-to-string() to achieve this.
There are two solutions. Firstly, you could use disable-output-escaping. This is rather nasty and not portable. Avoid this at all costs if you can - but it will probably work in your transformer, and it's probably the only general, simple solution, so you may not be able to avoid this.
The second solution would be to hardcode matches for each individual character. That's a mess generally, but quite possible if you're dealing with a limited set of possibilities - that depends on your specific problem.
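A minimal sketch of the hardcoded approach in XSLT 1.0 might look like the following. The namespace URI bound to w: is an assumption (the question does not show it), and the second code, F0B7, is a hypothetical extra entry added only to show how the list grows. Because the character reference sits in the stylesheet itself, the XML parser resolves it when the stylesheet is loaded, so the real character ends up in the result tree.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">

  <!-- One template per supported symbol code -->
  <xsl:template match="w:sym[@w:char = 'F0A7']">
    <span font-family="{@w:font}">&#xF0A7;</span>
  </xsl:template>

  <!-- Hypothetical second entry, for illustration only -->
  <xsl:template match="w:sym[@w:char = 'F0B7']">
    <span font-family="{@w:font}">&#xF0B7;</span>
  </xsl:template>

  <!-- Fallback for codes without a hardcoded template;
       it has lower default priority than the templates above -->
  <xsl:template match="w:sym">
    <span font-family="{@w:font}">?</span>
  </xsl:template>
</xsl:stylesheet>
```

This only scales to a small, known set of codes, which is the limitation described above.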
Finally, I'd recommend not solving this problem in XSLT - this is typically something you can do in pre/post processing in another programming environment more appropriately. Most likely, you've an in-memory representation of the XML document to be able to use XSLT in the first place, in which case this won't even take much CPU time.

<span font-family="{@w:font}">
<xsl:value-of select="concat('&amp;#x', @w:char, ';')"
disable-output-escaping="yes"/>
</span>
Though check @Eamon Nerbonne's answer for why you shouldn't do it at all.

If you were using XSLT 2.0 (which you aren't), you could write a function to convert hex to decimal, and then use codepoints-to-string() on the result.
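As a rough sketch of that idea, assuming an XSLT 2.0 processor: the function name f:hex-to-int, its namespace urn:example:functions, and the namespace URI bound to w: are all made up for illustration. The function peels off the last hex digit, looks up its value by its position in the string '0123456789ABCDEF', and recurses on the remaining digits.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs w f">

  <!-- Hypothetical helper: converts a hex string such as 'F0A7' to an integer.
       A character that is not a hex digit is silently treated as 0. -->
  <xsl:function name="f:hex-to-int" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:sequence select="
      if ($hex = '') then 0
      else 16 * f:hex-to-int(substring($hex, 1, string-length($hex) - 1))
           + string-length(substring-before('0123456789ABCDEF',
                 upper-case(substring($hex, string-length($hex)))))"/>
  </xsl:function>

  <xsl:template match="w:sym">
    <span font-family="{@w:font}">
      <xsl:value-of select="codepoints-to-string(f:hex-to-int(@w:char))"/>
    </span>
  </xsl:template>
</xsl:stylesheet>
```

No output escaping tricks are needed here, because the result tree contains the actual character and the serializer decides how to write it.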

use '&amp;' for '&' in output:
<xsl:value-of select="concat('&amp;#x',$char,';')"/>

Related

Duplicate line and replace string

I have an XML file that contains more than 10,000 items. Each item contains a line like this.
<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id>
For each item I need to add another line below like this:
<sku><![CDATA[FBM00101816]]></sku>
So I need to duplicate each g:id line, replace the g:id with sku and trim the value to delete all characters after the underscore (including it). The final result would be like this:
<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id>
<sku><![CDATA[FBM00101816]]></sku>
Any ideas how to accomplish this?
Thanks in advance.
In XSLT, it's
<xsl:template match="g:id">
<xsl:copy-of select="."/>
<sku><xsl:value-of select="substring-before(., '_')"/></sku>
</xsl:template>
Or using Saxon's Gizmo (https://www.saxonica.com/documentation11/index.html#!gizmo) it's
follow //g:id with <sku>{substring-before(., '_')}</sku>
Don't try to do this sort of thing in a text editor (or any other tool that doesn't involve a real XML parser) unless it's a one-off. Your code will be too sensitive to trivial variations in the way the source XML is written and will almost inevitably have bugs - which might not matter for a one-off, but do matter if it's going to be used repeatedly over a period of time.
Note also, the CDATA tags in your input (and output) are a waste of space. CDATA tags have no significance unless the element content includes special characters like < and &, which isn't the case in your examples.
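For context, a complete stylesheet around that template could look roughly like this. It assumes everything else in the file should be copied unchanged (hence the identity template), and the namespace URI bound to g: is a guess; it must match whatever the source file actually declares.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:g="http://base.google.com/ns/1.0">

  <!-- Identity template: copies everything that no other template matches -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy each g:id, then emit a sku holding the text before the first '_' -->
  <xsl:template match="g:id">
    <xsl:copy-of select="."/>
    <sku><xsl:value-of select="substring-before(., '_')"/></sku>
  </xsl:template>
</xsl:stylesheet>
```

Note that substring-before() returns an empty string when the value contains no underscore, so ids without one would produce an empty sku element.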
Okay, so after commenting, I couldn't help myself. This seemed to do what you asked for.
find: <g:id><!\[CDATA\[([^\_]+)?(.+)?\]\]></g:id>
replace: $0\n<sku><![CDATA[$1]]></sku>
I don't have BBEdit, but this is what it looked like in Textmate:

How to (nicely) template match multiple specific child elements (union) within wider XPATH

I'm trying to match a set of particular elements, but only ones which are children of another element structure (let's say it's input or select elements only somewhere inside divs with the class "special-sauce" on them). Normally, this would be easy so far as XPATH: we could parenthetically union the targeted children, like so:
div[contains(@class, 'special-sauce')]//(input | select)
But this is where XSLT throws a curve ball, when we try to use this as a template match (at least in Saxon):
<xsl:template match="div[contains(@class, 'special-sauce')]//(input | select)">
{"error":"The xsl file (/section-settings.xsl) could not be parsed.
Failed to compile stylesheet. 1 error
detected.","code":"TRANSFORM_ERROR","location":null,"causes":["Fatal
Error: Token \"(\" not allowed here in an XSLT pattern"]}
Basically, parentheticals aren't allowed as part of a template match at the main pathing level (they still work fine inside of conditionals/etc, obviously).
So what to do?
Well, technically, using a union can still work, but we would have to repeat the ancestor XPATH each time, since we can't parenthetically enclose the children:
<xsl:template match="div[contains(@class, 'special-sauce')]//input
| div[contains(@class, 'special-sauce')]//select">
This is doable (not very pretty, but sure, we can handle that! line breaks can work here to help our sanity yay) in our simple example here, but it gets problematic with more complex XPATH, especially if the parenthetical union would have been in the middle of a longer xpath, or for a lot of elements.
e.g.
div[contains(@class, 'major-mess')]/div[contains(@class, 'special-sauce')]//(dataset | optgroup | fieldset)//(button | option | label)
becomes
a crazy mess.
Ok, that quickly becomes less of an option in more complex examples. And while structuring our XSLT differently might help (intermediary matches, using modality, etc), the question remains:
How can we gracefully template match using unions of individual child elements within a larger XPATH pattern when parentheticals won't work?
An example sheet for the first example:
<div class="special-sauce">
<input class="form-control" type="text" value="" placeholder="INHERITED:" />
<select class="form-control">
<option value="INHERITED: ">INHERIT: </option>
<option value=""></option>
</select>
<div class="radio">
<label>
<input type="radio" name="param3vals" value="INHERITED: " />
INHERIT:
</label>
</div>
</div>
<div class="not-special"><input type="text" id="contact-info-include-path" size="90">
<label>contact</label>
</input></div>
<div class="sad-panda"><input type="text" id="sidenav-include-path" size="90">
<label>sidenav</label>
</input></div>
Note: this does assume that an identity transform is running as the primary method of handling the input document.
While there are other questions which could validly receive similar answers as, for example, the one I give below, I felt the context of those questions was usually more general (such that a top level union would be fine as their answer without complication), more specific in ways that didn't match, or simply too different. Hence the Q&A format.
XSLT 1.0 vs 2.0 vs 3.0
Michael Kay correctly notes in his answer below that while the original pattern attempted here doesn't work in XSLT 1.0 or 2.0, it should work in a (fully) XSLT 3.0 compatible processor. I'm currently on a system using Saxon 9.3, which is technically XSLT 2.0. I just want to call extra attention to that answer for those who are on a 3.0 system.
I looked all over and most answers to similar problems involved copying the repeated portion of the XPATH to each element and unioning it all together. But there is a better way! It's easy to forget that matching a particular element is relatively equivalent to matching that element's name within XPATH.
Use name() or local-name() instead of matching on the element directly within the template match pattern*.
Be aware of your namespace issues/needs when picking which to use. This still allows for advanced conditionals on attributes/etc of those elements.
The first match, for example, becomes:
<xsl:template match="div[contains(@class, 'special-sauce')]//
element()[local-name() = ('input', 'select')]">
There's not a huge gain here in terms of space or time to write this out, but we do reduce redundancy and the associated data consistency errors that can result (all too often, especially if later making changes).
Where this really shines is the last example in the question (the mess):
<xsl:template match="div[contains(@class, 'major-mess')]/
div[contains(@class, 'special-sauce')]//
element()[local-name() = ('dataset', 'optgroup', 'fieldset')]//
element()[local-name() = ('button', 'option', 'label')]">
That comparison against a parenthesized sequence is not XSLT/XPath 1.0 compatible (sequences and the element() node test arrived with XPath 2.0). If you do need backwards compatibility, use * in place of element() together with the "contains() with bracketing separator tokens" pattern, where the separators reduce the chance of a false positive from another element name being a substring of a targeted name:
<xsl:template match="div[contains(@class, 'major-mess')]/
div[contains(@class, 'special-sauce')]//
*[contains('|dataset|optgroup|fieldset|', concat('|', local-name(), '|'))]//
*[contains('|button|option|label|', concat('|', local-name(), '|'))]">
* = "match pattern" vs "XPath"
If you're struggling to understand why the naive approach (the first thing I attempted in the question) fails in XSLT, it helps to know that a template's match attribute must be an XSLT pattern, and patterns are only a subset of valid XPath expressions. This is easy to confuse and forget, especially when many sources pretend it's all XPath. In XSLT 1.0 and 2.0 patterns, parenthesized expressions are only allowed inside predicates, not in any other part of the location path or its steps.
Final Considerations
Performance: I have no idea whether there are notable performance differences between this approach and unioning each separate element as a full path, or whether there is even a real difference between addressing an element natively and using a predicate on the anonymous element() selector. My suspicion is that most XSLT processors can search the tree faster when a single match is written with the native path structure than with a name() predicate on the anonymous selector, but that the union cases may go either way depending on how well the processor pre-compiles and optimizes such patterns. I will leave that benchmarking to someone else, because ultimately the real hurdle is developer sanity and maintenance (the likelihood of human error). In complex matches, I feel that any small performance penalty is easily outweighed by the legibility and the reduced or eliminated redundancy of this approach.
I think that your pattern is legal in XSLT 3.0 as written. But I guess you want an XSLT 2.0 solution...
One great way that people often overlook is to use schema-aware patterns. If you want to match a choice of elements, it's quite likely that they are closely related in the schema, for example by having a common type T or by being members of a substitution group S. You can then write
div[contains(@class, 'special-sauce')]//schema-element(S)
or
div[contains(@class, 'special-sauce')]//element(*, T)
But I guess you want a solution that isn't schema-aware...
In that case, I don't think I can offer anything better than what you've got.
Sometimes multiple modes are the answer: for example something like
<xsl:template match="div[contains(@class, 'special-sauce')]">
<xsl:apply-templates mode="special"/>
</xsl:template>
<xsl:template match="select|input" mode="special">
<!-- ... -->
</xsl:template>
Generally I think modes are greatly under-used.
Why not split this template into two or three (one for each level) with modes? Something like
<xsl:template match="div[contains(@class, 'special-sauce')]">
<xsl:apply-templates select=".//select | .//input" mode="special-sauce"/>
</xsl:template>
<xsl:template match="select|input" mode="special-sauce">
<!-- ... -->
</xsl:template>
In my opinion this way it reads clearer.
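To make the mode idea concrete, here is one possible complete stylesheet, sketched under the question's assumption that an identity transform handles everything else. The data-special attribute is a hypothetical marker added only so the effect is visible in the output; it is not something from the question.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template for the default mode -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Identity template for the special mode, so that everything inside
       a special-sauce div is still copied -->
  <xsl:template match="@* | node()" mode="special">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="special"/>
    </xsl:copy>
  </xsl:template>

  <!-- Switch to the special mode below any special-sauce div -->
  <xsl:template match="div[contains(@class, 'special-sauce')]">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="special"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only reached for input/select elements below a special-sauce div -->
  <xsl:template match="input | select" mode="special">
    <xsl:copy>
      <xsl:attribute name="data-special">yes</xsl:attribute>
      <xsl:apply-templates select="@* | node()" mode="special"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the sample document from the question, only the input and select elements inside the special-sauce div would gain the marker attribute; those in the other divs are copied by the default-mode identity template.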

Prevent Narrow Non-Breaking Space (n-nbsp) in XSLT output

I have an XSLT transform that puts U+202F into my output. That is a narrow non-breaking space (nnbsp). Here is one section that results in nnbsp:
<span>
<xsl:text>§ </xsl:text>
<xsl:value-of select="$firstsection"/>
<xsl:text> to </xsl:text>
<xsl:value-of select="$lastsection"/>
</span>
The nnbsp in this case, comes in after the § and after the text to.
<span>§ 1 to 8</span>
(interestingly, the space before the to turns out to be a regular full size space)
This occurs in my UTF-8 encoded output, as well as iso-8859-1 (latin1).
How can I avoid the nnbsp? While the narrow space is visually more appropriate, it doesn't work for all the devices that will read this document. I need a plain vanilla blank space.
Is there a transform setting? I use Saxon 9 at the command line.
Should I do another transform, using a replace template to replace the nnbsp?
Should I re-do my templates like the one above? For example, would using concat() be a better coding practice?
UPDATE: For those who may find this question someday... as suggested by Michael Kay, I researched the issue further. Indeed, it turns out the narrow NBSPs were in the source XML files (and bled into my templates via cut/paste). I did not know this, and it was hard to discover (hat tip to gVim hex view). The narrows don't exactly jump out at you in a GUI editor. I have no control over production of the source XML, so I had to find a way to 'deal with it.' Eric's answer below turned out to be my preferred way to scrub the narrow NBSP. sed editing was (and is) another option to consider, but I like keeping my production in XSLT when possible. So Eric's suggestion has worked well for me.
You could use the translate() function to replace your nnbsp with something else, but since you are using Saxon 9 you can rely on XSLT 2.0 features and use a character map, which will do that kind of thing automatically for you. For instance (assuming that you want to replace them with a non-breaking space):
<xsl:output use-character-maps="nnbsp"/>
<xsl:character-map name="nnbsp">
<xsl:output-character character="&#x202F;" string="&#xA0;"/>
</xsl:character-map>
Eric
The narrow non-breaking space is coming from somewhere: either the source document or the stylesheet. It's not being magically injected by the XSLT processor. If it's in the stylesheet, then get rid of it. If it's in the source document, then transform it away, for example by use of the translate() function.
In fact, pasting your code fragment into a text editor and looking at it in hex, I see that the 202F characters are right there in your code. I don't know how you got them into your stylesheet, but you should (a) remove them, and (b) work out how it happened so it doesn't happen again.
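If the narrow spaces come from the source document, the translate() route mentioned above could be sketched as follows. This assumes an identity-style transform and that a plain space (U+0020) is the desired replacement, as the question asks; the explicit priority avoids a conflict with the identity template, which would otherwise have the same default priority as text().

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies everything that no other template matches -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace every U+202F in text nodes with an ordinary space -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="translate(., '&#x202F;', ' ')"/>
  </xsl:template>
</xsl:stylesheet>
```

Writing the character as &#x202F; in the stylesheet keeps it visible in any editor, which avoids the copy/paste problem described here. Attribute values are not covered by this sketch and would need a similar template if they can contain the character.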

Preserve character hex codes during XSLT 2.0 transform

I have the following XML:
<root>
<child value="&#xFF;&#xEF;&#x99;&#xE0;"/>
</root>
When I do a transform I want the character hex code values to be preserved. So if my transform was just a simple xsl:copy and the input was the above XML, then the output should be identical to the input.
I have read about the saxon:character-representation function, but right now I'm using Saxon-HE 9.4, so that function is not available to me, and I'm not even 100% sure it would do what I want.
I also read about use-character-maps. This seems to solve my problem, but I would rather not add a giant map to my transform to catch every possible character hex code.
<xsl:character-map name="characterMap">
<xsl:output-character character="&#xA0;" string="&amp;#xA0;"/>
<xsl:output-character character="&#xA1;" string="&amp;#xA1;"/>
<!-- 93 more entries: &#xA2; through &#xFE; -->
<xsl:output-character character="&#xFF;" string="&amp;#xFF;"/>
</xsl:character-map>
Are there any other ways to preserve character hex codes?
The XSLT processor doesn't know how the character was represented in the input - that's all handled by the XML parser. So it can't reproduce the original.
If you want to output all non-ASCII characters using numeric character references, regardless how they were represented in the input, try using xsl:output encoding="us-ascii".
If you really need to retain the original representation - and I can't see any defensible reason why anyone would need to do that - then try Andrew Welch's lexev, which converts all the entity and character references to processing instructions on the way in, and back to entity/character references on the way out.
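A minimal sketch of the encoding suggestion, assuming a plain copy of the input: with an ASCII output encoding, the serializer has no way to write characters above U+007F directly, so it falls back to numeric character references for them.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- ASCII output forces character references for all non-ASCII characters -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Identity template: copy the input unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Whether the references come out in decimal or hexadecimal form is up to the serializer, so the output may be equivalent to the input without being byte-for-byte identical.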

XSLT: keeping whitespaces when copying attributes

I'm trying to sort Microsoft Visual Studio's vcproj so that a diff would show something meaningful after e.g. deleting a file from a project. Besides the sorting, I want to keep everything intact, including whitespaces. The input looks like
space<File
spacespaceRelativePath="filename"
spacespace>
...
The xslt fragment below can add the spaces around elements, but I can't find out how to deal with those around attributes, so my output looks like
space<File RelativePath="filename">
xslt I use for the msxsl 4.0 processor:
<xsl:for-each select="File">
<xsl:sort select="@RelativePath"/>
<xsl:value-of select="preceding-sibling::text()[1]"/>
<xsl:copy>
<xsl:for-each select="text()|@*">
<xsl:copy/>
</xsl:for-each>
</xsl:copy>
</xsl:for-each>
Those spaces are always insignificant in XML, and I believe that there is no option to control this behavior in a general way for any XML/XSLT library.
XSLT works on a tree representation of the input XML. Much of the irrelevant detail of the original XML has been abstracted away in this tree, for example the order of attributes, insignificant whitespace between attributes, or the distinction between " and ' as an attribute delimiter. I can't see any conceivable reason for wanting to write a program that treats these distinctions as significant.