How to generate markup inside of <xsl:attribute> text? - xslt

My XSLT stylesheet generates Bootstrap HTML where some elements may contain data-... attributes to pass additional data to the framework. For example, I have this code to generate a popover element:
<xsl:template match="foo">
<a href="#" data-toggle="popover" data-placement="top" data-trigger="hover" data-html="true">
<xsl:attribute name="title">Popover Title</xsl:attribute>
<xsl:attribute name="data-content">This is some additional content.</xsl:attribute>
<xsl:text>Link</xsl:text>
</a>
</xsl:template>
The data-content attribute is supposed to contain additional markup. The resulting output should be something like
Link
How do I generate markup text for the <xsl:attribute> in this case?
(Somewhat related: here and here and here.)
The answers
Thanks for the answers! While I think that kjhughes's answer provides the technically correct solution to implement properly what I need, I think that Ian's answer addresses my question more directly.

You can't put unescaped markup in an attribute value, but you don't need to: if you escape the angle brackets (and any ampersands and quotes-within-quotes) as entity references, Bootstrap will still render the HTML properly in the popover.
Link
The simplest way to get this right in the XSLT would be to use a CDATA section:
<xsl:attribute name="data-content"
><![CDATA[This is <em>some</em> additional content
& a link.]]></xsl:attribute>
And the serializer will escape it for you as necessary.
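As a rough illustration of the serializer doing the escaping, here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree instead of an XSLT processor (an assumption made only to keep the example dependency-free), and the variable names are illustrative. Raw markup is stored as the attribute value; the serialized output carries it in escaped form, and parsing that output returns the original string.

```python
import xml.etree.ElementTree as ET

# The markup we want Bootstrap to receive inside data-content.
content = "This is <em>some</em> additional content & a link."

# Build the <a> element and store the markup as a plain string.
a = ET.Element("a", {"href": "#", "data-toggle": "popover", "data-html": "true"})
a.set("data-content", content)
a.text = "Link"

# Serializing escapes <, > and & inside the attribute value.
serialized = ET.tostring(a, encoding="unicode")

# Parsing the result gives back the original, unescaped string.
roundtrip = ET.fromstring(serialized).get("data-content")
```

The browser performs the same decoding as the last step here, which is why the popover still sees real markup.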

You cannot generate markup inside of an xsl:attribute because XML does not allow unescaped markup inside of attributes.
Specifically, the grammar rule for AttValue prohibits < altogether and & unless the & is part of a properly formed Reference:
AttValue ::= '"' ([^<&"] | Reference)* '"'
| "'" ([^<&'] | Reference)* "'"
Supporting definitions:
Reference ::= EntityRef | CharRef
EntityRef ::= '&' Name ';'
CharRef ::= '&#' [0-9]+ ';'
| '&#x' [0-9a-fA-F]+ ';'
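The effect of this grammar rule can be observed with any conforming XML parser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in (an assumption; any XSLT processor's parser should behave the same way), and the helper name is_well_formed is made up for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A raw < inside an attribute value violates AttValue.
raw_lt = '<a data-content="<em>hi</em>"/>'
# A raw & that does not start a Reference violates it too.
raw_amp = '<a data-content="fish & chips"/>'
# The escaped form is accepted.
escaped = '<a data-content="&lt;em&gt;hi&lt;/em&gt; &amp; more"/>'

# The parser decodes the references when it hands back the value.
decoded = ET.fromstring(escaped).get("data-content")
```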
HTML too, even HTML5, does not allow unescaped markup in attribute values.
Adding escaped markup in attribute values is viable but ugly. Especially for heavily marked-up content, I'd recommend creating the content separately and then associating it procedurally via JavaScript rather than declaratively via attribute values. There are many examples of doing this, including this one mentioned in one of your references.

Not an "official" way of doing this, but when using lxml to process the XML and XSLT stylesheets, consider using XSLT extension elements in your stylesheet. Those custom elements allow you to run Python code when elements match during processing, and that code can modify/edit parts of the output document.

Related

Is it possible to put regex directly into XML content?

I have a XML file I use to manually route users to specific pages in a website.
Currently, we have separate entries for every variation of possible searches (plural, typos etc.). I would like to know if there is a way I can condense it with regex to something like so:
<OnSiteSearch>
...
<Search>
<SearchTerm>(horses?|cows?) for sale</SearchTerm>
<Destination>~/some/path.html</Destination>
</Search>
...
</OnSiteSearch>
Is something like this possible? I've looked online for regex and XML but it seems to be about validating content between the XML tags and not about using regex as the content.
Yes, a regex can be stored in XML as long as you mind XML escaping rules to keep the XML well-formed:
Element content: Escape any < as &lt; and any & as &amp; when writing
the regex; reverse the substitution before using the regex.
Attribute value: Follow rules for element content plus escape any " as
&quot; or any ' as &apos; to avoid conflict with chosen attribute
value delimiters.
CDATA: No escaping needed, but make sure your regex doesn't include
the string ]]>.
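A minimal Python sketch of the element-content case, using only the standard library. The regex is a variation on the one in the question, extended with a named group and an & so that both characters needing escapes appear; that variation is an assumption for the example. Note that when the file is read through an XML parser, the reverse substitution happens automatically.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Contains both < (named group) and &, so it needs escaping.
pattern = r"(?P<animal>horses?|cows?) (?:for|&) sale"

# Writing: escape() turns & into &amp;, < into &lt; and > into &gt;.
document = (
    "<OnSiteSearch><Search>"
    f"<SearchTerm>{escape(pattern)}</SearchTerm>"
    "<Destination>~/some/path.html</Destination>"
    "</Search></OnSiteSearch>"
)

# Reading: the parser reverses the escaping when it returns the text.
search = ET.fromstring(document).find("Search")
loaded = search.findtext("SearchTerm")
destination = search.findtext("Destination")

# The loaded text is usable as a regex directly.
match = re.fullmatch(loaded, "horse for sale")
```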

RegEx for mining XML tag content

Fellow Forum Members,
I am using the latest Notepad++. I have 430 separate XML files, and my goal is to make a "dmCode" list covering all of them. The dmCode identifies each XML file and looks like the example shown below. I need help developing a regular expression that grabs the content between the <dmCode opening tag and the closing /> terminator. The extraction should apply only to dmCode tags that follow a <dmIdent> tag; any dmCode tag that is not preceded by a <dmIdent> tag should not end up in my Notepad++ search results. Is a regular expression that can pull targeted data from that many XML files possible?
<dmIdent>
<dmCode assyCode="00" disassyCode="00" disassyCodeVariant="00" infoCode="042" infoCodeVariant="A" itemLocationCode="O" modelIdentCode="SASA" subSubSystemCode="6" subSystemCode="0" systemCode="A03" systemDiffCode="XY"/>
As an alternative, I have been researching XPath expressions to accomplish the same task. However, I can't find a Notepad++ XPath plugin that lets me specify the data to extract from 430 XML files with an XPath expression instead of a regular expression. I would also appreciate an example of an XPath expression that performs the same task.
Any help will be greatly appreciated.
I know there are plugins for XPath, but I don't know one that allows you to search several files. The following XPath would match all attributes in <dmCode> as a child of the root element <dmIdent>:
/dmIdent/dmCode/@*
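If a script is acceptable in place of a Notepad++ plugin, the same selection can be sketched in Python with the standard library. This is only an outline under assumptions: the function name dm_code_attributes and the sample document (including its dmodule, dmAddress and dmRefIdent wrapper elements) are made up for the illustration, and ElementTree supports only a subset of XPath, so the attributes are read from the matched element instead of being selected with @*.

```python
import xml.etree.ElementTree as ET

def dm_code_attributes(xml_text: str) -> dict:
    """Return the attributes of the dmCode child of dmIdent, or {} if absent."""
    root = ET.fromstring(xml_text)
    # Cover both a dmIdent root element and a dmIdent nested deeper.
    if root.tag == "dmIdent":
        node = root.find("dmCode")
    else:
        node = root.find(".//dmIdent/dmCode")
    return dict(node.attrib) if node is not None else {}

# Hypothetical sample: one dmCode under dmIdent, one elsewhere.
sample = """<dmodule><dmAddress><dmIdent>
<dmCode assyCode="00" infoCode="042" modelIdentCode="SASA" systemCode="A03"/>
</dmIdent></dmAddress>
<dmRefIdent><dmCode assyCode="99" infoCode="000" modelIdentCode="OTHER" systemCode="B01"/></dmRefIdent>
</dmodule>"""

attrs = dm_code_attributes(sample)
```

To cover many files, the function could be called on the text of each file in a folder, for example by iterating over pathlib.Path(folder).glob("*.xml").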
I need help in developing a Regular Expression that will grab the dmcode tag content located between the <dmCode opening tag and the closing /> terminator. Also I only need this extraction to only apply to dmCode tags that follow the <dmIdent> tag.
This will work for the most simple cases, where:
<dmCode> is the first child of <dmIdent>
There are no comments, CDATA tags, or similar constructs that could make it fail.
(?i)<dmIdent>\s*<dmCode \K[^"/>]*(?>(?:"[^\\"]*(?:\\.[^\\"]*)*"|/(?!>))[^"/>]*)*(?=/>)
Matches:
(?i)<dmIdent>\s*<dmCode both tags separated by whitespace (case-insensitively)
\K resets the matched text
[^"/>]* Any characters except ", / or >
And loops:
"[^\\"]*(?:\\.[^\\"]*)*" text in quotes, or
/(?!>) a / not followed by >
both followed by the previous [^"/>]*
(?=/>) All followed by />
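For use outside Notepad++, here is a hedged Python adaptation of the same idea. Python's re module has no \K, so this sketch swaps in a capturing group, and it drops the backslash-escape handling inside quotes on the assumption that attribute values contain no " characters. The sample text and its dmRefIdent element are made up for the illustration.

```python
import re

# <dmIdent>, optional whitespace, <dmCode, then capture everything up to />.
# Inside the capture: any character except ", / or >, or a quoted string,
# or a / that is not followed by >.
DM_CODE = re.compile(
    r'<dmIdent>\s*<dmCode\s+((?:[^"/>]|"[^"]*"|/(?!>))*)/>',
    re.IGNORECASE,
)

# Hypothetical sample: only the first dmCode follows <dmIdent>.
sample = """<dmIdent>
<dmCode assyCode="00" infoCode="042" modelIdentCode="SASA"/>
</dmIdent>
<dmRefIdent>
<dmCode assyCode="99" infoCode="000" modelIdentCode="OTHER"/>
</dmRefIdent>"""

# findall returns the captured attribute text of each match.
found = DM_CODE.findall(sample)
```

The same limitations apply as for the original expression: comments, CDATA sections, or anything between <dmIdent> and <dmCode> other than whitespace will prevent a match.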

XSLT with value of select in URL string

I want to combine some XSL with XML and put out the resulting HTML.
My XSL contains this line, which doesn't work:
Click here
The desired output would be:
Click here
The code works when I leave out the <xsl:value-of select="row/objectid"/> part in the URL. It also works when I place the <xsl:value-of select="row/objectid"/> outside the hyperlink tag, so I know the value-of select is correct by itself.
So I suspect that the quotes are messing things up. How can I fix this?
PS. I tried replacing " with ' as well
Your stylesheet must itself be well-formed XML, so you can't place an <xsl:value-of> element inside an attribute value. Use an attribute value template instead, and write the literal & in the URL as &amp;:
<a href="www.domain.com/account/business/get/?t=2&amp;id={row/objectid}"
>Click here</a>
The expression in curly braces will be evaluated and replaced with its output.

Xslt: Embedding an image in a RSS feed

I am using Umbraco and I need to display an image in a Rss Feed. The feed is generated by Xslt.
Everything works if I do text stuff. Such stuff is technically feasible, but the feed I analyzed had been generated by WordPress.
The challenge is that I have no idea how I can embed an <img> element within my <content:encoded> tag.
I have a variable, say "url", that returns the full URL of the underlying image. How can I insert the <img> within <content:encoded>? Remember I am using XSLT to achieve the task.
<content:encoded>
<img src="{$url}" />
</content:encoded>
I guess that CDATA must be used, but I am not able to escape correctly illegal characters :(
Thanks for your help.
Roland
Roland, you're trying to escape things twice. It's unnecessary (not to mention hideous!). This page shows:
<content:encoded><![CDATA[This is <i>italics</i>.]]></content:encoded>
I.e. they're just escaping the markup inside the <content:encoded> once, and they use CDATA to do that. In your case, CDATA is awkward because you need to substitute $url in the middle. So you could use two CDATA sections wrapped around an <xsl:value-of select="$url" />: (indented for clarity)
<content:encoded>
<![CDATA[<img src="]]>
<xsl:value-of select='$url' />
<![CDATA[">]]>
</content:encoded>
But that would be needlessly verbose. The second CDATA section is unneeded. And we can do better while using the same principle: escape the markup characters (once) that would cause the string to be parsed into a tree. In your case, only the initial < needs to be escaped. You can use &lt; instead of CDATA to escape the <. Put this in your XSLT:
<content:encoded>&lt;img src="<xsl:value-of select='$url' />"></content:encoded>
The <xsl:value-of> is not really inside quotes, from XSLT's perspective... those quotes are just the content of text nodes. The <xsl:value-of> works as a normal XSLT instruction.
Change select='$url' to select="concat($siteUrl, photo)" if that's what you need. (I.e. photo is a child element of the context node, and its text value is the name of the image file.)
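The "escape once" principle can be checked outside XSLT as well. The following is a minimal Python sketch with the standard library, not the Umbraco setup from the question; the function name encoded_image and the example URL are assumptions. The img markup is stored as plain text, and the serializer escapes it exactly once on output.

```python
import html
import xml.etree.ElementTree as ET

# Namespace of the RSS content module that defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)

def encoded_image(url: str) -> str:
    """Serialize a content:encoded element whose text is an <img> tag."""
    element = ET.Element(f"{{{CONTENT_NS}}}encoded")
    # HTML layer: make the URL safe inside the src="..." attribute.
    element.text = f'<img src="{html.escape(url)}">'
    # XML layer: the serializer escapes the markup characters once.
    return ET.tostring(element, encoding="unicode")

xml_out = encoded_image("http://example.com/media/photo.jpg")

# A feed reader parsing the XML recovers the HTML string.
recovered = ET.fromstring(xml_out).text
```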

apostrophe text comparison in xsl

I have a problem comparing text that contains an apostrophe.
For example, when I test XML whose value contains that symbol, how can I compare it?
<xsl:for-each select="country[nation='India's']">
This statement produces an error.
Regards
Nanda.A
One way to do it would be:
<xsl:variable name="apos" select='"&apos;"'/>
<!-- ... later ... -->
<xsl:for-each select="country[nation=concat('India', $apos, 's')]">
The problem here is twofold:
XSLT defines no way of character escaping in strings. So 'India\'s' is not an option.
You must get through two distinct layers of evaluation.
These are:
XML well-formedness: The XML document your XSLT program consists of must be well-formed. You cannot violate XML rules.
XSLT expression parsing: The resulting attribute value string (after XML DOM parsing is done) must make sense to the XSLT engine.
Constructs like:
<xsl:for-each select="country[nation='India's']">
<xsl:for-each select="country[nation='India&apos;s']">
pass the XML layer but violate the XSLT layer, because in both cases the effective attribute value (as stored in the DOM) is country[nation='India's'], which clearly is an XPath syntax error.
Constructs like:
<xsl:for-each select="country[nation=concat('India', "'", 's')]">
<xsl:for-each select="country[nation=concat("India", "&apos;", "s")]">
clearly violate the XML layer. But they would not violate the XSLT layer (!), since their actual value (if the XSLT document could be parsed in the first place) would come out as country[nation=concat('India', "'", 's')], which is perfectly legal as an XPath expression.
So you must find a way to pass through both layer 1 and layer 2. One way is the variable way as shown above. Another way is:
<xsl:for-each select="country[nation=concat('India', &quot;'&quot;, 's')]">
which would appear to XSLT as country[nation=concat('India', "'", 's')].
Personally, I find the "variable way" easier to work with.
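The first layer can be made visible with a small Python sketch that parses an xsl:for-each element and returns the select value as the XPath engine would receive it. It uses the standard library's XML parser only and does not evaluate the XPath; the helper name xpath_seen_by_xslt is made up for the illustration.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

def xpath_seen_by_xslt(select_source: str) -> str:
    """Return the select value after XML parsing, given its raw source text."""
    doc = f'<xsl:for-each xmlns:xsl="{XSL_NS}" select="{select_source}"/>'
    return ET.fromstring(doc).get("select")

# &apos; is decoded in layer 1, leaving an unbalanced quote for XPath.
broken = xpath_seen_by_xslt("country[nation='India&apos;s']")

# &quot; is decoded to ", which XPath accepts as a string delimiter.
working = xpath_seen_by_xslt("country[nation=concat('India', &quot;'&quot;, 's')]")
```

Here broken comes out as country[nation='India's'], the XPath syntax error described above, while working comes out as a valid concat() call.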