Substring before throwing error - xslt

I have the XML below:
<?xml version="1.0" encoding="UTF-8"?>
<body>
<p>Industrial drawing: Any creative composition</p>
<p>Industrial drawing: Any creative<fn>
<fnn>4</fnn>
<fnt>
<p>ftn1"</p>
</fnt>
</fn> composition
</p>
</body>
and the below XSL.
<xsl:template match="p">
<xsl:choose>
<xsl:when test="contains(substring-before(./text(),' '),'Article')">
<xsl:text>sect3</xsl:text>
<xsl:value-of select="./text()"/>
</xsl:when>
<xsl:when test="contains(substring-before(./b/text(),' '),'Section')">
<xsl:text> Sect 2</xsl:text>
<xsl:value-of select="./text()"/>
</xsl:when>
<xsl:when test="contains(substring-before(./b/text(),' '),'Chapter')">
<xsl:text> Sect 1</xsl:text>
<xsl:value-of select="./text()"/>
</xsl:when>
<xsl:otherwise>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
My XSL works fine for <p>Industrial drawing: Any creative composition</p>, but for the case below
<p>Industrial drawing: Any creative<fn>
<fnn>4</fnn>
<fnt>
<p>ftn1"</p>
</fnt>
</fn> composition
</p>
it throws the error below:
XSLT 2.0 Debugging Error: Error: file:///C:/Users/u0138039/Desktop/Proview/ASAK/DIFC/XSLT/tabel.xslt:38: Wrong occurrence to match required sequence type - Details: - XPTY0004: The supplied sequence ('2' item(s)) has the wrong occurrence to match the sequence type xs:string ('zero or one')
Please let me know how I can fix this and grab the required text.
Thanks

The second p element in your example XML has two child text nodes, one containing "Industrial drawing: Any creative" and the other containing a space, "composition", a newline and another six spaces. In XSLT 1.0 it is legal to apply a function that expects a string to an argument that is a set of more than one node; the behaviour is to take the value of the first node in document order and ignore all the others. But in 2.0 it is a type error (the XPTY0004 in your message) to pass two nodes to a function that expects a single value for its parameter.
But in this case I doubt that you really need to use text() at all - if all you care about is seeing whether the string "Article" occurs anywhere within the first word under the p (including when this is nested inside another element) then you can simply use .:
<xsl:when test="contains(substring-before(.,' '),'Article')">
(or better still, use predicates to separate the different conditions into their own templates, with one template matching "Article" paragraphs, another matching "Section" paragraphs, etc.)
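The suggestion in parentheses could look roughly like the sketch below. It assumes XSLT 2.0 and guesses that the "Section" and "Chapter" keyword sits in the first b child (the question tests ./b/text() there). The priority values are an addition of mine to keep the same precedence as the original xsl:choose, and value-of uses . rather than ./text(), so nested footnote text is included in the output.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- First word of the paragraph contains "Article" -->
  <xsl:template match="p[contains(substring-before(., ' '), 'Article')]" priority="3">
    <xsl:text>sect3</xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- First word of the first b child contains "Section" (assumed markup) -->
  <xsl:template match="p[contains(substring-before(b[1], ' '), 'Section')]" priority="2">
    <xsl:text> Sect 2</xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- First word of the first b child contains "Chapter" (assumed markup) -->
  <xsl:template match="p[contains(substring-before(b[1], ' '), 'Chapter')]" priority="1">
    <xsl:text> Sect 1</xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Every other paragraph produces nothing, like the empty xsl:otherwise -->
  <xsl:template match="p"/>

</xsl:stylesheet>
```

Because b[1] selects at most one node and . is always a single node, neither predicate can trigger the "wrong occurrence" error, even when the paragraph contains several text nodes.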

The p element in your example has several text nodes, so the expression ./text() returns a sequence of more than one item. You cannot pass such a sequence to a function that expects a single string; you must reduce it to one string first. Instead of:
test="contains(substring-before(./text(),' '),'Article')"
try:
test="contains(substring-before(string-join(text(), ''), ' '), 'Article')"

Related

Copy text and replace character in XSL

I'm transforming a DITA document to a simplified, formatting-based XML to be used as an import into Adobe InDesign. My transformation is going really well, except for one element which omits the text in the output. The element is codeblock. When I don't have a template specifying it at all, the element and any child elements are passed through to the new XML document, but none of the text is passed through. This element should be passed through with text and child elements like every other element in my document for which a specific template is not defined. There's nothing anywhere else in the XSL stylesheet that specifies codeblock or any of its attributes. I am completely stumped and cannot figure out what's going on here.
It is also worth noting that a number of inline elements (cmdname, parmname, userinput, etc.) are converted to bold on output. The downstream XML is for formatting and does not need to know semantic context.
This is what I'm trying to pass through:
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that <parmname>child elements</parmname> are passed through.</codeblock>
With no template defined for codeblock, this is what I get as a result:
<codeblock><bold/></codeblock>
The actual result I want is:
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that <bold>child elements</bold> are passed through.</codeblock>
I need the line feeds replaced with character entities because InDesign sees any new line that does not start with an element as a column break. My goal was to simply replace the line feed character with &#x2028; with the following template:
<xsl:template match="codeblock//text()">
<xsl:analyze-string select="." regex="(&#10;)">
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1)">&#x2028;</xsl:when>
</xsl:choose>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
But what I get is:
<codeblock>
<bold/>
</codeblock>
I was finally able to pass the text through using this template:
<xsl:template match="codeblock//text()">
<xsl:copy/>
</xsl:template>
Success! Incidentally, I have to match at any level under codeblock so it includes the text of the child parmname element too. Since I was able to successfully pass it through with <xsl:copy>, I tried this to pass the text through while replacing the line feed at the same time:
<xsl:template match="codeblock//text()">
<xsl:copy>
<xsl:analyze-string select="." regex="(&#10;)">
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1)">&#x2028;</xsl:when>
</xsl:choose>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:copy>
</xsl:template>
But now it won't replace the new line feed. Instead, I get this (which is what I would expect to get without any template defined):
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that <bold>child elements</bold> are passed through.</codeblock>
I know this is a long and somewhat convoluted question. I just feel like if I could resolve the issue of why it's not passing the text through in the first place, the rest would be fairly straightforward. And I'm sorry, I can't provide my source XML or XSL as it's under NDA, but if you need more, let me know and I'll try to provide it. (My XSL stylesheets are made up of 12 different files, so there's no way for me to provide all of it, even if genericized.)
Any suggestions for what I might look for in my stylesheet that might explain why the text is not coming through, or any suggestions for how to force it through as I did with <xsl:copy> while still replacing the line feeds, will be greatly appreciated!
Edited to add: It has occurred to me that the reason it's not doing the replacement is that it looks like it's not actually a line feed character. It's more like a new line in the code than a line feed character (or hard return) in the text. I think I might need to normalize the text while inserting the &#x2028; character at the end of each line. Still investigating, but suggestions are welcome!
Edited with update: Thanks to the post How to detect line breaks in XSLT, I have gotten closer, but still not quite where I need to be. With this code, I'm able to detect line feeds in the XML and insert the line break character for InDesign:
<xsl:template match="codeblock//text()">
<xsl:for-each select="tokenize(., '\n?')[.]">
<xsl:sequence select="."/>
<xsl:text>&#x2028;</xsl:text>
</xsl:for-each>
</xsl:template>
However, it also inserts the line feed character at the end of the string, even if it's not the end of the line. For instance, I now get:
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that 
<bold>child elements
</bold> are passed through.
</codeblock>
I don't want the line feed character in front of the 'bold' start and end tags or the codeblock end tag. I just want it to appear where there's an actual new line. I tried replacing \r but that just ignored the new lines and just put it in front of the tags. Does anyone know of another escape character that would work here?
A very long question - yet it's still not clear what exactly you are asking (and no reproducible example, either).
If - as it seems - you want to replace newline characters with the line separator character in all text nodes under the codeblock element, you should be able to do simply:
<xsl:template match="codeblock//text()">
<xsl:value-of select="translate(., '&#10;', '&#x2028;')" />
</xsl:template>
If this doesn't work, then either you have an overriding template or the text does not contain newline characters. You can test for the first case by changing the template to say:
<xsl:template match="codeblock//text()">BINGO</xsl:template>
and observe the result to see if all targeted text nodes are changed to "BINGO". To test for the second case, you can analyze the text character-by-character using the string-to-codepoints() function.
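As a sketch of that second check, assuming an XSLT 2.0 processor, the template can be swapped temporarily for one that prints the code points of each text node instead of its content:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Debug only: emit each character of the text node as a decimal code point -->
  <xsl:template match="codeblock//text()">
    <xsl:value-of select="string-to-codepoints(.)" separator=" "/>
  </xsl:template>

</xsl:stylesheet>
```

In the output, 10 is a line feed, 13 is a carriage return and 8232 is the line separator (U+2028). If no 10 appears where a line visibly breaks, the text does not contain the character the replacement is looking for.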
Your template is missing xsl:non-matching-substring to process the non-matching sections of the text node.
<xsl:template match="codeblock//text()">
<xsl:analyze-string select="." regex="\n">
<xsl:matching-substring>
<xsl:text>&#x2028;</xsl:text>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
However, michael.hor257k's answer is simpler, as you don't need xsl:analyze-string just to replace every occurrence of a single character.

XSLT REGEX pattern match

Using Saxon 9.7, XSLT 3.0, I'm trying to select square bracketed terms from a string of text and then remove duplicate values of the terms.
So far I have found a template which selects the substrings I want and a function that tokenizes the string and then removes duplicate values.
However, I haven't been able to get the correct regex for the tokenizing of the string.
Here is my XML of the full text
<column>
<columnDerivationPrompt>Option 1: (No visit windowing)</columnDerivationPrompt>
<columnDerivationDescription>Set to collected visit name [EG.VISIT] Set to 'POST-BASELINE MINIMUM' for the new observation generated for derviation type minimum [ADEG.DTYPE] = 'MINIMUM'
Set to 'POST-BASELINE MAXIMUM' for the new observation generated for derviation type maximum [ADEG.DTYPE]= 'MAXIMUM'
</columnDerivationDescription>
<columnDerivationPrompt>Option 2: (User defined visit windows)</columnDerivationPrompt>
<columnDerivationDescription>Set to a re-defined visit range based on user-defined input, using formatting of Analysis Relative Day [ADEG.ADY] range in conjunction with Analysis Window Target [ADEG.AWTARGET] and Analysis Window Diff from Target [ADEG.AWTDIFF] to determine analysis visit.
Set to 'POST-BASELINE MINIMUM' for the new observation generated for derviation type minimum [ADEG.DTYPE] = 'MINIMUM'
Set to 'POST-BASELINE MAXIMUM' for the new observation generated for derviation type maximum [ADEG.DTYPE]= 'MAXIMUM'
</columnDerivationDescription>
</column>
The string of terms taken from the text that I need to remove duplicates from
EG.VISIT ADEG.DTYPE ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF ADEG.DTYPE ADEG.DTYPE
What I would like to see
EG.VISIT ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF
my XSLT template and function
<xsl:variable name="test">
<xsl:if test="contains($string,'[')">
<xsl:variable name="relevant-part" select="substring-before(substring-after($string,'['),']')"/>
<xsl:variable name="remainder" select="substring-after($string,']')"/>
<xsl:value-of select="$relevant-part"/>
<xsl:if test="contains($remainder,'[')">
<xsl:text disable-output-escaping="yes"> </xsl:text>
</xsl:if>
<xsl:call-template name="find-relevant-text">
<xsl:with-param name="string" select="$remainder"/>
</xsl:call-template>
</xsl:if>
</xsl:variable>
<xsl:value-of select="myfn:sortCSV($test)"/>
</xsl:template>
<xsl:function name="myfn:sortCSV" as="xs:string*">
<xsl:param name="csvString" as="xs:string"/>
<!-- Split up string and remove duplicates -->
<xsl:variable name="values" select="distinct-values(tokenize($csvString,'\W+\.\W+'))" as="xs:string*"/>
<!-- Return all elements, sorted -->
<xsl:for-each select="$values">
<xsl:sort/>
<!-- We don't return empty strings -->
<xsl:sequence select=".[.!='']"/>
</xsl:for-each>
</xsl:function>
\W+\.\W+ is the regex I have been using to identify e.g. EG.VISIT or ADEG.DTYPE. So any pattern including CC.CCCC to CCCC.CCCCCCCC (where C is a char [A-Z]).
The output I am getting is
EG.VISIT ADEG.DTYPE ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF ADEG.DTYPE ADEG.DTYPE
So no duplicates have been removed.
QUESTION:
Can anyone see where I am going wrong with my expression or code?
As for your regular expression, note that a \W matches a non-word char and cannot match uppercase (nor lowercase) letters. \w matches a word char.
However, best is to restrict it to [A-Z]+\.[A-Z]+ since you say the items you want to match follow the uppercase+.+uppercase pattern.
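A related point, offered as a sketch rather than as the asker's intended design: the second argument of tokenize() is the pattern for the separators, not for the items. Since \W+\.\W+ never matches in the string of terms, tokenize() returns the whole string as one token and distinct-values() has nothing to remove. Splitting on whitespace gives one token per term. The namespace URI for myfn below is a placeholder; use whatever the stylesheet already declares.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:myfn="urn:example:myfn"
    exclude-result-prefixes="xs myfn">

  <xsl:function name="myfn:sortCSV" as="xs:string*">
    <xsl:param name="csvString" as="xs:string"/>
    <!-- Split on runs of whitespace (the separators), then drop duplicates -->
    <xsl:for-each select="distinct-values(tokenize(normalize-space($csvString), '\s+'))">
      <xsl:sort/>
      <xsl:sequence select="."/>
    </xsl:for-each>
  </xsl:function>

</xsl:stylesheet>
```

Because of xsl:sort the terms come back in alphabetical order, not in order of first appearance. Removing xsl:sort leaves the order to distinct-values(), which the specification treats as implementation-dependent.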
I would use analyze-string: either the xsl:analyze-string instruction in XSLT 2.0 or the function of the same name in XSLT 3.0. With the function it is a one-liner:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
exclude-result-prefixes="xs math fn"
version="3.0">
<xsl:template match="column">
<xsl:value-of select="distinct-values(analyze-string(., '\[([A-Z]+\.[A-Z]+)\]')//fn:match/fn:group[@nr = 1])"/>
</xsl:template>
</xsl:stylesheet>
Output is EG.VISIT ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF.
If you want to sort the extracted strings then use <xsl:value-of select="sort(distinct-values(analyze-string(., '\[([A-Z]+\.[A-Z]+)\]')//fn:match/fn:group[@nr = 1]))"/>.

XSLT - Check if pattern exists in an element string

I have the following element as part of a larger XML
<MT N="NonEnglishAbstract" V="[DE] Deutsch Abstract text [FR] French Abstract text"/>
I need to do some formatting of the value in the @V attribute, but only if it contains something like [DE], [FR] or any two capital letters representing a country code within square brackets.
If no such pattern exists, I need to simply write the value of @V without any formatting.
I can use an XSLT 2.0 solution
I was hoping that I could use the matches() function something like
<xsl:choose>
<xsl:when test="matches(@V,'\[([A-Z]{{2}})\]([^\[]+'">
//Do something
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="@V"/>
</xsl:otherwise>
</xsl:choose>
I think all you need is:
matches(@V,'\[[A-Z][A-Z]\]')
You don't have to match the entire string to get a true() ... I tell my students to write as short a reg-ex as possible.
You have not posted anything about what you have tried. How about looking up the translate() function and translating the string's capital letters to something like "X"? Then test that result for the existence of [XX]. That alone would tell you whether you need to process it.
<xsl:variable name="result">
<xsl:value-of select="translate(@V,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','XXXXXXXXXXXXXXXXXXXXXXXXXX')"/>
</xsl:variable>
Then use that result and then test:
contains($result, "[XX]")
No regex required, pure XSLT 1.0.

XSLT, how to use "xsl:value-of" inside "xsl:if"?

I am new to XSLT and I am wondering whether it is possible to compare the value of "@userNameKey" and the value of
<xsl:value-of select="./text()"/> in example below?
<xsl:if test="@userNameKey='??????'">
<xsl:attribute name="selected">true</xsl:attribute>
</xsl:if>
<xsl:value-of select="./text()"/>
Basically, I just want to replace the question marks with the following fragment: <xsl:value-of select="./text()"/> but there is an issue with the double quotes. Should I use escape characters (if yes, what are they?) or is there a better solution?
If you specifically want to compare against the value of the first text node child of the current element (which is what <xsl:value-of select="./text()"/> gives you), then use
<xsl:if test="@userNameKey=string(text())">
At first sight
<xsl:if test="@userNameKey=text()">
may seem more obvious, but this is subtly different, returning true if the userNameKey matches any one of the text node children of the current element (not necessarily the first one).
But if (as I suspect you really mean) you want to compare the userNameKey against the complete string value of the element even if that consists of more than one text node, then use
<xsl:if test="@userNameKey=.">
Remember that text() is a node set containing all the text node children of the context node, and if you're not sure you need to use it (e.g. when you want to process each separate text node individually) then you probably don't.
You should be able to do just this...
<xsl:if test="@userNameKey=./text()">
<xsl:attribute name="selected">true</xsl:attribute>
</xsl:if>
In fact, the ./ is not needed here, so you can just do this
<xsl:if test="@userNameKey=text()">
<xsl:attribute name="selected">true</xsl:attribute>
</xsl:if>

Why isn't local-name() returning anything?

I'm trying to run the following template:
<xsl:template match="*[starts-with(., 'ATTITUDE_')]/text()">
<xsl:variable name="ElementName" select="local-name()"/>
<xsl:variable name="vVal" select="$vAttitudes[. = substring-after(current(), '_')]/@val"/>
<xsl:choose>
<xsl:when test="contains($ElementName, 'Refuse')">
<xsl:value-of select="civf:book-capitalise($vAttitudes[@val = $vVal+1])"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="civf:book-capitalise($vAttitudes[@val = $vVal])"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
So the premise is, find the name of the element, if it has the text "Refuse" in the name of it then "doTheThing"+1 otherwise just "doTheThing". However this test always fails so +1 never gets called even if the element has "Refuse" in the name. If I just output local-name then I get empty too. Why does local-name() not appear to work here?
I did previously try to start the template with:
<xsl:template match="*[contains(., 'Refuse')]/name()">
But Saxon complained that I was running too many functions in the match sequence.
I apologise in advance for not knowing too much about XSLT.
I believe that local-name() does not work because you are matching text nodes (/text() in the match attribute), and text nodes do not have local names.
I'm not sure what you are trying to do but I don't think you actually want to match /text() but instead the whole element, and obtain its text() afterwards.
Alternatively, you could try using local-name(..) to get the name of parent node but I'm not sure about that.
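For what it is worth, here is a minimal sketch of that last idea, with the question's $vAttitudes lookup and civf:book-capitalise call replaced by placeholder text, since neither is shown in full. Inside a template that matches a text node, local-name() returns an empty string, while local-name(..) returns the name of the parent element.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*[starts-with(., 'ATTITUDE_')]/text()">
    <!-- The context node is a text node, so ask its parent for the element name -->
    <xsl:variable name="ElementName" select="local-name(..)"/>
    <xsl:choose>
      <xsl:when test="contains($ElementName, 'Refuse')">
        <!-- Placeholder for the "+1" lookup in the question -->
        <xsl:text>refuse branch for </xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <!-- Placeholder for the plain lookup in the question -->
        <xsl:text>default branch for </xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:value-of select="$ElementName"/>
  </xsl:template>

</xsl:stylesheet>
```

Matching the element itself and reading its text inside the template would work equally well; the only requirement is that the name is taken from an element node and not from a text node.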