I'm trying to filter and then replace umlauts in a file name. Unfortunately this doesn't work: if all three umlauts are present, the value of the variable is written to the file name several times.
<xsl:choose>
<xsl:when test="contains($name,'ä') or contains($name,'ö') or contains($name,'ü')">
<xsl:value-of select="replace($name, 'ä', 'ae'),replace($name,'ö','oe'),replace($name, 'ü', 'ue') " />
</xsl:when>
No matter what I try, either only one umlaut is replaced, or the file name appears multiple times after the transformation.
When I try to write a nested replace in which the variable occurs only once, the file cannot be saved without errors.
Does anyone have an idea how to get the name into the file name only once, with all umlauts replaced?
Try:
<xsl:value-of select="replace(replace(replace($name, 'ä', 'ae'), 'ö', 'oe'), 'ü', 'ue')"/>
Alternatively, you could try something more generic, e.g.
<xsl:value-of select="replace(normalize-unicode($name, 'NFD'), '([aou])̈', '$1e')"/>
but here you need to carefully evaluate the possible effect on other diacritics that the input may contain.
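For context, the original attempt repeats the name because `select="replace(...), replace(...), replace(...)"` is a sequence of three strings, and xsl:value-of writes all three. The generic variant can be sketched as a small XSLT 2.0 function that also recomposes the string afterwards, so that letters it did not touch (é, for example) do not stay decomposed. The function name `my:replace-umlauts`, its namespace, and the sample file name are made up for illustration. Uppercase umlauts are deliberately left alone here, as in the answer above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:umlauts"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical helper, not part of the original stylesheet -->
  <xsl:function name="my:replace-umlauts" as="xs:string">
    <xsl:param name="name" as="xs:string"/>
    <!-- NFD splits a-umlaut into the letter a followed by U+0308 (combining diaeresis) -->
    <xsl:variable name="decomposed" select="normalize-unicode($name, 'NFD')"/>
    <!-- Replace the diaeresis after a, o or u with the letter e -->
    <xsl:variable name="replaced" select="replace($decomposed, '([aou])&#x308;', '$1e')"/>
    <!-- NFC puts the remaining base letter + diacritic pairs back together -->
    <xsl:sequence select="normalize-unicode($replaced, 'NFC')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Sample value only: Bär_öl_über_café.txt written with character references -->
    <xsl:value-of select="my:replace-umlauts('B&#xE4;r_&#xF6;l_&#xFC;ber_caf&#xE9;.txt')"/>
  </xsl:template>
</xsl:stylesheet>
```

A letter that carries a diaeresis plus a second mark is still affected: the diaeresis is replaced and the second mark then attaches to the inserted e. This is the kind of side effect the caveat above refers to.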
I'm transforming a DITA document to a simplified, formatting-based XML to be used as an import into Adobe InDesign. My transformation is going really well, except for one element which omits the text in the output. The element is codeblock. When I don't have a template specifying it at all, the element and any child elements are passed through to the new XML document, but none of the text is passed through. This element should be passed through with text and child elements like every other element in my document for which a specific template is not defined. There's nothing anywhere else in the XSL stylesheet that specifies codeblock or any of its attributes. I am completely stumped and cannot figure out what's going on here.
It is also worth noting that a number of inline elements (cmdname, parmname, userinput, etc.) are converted to bold on output. The downstream XML is for formatting and does not need to know semantic context.
This is what I'm trying to pass through:
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that <parmname>child elements</parmname> are passed through.</codeblock>
With no template defined for codeblock, this is what I get as a result:
<codeblock><bold/></codeblock>
The actual result I want is:
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that <bold>child elements</bold> are passed through.</codeblock>
I need the line feeds replaced with character entities because InDesign sees any new line that does not start with an element as a column break. My goal was to simply replace the line feed character with the line separator character (&#x2028;) using the following template:
<xsl:template match="codeblock//text()">
<xsl:analyze-string select="." regex="(&#xA;)">
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1)">&#x2028;</xsl:when>
</xsl:choose>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
But what I get is:
<codeblock>
<bold/>
</codeblock>
I was finally able to pass the text through using this template:
<xsl:template match="codeblock//text()">
<xsl:copy/>
</xsl:template>
Success! Incidentally, I have to match at any level under codeblock so it includes the text of the child parmname element too. Since I was able to successfully pass it through with <xsl:copy>, I tried this to pass the text through while replacing the line feed at the same time:
<xsl:template match="codeblock//text()">
<xsl:copy>
<xsl:analyze-string select="." regex="(&#xA;)">
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1)">&#x2028;</xsl:when>
</xsl:choose>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:copy>
</xsl:template>
But now it won't replace the new line feed. Instead, I get this (which is what I would expect to get without any template defined):
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that <bold>child elements</bold> are passed through.</codeblock>
I know this is a long and somewhat convoluted question. I just feel like if I could resolve the issue of why it's not passing the text through in the first place, the rest would be fairly straightforward. And I'm sorry, I can't provide my source XML or XSL as it's under NDA, but if you need more, let me know and I'll try to provide it. (My XSL stylesheets are made up of 12 different files, so there's no way for me to provide all of it, even if genericized.)
Any suggestions for what I might look for in my stylesheet that might explain why the text is not coming through, or any suggestions for how to force it through as I did with <xsl:copy> while still replacing the line feeds, will be greatly appreciated!
Edited to add: It has occurred to me that the reason it's not doing the replacement is that it looks like it's not actually a line feed character. It's more like a new line in the code than a line feed character (or hard return) in the text. I think I might need to normalize the text while inserting the line separator character (&#x2028;) at the end of each line. Still investigating, but suggestions are welcome!
Edited with update: Thanks to the post How to detect line breaks in XSLT, I have gotten closer, but still not quite where I need to be. With this code, I'm able to detect line feeds in the XML and insert the line break character for InDesign:
<xsl:template match="codeblock//text()">
<xsl:for-each select="tokenize(., '\n?')[.]">
<xsl:sequence select="."/>
<xsl:text>&#x2028;</xsl:text>
</xsl:for-each>
</xsl:template>
However, it also inserts the line feed character at the end of the string, even if it's not the end of the line. For instance, I now get:
<codeblock>This is the first line of my code block.
This is my second line to prove that line feeds are preserved.
This line proves that
<bold>child elements
</bold> are passed through.
</codeblock>
I don't want the line feed character in front of the 'bold' start and end tags or the codeblock end tag. I just want it to appear where there's an actual new line. I tried replacing \r but that just ignored the new lines and just put it in front of the tags. Does anyone know of another escape character that would work here?
A very long question - yet it's still not clear what exactly you are asking (and no reproducible example, either).
If - as it seems - you want to replace newline characters with the line separator character in all text nodes under the codeblock element, you should be able to do simply:
<xsl:template match="codeblock//text()">
<xsl:value-of select="translate(., '&#xA;', '&#x2028;')" />
</xsl:template>
If this doesn't work, then either you have an overriding template or the text does not contain newline characters. You can test for the first case by changing the template to say:
<xsl:template match="codeblock//text()">BINGO</xsl:template>
and observe the result to see if all targeted text nodes are changed to "BINGO". To test for the second case, you can analyze the text character-by-character using the string-to-codepoints() function.
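The second check can be sketched as a temporary diagnostic template, assuming an XSLT 2.0 processor. It logs the code points of each text node under codeblock through xsl:message and passes the text through unchanged. The identity template is only there to make the sketch a complete stylesheet; it is not taken from the asker's code.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template so that everything else is copied as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Diagnostic only: report the code points of each text node under codeblock -->
  <xsl:template match="codeblock//text()">
    <xsl:message select="string-join(for $c in string-to-codepoints(.) return string($c), ' ')"/>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

In the logged numbers, a line feed appears as 10, a carriage return as 13, and the line separator as 8232. If no 10 shows up, the text nodes do not contain newline characters and the translate() call has nothing to replace.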
Your template is missing xsl:non-matching-substring to process the non-matching sections of the text node.
<xsl:template match="codeblock//text()">
<xsl:analyze-string select="." regex="\n">
<xsl:matching-substring>
<xsl:text>&#x2028;</xsl:text>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
However, michael.hor257k's answer is simpler, as you don't need xsl:analyze-string just to replace every occurrence of a single character.
I have the following element as part of a larger XML
<MT N="NonEnglishAbstract" V="[DE] Deutsch Abstract text [FR] French Abstract text"/>
I need to do some formatting of the value in the @V attribute, but only if it contains something like [DE], [FR] or any other two capital letters representing a country code within square brackets.
If no such pattern exists, I need to simply write the value of @V without any formatting.
I can use an XSLT 2.0 solution.
I was hoping that I could use the matches() function something like
<xsl:choose>
<xsl:when test="matches(@V, '\[([A-Z]{2})\]([^\[]+)')">
<!-- Do something -->
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="@V"/>
</xsl:otherwise>
</xsl:choose>
I think all you need is:
matches(@V,'\[[A-Z][A-Z]\]')
You don't have to match the entire string to get a true() ... I tell my students to write as short a reg-ex as possible.
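As a sketch of how the test and the formatting could fit together in XSLT 2.0: the short pattern decides whether any formatting is needed, and xsl:analyze-string then splits the value into one piece per language code. The `abstract` output element and its `lang` attribute are invented for illustration, since the question does not say what the formatted output should look like. The regex attribute of xsl:analyze-string is an attribute value template, so the quantifier braces are doubled there, whereas a `test` expression takes them as written.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="MT[@N = 'NonEnglishAbstract']">
    <xsl:choose>
      <!-- Short test: is there any [XX] marker at all? -->
      <xsl:when test="matches(@V, '\[[A-Z]{2}\]')">
        <!-- Group 1 is the two-letter code, group 2 the text up to the next '[' -->
        <xsl:analyze-string select="@V" regex="\[([A-Z]{{2}})\]([^\[]+)">
          <xsl:matching-substring>
            <!-- Hypothetical output format -->
            <abstract lang="{regex-group(1)}">
              <xsl:value-of select="normalize-space(regex-group(2))"/>
            </abstract>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="@V"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Text that precedes the first marker falls into the non-matching part and is dropped by this sketch; an xsl:non-matching-substring branch would be needed to keep it.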
You have not posted anything about what you have tried. How about looking up the translate() function and translating the string's capital letters to something like "X"? Then test the resulting string for the existence of [XX]. That alone would tell you whether you need to process it.
<xsl:variable name="result">
<xsl:value-of select="translate(@V,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','XXXXXXXXXXXXXXXXXXXXXXXXXX')"/>
</xsl:variable>
Then use that result and test:
contains($result, "[XX]")
No regex required, pure XSLT 1.0.
I'm converting DITA maps to PDF using the DITA Open Toolkit 1.7 and RenderX XEP. In the DITA topics, product names are inserted using conrefs. One of my product names is quite long. It caused layout problems when used within tables. Therefore I inserted a soft hyphen into the phrase that is reused via conref:
<ph id="PD_FineReader2Comp">DOXiS4 FineReader2Components</ph>
This works nicely in the generated pages, but creates a problem in the bookmarks where a symbol is displayed in place of the soft hyphen.
Obviously, this is an encoding problem. It seems that UTF-8 characters are handled properly in PDF content, but not in PDF bookmarks where, according to the following sources, some UTF-16 characters can be used (but I did not understand which ones).
http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf
http://www.setasign.de/support/tips-and-tricks/use-unicode-in-string-values/
The DITA Open Toolkit seems to create bookmarks from topic titles using this code fragment:
<fo:bookmark>
<xsl:attribute name="internal-destination">
<xsl:call-template name="generate-toc-id"/>
</xsl:attribute>
<xsl:if test="$bookmarkStyle!='EXPANDED'">
<xsl:attribute name="starting-state">hide</xsl:attribute>
</xsl:if>
<fo:bookmark-title>
<xsl:value-of select="normalize-space($topicTitle)"/>
</fo:bookmark-title>
<xsl:apply-templates mode="bookmark"/>
</fo:bookmark>
The XSL stylesheet has version 2.0.
I would like to create an override that removes the offending character. How can I do this?
Is it possible to properly resolve the encoding problem? (Probably not possible).
Are there any XSL functions or attributes which remove whitespace other than space, tab, linefeed, and carriage return?
Or do I need special handling for the soft hyphen?
Small refinement: If you are using XSLT 2.0, <xsl:sequence> will be more efficient than <xsl:value-of> in this context. In XSLT 2.0 you should always prefer xsl:sequence over xsl:value-of.
The simple way to do this is to use the translate() function, which can be used to replace certain characters with other characters, or with nothing. It looks like this is the line that outputs the value you want to fix up:
<xsl:value-of select="normalize-space($topicTitle)"/>
So you could simply modify this to:
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', '')"/>
to remove all the soft hyphens. If you would like to replace them with spaces or ordinary hyphens, you could do either of the following, respectively:
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', ' ')"/>
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', '-')"/>
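Since the stylesheet is XSLT 2.0, a broader variant is possible. The sketch below uses replace() with the Unicode category escape \p{Cf} (format characters), which in current Unicode versions includes the soft hyphen as well as other invisible characters such as zero-width joiners. The `topicTitle` parameter is a stand-in for the toolkit's variable, and the position of the soft hyphen in the sample value is an assumption, since the original position is not visible in the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Stand-in for the toolkit's $topicTitle; &#xAD; is the soft hyphen (position assumed) -->
  <xsl:param name="topicTitle" select="'DOXiS4 FineReader2&#xAD;Components'"/>

  <xsl:template match="/">
    <!-- \p{Cf} matches Unicode format characters, the soft hyphen among them -->
    <xsl:value-of select="replace(normalize-space($topicTitle), '\p{Cf}', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Removing the whole Cf category can be too aggressive for titles in scripts that rely on joiner characters; for Latin-script product names, the single-character translate() call above is the more conservative choice.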
I have a question!
I have an XML document that has sections and subsections. I am generating a Doxygen page out of it using XSLTProc and now I have a problem. When I generate a section name like this:
<xsl:template match="SECTION/SUBSECTION">
@subsection <xsl:value-of select="@title"/>
<xsl:apply-templates/>
</xsl:template>
Then the first word of the title does not show up, because Doxygen expects the declaration in this way:
@subsection <subsectionname> <subsectiontitle>
So, the first word is automatically treated as the subsection name. Putting a randomly generated string there does not seem like a very simple task. I tried to put a unique number there instead, using <xsl:value-of select="count(preceding-sibling::*[@col]) + 1"/>, which worked as expected, but as it turns out, Doxygen does not accept numbers as subsection names. I also tried to strip the white space from @title and use that as the subsection name, but XSLTProc complains that it was not an immediate child of <xsl:stylesheet>. How can I easily put some unique string there? It does not have to be meaningful text.
Thanks in advance!
Use the generate-id() function.
<xsl:value-of select="generate-id(@title)"/>
If you want the generated string to be more "readable", here is one way to do this:
<xsl:value-of select="concat(@title, generate-id(@title))"/>
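Put together, the template from the question might look like the sketch below, assuming XSLT 1.0 and xsltproc as in the question. The generated id fills the subsection name slot and the title follows as a separate word, so the first word of the title is no longer consumed. Text output and the trailing line feed are assumptions about how the Doxygen page is written.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="SECTION/SUBSECTION">
    <!-- Emits: @subsection <generated name> <title> -->
    <xsl:text>@subsection </xsl:text>
    <!-- generate-id() is unique per node within one transformation run -->
    <xsl:value-of select="generate-id()"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="@title"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```

The generated names are only guaranteed to be unique within a single run and may change between runs or processors, so they are not suitable as stable link targets.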
The relevant parts of the code:
<xsl:variable name="apos">'</xsl:variable>
<xsl:variable name="and" select='"'"' />
<xsl:value-of select="translate(products_name/node(),$and,$apos)"/>
I'm thinking this should be a simple thing and that the above code should work, but it doesn't affect the output at all.
(I used variables because names cannot begin with an ampersand, and using just an apostrophe brings up a compile error.)
I've tested the code to make sure the translate is working using strings and there are no errors there.
Any help would be greatly appreciated.
You are on the right track, but not quite there yet. The problem is that XSLT is itself written in XML. For any XML language, the XML parser decodes entities and character references first; the XSLT engine only sees the result.
As a result, the XSLT engine cannot distinguish whether you wrote a reference such as &apos; or a literal ' - to the engine they are the same. For your problem, this has two effects:
You have to use a variable containing the apostrophe, because the apostrophe itself delimits string literals in expressions. Even for <xsl:value-of select="translate(products_name/node(),$and,'&apos;')"/>, the XML parser transforms the entity into an apostrophe, so the XSLT engine sees <xsl:value-of select="translate(products_name/node(),$and,''')"/>
You have to escape the ampersand used in the string you search for: as far as the XSLT engine is concerned, the variable "and" contains the value ', i.e. you are replacing an apostrophe with an apostrophe.
Working solution:
<xsl:variable name="apos">'</xsl:variable>
<xsl:value-of select='translate(text(), "'", $apos)'/>
Technically, there's no difference in XML between a literal ' and a reference to it (such as &#39; or &apos;); they're different ways of representing exactly the same thing. Therefore, that translate call shouldn't do anything.
It depends on how you're transforming it, where that output is (attribute value or element?), and how the output is serialized to text, but your problem isn't with your XSLT.
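If the input text really does contain the literal five characters &#39; (that is, the source file has them double-escaped as &amp;#39;), translate() is not the right tool, because it maps single characters rather than substrings. A minimal sketch for that case, assuming an XSLT 2.0 processor is available (the question does not say which version is in use), uses replace() instead. The identity template is added only to make the example complete.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A string holding a single apostrophe -->
  <xsl:variable name="apos" select="&quot;'&quot;"/>

  <!-- Identity template so that everything else is copied as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The pattern is written with an escaped ampersand, so the XPath engine
       receives the five literal characters to search for -->
  <xsl:template match="products_name/text()">
    <xsl:value-of select="replace(., '&amp;#39;', $apos)"/>
  </xsl:template>
</xsl:stylesheet>
```

If the source contains an ordinary reference rather than a double-escaped one, the parser has already turned it into an apostrophe and no replacement is needed, which matches the point made above.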