Converting bespoke formatting into html formatting using XSLT - xslt

I have some data where formatting commands have been created in the format
^B Makes the rest of the line bold
^I Makes the rest of the line italic
etc., and I'm trying to turn this in to html <b>, <i> etc.
These codes can be combined and occur anywhere in the line, but apply to the rest of the line only.
I've tokenised the data into lines, and am using analyze-string on each line to pick up the formatting marks. The problem is that I need to open the formatting instruction where I find it in the string, but then close it at the end of the string. What I have doesn't work: as you would expect, it opens and closes the format where the marker is:
<xsl:analyze-string select="." regex="\^([BIU])">
<xsl:matching-substring>
<xsl:element name="{lower-case(regex-group(1))}"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
</xsl:analyze-string>
What I get from this is:
<b></b> Makes the rest of the line bold
<i></i> Makes the rest of the line italic
where what I want is obviously
<b> Makes the rest of the line bold </b>
<i> Makes the rest of the line italic </i>
I can't see an obvious way of using analyze-string to achieve this. The only way I can see of doing it is to use a recursive function that does substring-afters etc., which seems rather messy.
Anyone with a better idea? Thanks!
Screwtape.

You just need to add another group to your regex to capture the rest of the line after the marker, which can then be output inside your newly created element.
Try this
<xsl:analyze-string select="." regex="\^([BIU])(.*)">
<xsl:matching-substring>
<xsl:element name="{lower-case(regex-group(1))}">
<xsl:value-of select="regex-group(2)" />
</xsl:element>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
</xsl:analyze-string>
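Since the question notes that the codes can be combined on one line, a single `(.*)` group would leave any later marker as literal text inside the first element. One possible way around that is a small recursive function. The following is only a minimal sketch, assuming XSLT 2.0 and input lines wrapped in a hypothetical `line` element; the `f:` namespace and the function name `f:format` are made up for illustration and do not come from the posts above.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:format"
    exclude-result-prefixes="xs f">

  <xsl:output method="html"/>

  <!-- Hypothetical helper: text before the first marker is copied as-is;
       everything after the marker goes inside a b/i/u element and is
       scanned again, so later markers nest inside earlier ones. -->
  <xsl:function name="f:format" as="node()*">
    <xsl:param name="line" as="xs:string"/>
    <xsl:analyze-string select="$line" regex="^(.*?)\^([BIU])(.*)$">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
        <xsl:element name="{lower-case(regex-group(2))}">
          <xsl:sequence select="f:format(regex-group(3))"/>
        </xsl:element>
      </xsl:matching-substring>
      <!-- No marker left: output the remaining text unchanged. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <!-- Assumed input shape: one <line> element per tokenised line. -->
  <xsl:template match="line">
    <p>
      <xsl:sequence select="f:format(string(.))"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, a line such as `plain ^Bbold ^Iboth` would be expected to produce `plain <b>bold <i>both</i></b>`, with each element closed at the end of the line.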

Related

Select first space in a String

I want to select first space of a string in a text.
Input :
<p>Text 1<italic>should</italic> Text 2.</p>
There is a space after </italic>. I want to select only that space and replace it with <s/>. How can I do that?
Tried code :
<xsl:template match="p/text()[2]">
<s/>
</xsl:template>
Expected results :
<p type="body">Text 1
<style type="underline">should</style><s/>have surgery.</p>
This code does not work properly. I am using XSLT 2.0.
In XSLT 2.0, you could do:
<xsl:template match="p/text()[preceding-sibling::*[1][self::italic]]">
<xsl:analyze-string select="." regex="^\s" >
<xsl:matching-substring>
<s/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
This matches the text node immediately following an italic element, and checks if the first character is a space. If it is, it will be replaced by an empty s element.
Alternatively, you could restrict the match pattern itself to include only nodes that start with a space:
<xsl:template match="p/text()[preceding-sibling::*[1][self::italic]][starts-with(., ' ')]">
<s/>
<xsl:value-of select="substring(., 2)" />
</xsl:template>

Extract the sub string based on the regular expression in xslt

I have scenario where I want to extract the sub string which matches the regular expression.
Below is the example:
<xsl:value-of select="matches('Process java(Application=JavaApplication_2) is not running in the system.', ''.*AppName=Archiver_[0-9]{1,2}.*'')"/>
But this gives me the boolean value as 'false'.
I tried with tokenize but it is becoming more complex.
Please help me on this.
See the xsl:analyze-string instruction (source and regex examples).
Input
<root>Process java(Application=JavaApplication_2) is not running in the system.</root>
Template
<xsl:template match="root">
<xsl:analyze-string select="." regex="Application=JavaApplication_[0-9]{{1,2}}">
<xsl:matching-substring>
<xsl:value-of select="."/>
</xsl:matching-substring>
<!-- optional -->
<xsl:non-matching-substring>
<!-- do sth -->
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
The matches() function returns either true or false.
To extract a matching substring, try using the replace() function instead. I am not sure which substring you are trying to extract, so I will not give an example here, but see: https://stackoverflow.com/a/39402132/3016153
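For readers who want to see the replace() route spelled out, here is a minimal sketch. It assumes XSLT 2.0, the `root` input shown above, and that the substring wanted is the `Application=JavaApplication_N` token; the question does not say which part is wanted, so that choice is a guess. Note that replace() returns the input unchanged when nothing matches, hence the matches() guard.

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="root">
    <!-- Assumed target pattern; braces are not doubled here because
         select is not an attribute value template. -->
    <xsl:variable name="pattern"
                  select="'Application=JavaApplication_[0-9]{1,2}'"/>
    <!-- Only output something when the pattern is actually present. -->
    <xsl:if test="matches(., $pattern)">
      <!-- Swallow the text around the captured group and keep only $1.
           The 's' flag lets '.' match line breaks in the input too. -->
      <xsl:value-of
          select="replace(., concat('^.*(', $pattern, ').*$'), '$1', 's')"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The original call returned false simply because the test string contains `Application=JavaApplication_2`, not `AppName=Archiver_...`.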

How can I replace text with angle bracket without parsing the replace value?

I have this:
replace("Both cruciate ligaments are well visualized and are intact.",
".",
".&lt;br&gt;")
But I do not want to output the escaped angle brackets but the actual brackets. When I run the code I get:
Both cruciate ligaments are well visualized and are intact.&lt;br&gt;
I want:
Both cruciate ligaments are well visualized and are intact.<br>
How can I achieve that? I cannot use the angle bracket directly as replace value since I get an error.
EDIT
I have a stylesheet that takes in a text file that is injected into a HTML file (coming from the stylesheet). I take an XML (Clinical document) and a text file and merge them together with the stylesheet. So for example I have:
RADIOLOGY REPORT
NAME: JOHN, DOE
DoB: 1982-02-25
Injected text goes here
The text has to wrap on carriage return and has to wrap at a word level. I did manage to do the latter, but I did not find a way to do the line breaks. I thought of finding 'LF' in the file and replacing it with <BR> so that once the page is rendered I get to see the line breaks.
You need to use xsl:analyze-string if you want to output nodes and not simply strings. Here is an example:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="text">
<xsl:analyze-string select="." regex="\.">
<xsl:matching-substring>
<xsl:value-of select="."/><br/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
With the input being
<text>Both cruciate ligaments are well visualized and are intact.</text>
the transformation result is
Both cruciate ligaments are well visualized and are intact.<br>
Martin Honnen's answer is a perfectly good way to do this.
Using a simple template to find the text in question is another way:
<xsl:variable name="magic-string"
select='"Both cruciate ligaments are well visualized and are intact."'/>
...
<xsl:template match="text()
[contains(.,$magic-string)]">
<xsl:value-of select="substring-before(.,$magic-string)"/>
<xsl:value-of select="$magic-string"/>
<br/>
<xsl:value-of select="substring-after(.,$magic-string)"/>
</xsl:template>
In either case, use the HTML output method to serialize the empty br element as <br> instead of as <br/>.
Note: I'm assuming here that you want a br after this particular sentence, not that you want one after each occurrence of full stop, which is how Martin Honnen appears to have interpreted the question.
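The EDIT in the question suggests the real goal is a br at each line feed rather than after a particular sentence. A minimal sketch of that variant follows, assuming XSLT 2.0 and that the injected text sits in a `text` element as in the example input above; how the text file is actually loaded is not stated in the question.

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="text">
    <!-- Split on line feeds; the optional \r is a precaution for text
         that did not go through XML line-end normalisation. -->
    <xsl:for-each select="tokenize(., '\r?\n')">
      <!-- Emit a real br element between lines, not an escaped string. -->
      <xsl:if test="position() gt 1">
        <br/>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the br is created as a node rather than as part of a string, the serializer writes it as markup and nothing is escaped.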

Inserting a line break in a PDF generated from XSL FO using <xsl:value-of>

I am using XSL FO to generate a PDF file containing a table with information. One of these columns is a "Description" column. An example of a string that I am populating one of these Description fields with is as follows:
This is an example Description.<br/>List item 1<br/>List item 2<br/>List item 3<br/>List item 4
Inside the table cell that corresponds to this Description, I would like the output to display as such:
This is an example Description.
List item 1
List item 2
List item 3
List item 4
I've learned from searching elsewhere that you can make line breaks in XSL FO using an <fo:block></fo:block> within another <fo:block> element. Therefore, even before I parse the XML with my XSL stylesheet, I replace all occurrences of <br/> with <fo:block/>, so that the literal value of the string now looks like:
This is an example Description.<fo:block/>List item 1<fo:block/>List item 2<fo:block/>List item 3<fo:block/>List item 4
The problem arises when the Description string I am using is obtained using <xsl:value-of>, example as follows:
<fo:block>
<xsl:value-of select="descriptionStr"/>
</fo:block>
In which case, the value that gets output to my PDF document is the literal value, so it looks exactly like the previous example with all the <fo:block/> literals. I've tried manually hard-coding the <fo:block/> in the middle of another string, and it displays correctly. E.g. if I write inside my stylesheet:
<fo:block>Te<fo:block/>st</fo:block>
It will display correctly as:
Te
st
But this does not seem to happen when the <fo:block/> is inside the value of an <xsl:value-of select=""/> statement. I've tried searching for this on SO as well as Google, etc. to no avail. Any advice or help will be greatly appreciated. Thank you!
You could also replace <br/> with &#xA; and add a linefeed-treatment="preserve" attribute to your <fo:block>.
Something like:
<fo:block linefeed-treatment="preserve">This is an example Description.
List item 1
List item 2
List item 3
List item 4</fo:block>
Edit
Some users may need to use \n instead of &#xA; depending on how they are creating the XML. See "Retain the &#xA; during xml marshalling" for more details.
This helped me and should be the simplest solution (working with Apache FOP 1.1):
Why not replace your <br/> with the Unicode character called line separator?
<xsl:template match="br">
<xsl:value-of select="'&#x2028;'"/>
</xsl:template>
See https://en.wikipedia.org/wiki/Newline#Unicode
The following code worked:
<fo:block white-space-collapse="false"
white-space-treatment="preserve"
font-size="0pt" line-height="15px">.</fo:block>
It makes the XSL processor think this block contains a line of text, which actually has a 0pt font size.
You can customize line height by providing your own value.
You shouldn't use the xsl:value-of instruction but xsl:apply-templates instead: the built-in rule for text nodes will just output their string value, and for the empty br element you could declare a rule matching descriptionStr/br or descriptionStr//br (depending on your input) in order to transform it into an empty fo:block.
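A minimal sketch of that apply-templates approach, assuming the br markers are real child elements of `descriptionStr` in the input (not escaped text) and that the result is placed inside a table cell elsewhere in the stylesheet:

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Process the children instead of flattening them to a string. -->
  <xsl:template match="descriptionStr">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- Each br element becomes an empty block, which forces a line break.
       Text nodes are copied by the built-in template rule. -->
  <xsl:template match="descriptionStr//br">
    <fo:block/>
  </xsl:template>

</xsl:stylesheet>
```

If the description arrives as escaped markup or CDATA, there are no br elements to match and this rule never fires; a string-based approach is needed in that case.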
Generating strings containing escaped XML markup is seldom the right answer, but if that's what you have to work with, then for input like this:
<Description><![CDATA[This is an example Description.<br/>List item 1<br/>List item 2<br/>List item 3<br/>List item 4]]></Description>
if you're using XSLT 2.0, you can use xsl:analyze-string to get the empty fo:block that you originally wanted:
<xsl:template match="Description">
<fo:block>
<xsl:analyze-string select="." regex="&lt;br/&gt;">
<xsl:matching-substring>
<fo:block />
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
</xsl:analyze-string>
</fo:block>
</xsl:template>
but if you are using XSLT 2.0, you can more concisely use linefeed-treatment="preserve" as per @Daniel Haley and use replace() to insert the linefeeds:
<xsl:template match="Description">
<fo:block linefeed-treatment="preserve">
<xsl:value-of select="replace(., '&lt;br/&gt;', '&#xA;')" />
</fo:block>
</xsl:template>
If you are using XSLT 1.0, you can recurse your way through the string:
<xsl:template match="Description">
<fo:block linefeed-treatment="preserve">
<xsl:call-template name="replace-br" />
</fo:block>
</xsl:template>
<xsl:template name="replace-br">
<xsl:param name="text" select="." />
<xsl:choose>
<xsl:when test="not(contains($text, '&lt;br/&gt;'))">
<xsl:value-of select="$text" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="substring-before($text, '&lt;br/&gt;')"/>
<xsl:text>&#xA;</xsl:text> <!-- or <fo:block /> -->
<xsl:call-template name="replace-br">
<xsl:with-param name="text" select="substring-after($text, '&lt;br/&gt;')"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Try this:
<fo:block><fo:inline color="transparent">x</fo:inline></fo:block>
This code adds a block which contains transparent text, making it look like a new line.
Try using linefeed-treatment="preserve" and \n instead of <br> for a new line.
<fo:block linefeed-treatment="preserve" >
<xsl:value-of select="Description" />
</fo:block>
For XSLT 1.0 I'm using my XSLT Line-Break Template on GitHub.
For XSL-FO it supports
Line breaks
Line delimiters (vs Line breaks)
Series of pointers in a row
Ignore Pointer Repetitions (disable the Series of pointers in a row)
Any string as a pointer to insert a break or a delimiter ("\n" is default)
Line delimiters' height
Default Line delimiter height from a current font size.
Auto ignoring of the "\r" char when searching a break place.
Added support for XSLT 2.0 for a seamless migration.
something else...
For XSLT 2.0 and later, consider using approaches like
XSLT 2.0 xsl:analyze-string (RegEx)
XPath 2.0 tokenize + XSLT (RegEx)
passing sequences as a template parameter (XSLT 2.0)
and so on
I usually use an empty block with a height that can be changed if I need more or less space:
<fo:block padding-top="5mm" />
I know this isn't the best looking solution but it's functional.
I had a text block that looks like this
<fo:table-cell display-align="right">
<fo:block font-size="40pt" text-align="right">
<xsl:text> Text 1 </xsl:text>
<fo:block> </fo:block>
<xsl:text> Text2 </xsl:text>
<fo:block> </fo:block>
<xsl:text> Text 3</xsl:text>
</fo:block>
</fo:table-cell>
NB: note the empty <fo:block> </fo:block> pairs.
</fo:block> on its own is not a direct substitute for <br/>. <br/> is an unpaired HTML aberration that has no direct equivalent in XSL-FO.
</fo:block> just means end of block. If you scatter them through your text you won't have valid XML, and your XSL processor will throw errors.
For the line break formatting you want, each block will occur on a new line. You need a <fo:block> start tag and </fo:block> end tag pair for each line.
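The "one block per line" idea can be sketched with tokenize() as follows. This assumes XSLT 2.0 and a `Description` element whose text contains literal `<br/>` markers, as in the CDATA example earlier in this thread; it is an illustration rather than code from any of the answers.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="Description">
    <fo:block>
      <!-- Split the string at each literal <br/> (optional space allowed)
           and give every piece its own start/end block pair. -->
      <xsl:for-each select="tokenize(., '&lt;br\s*/?&gt;')">
        <fo:block>
          <xsl:value-of select="."/>
        </fo:block>
      </xsl:for-each>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Each piece sits in its own properly closed fo:block, so the generated FO stays well-formed however many breaks the string contains.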

How to implement Carriage return in XSLT

I want to implement carriage return within xslt.
The problem is I have a variable:
Step 1 = Value 1 breaktag Step 2 = Value 2 as a string and would like to appear as
Step 1 = Value 1
Step 2 = Value 2
in the HTML form, but I am getting the br tag on the page. Any good ways of implementing a line feed/carriage return in XSL would be appreciated.
As an alternative to
<xsl:text>
</xsl:text>
you could use
<xsl:text>&#xa;</xsl:text> <!-- newline character -->
or
<xsl:text>&#xd;</xsl:text> <!-- carriage return character -->
in case you don't want to mess up your indentation
This works for me, as carriage return + line feed.
<xsl:text>&#xD;&#xA;</xsl:text>
The "&#10;" string does not work.
The cleanest way I've found is to insert !ENTITY declarations at the top of the stylesheet for newline, tab, and other common text constructs. When having to insert a slew of formatting elements into your output this makes the transform sheet look much cleaner.
For example:
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nl "<xsl:text>
</xsl:text>">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="step">
&nl;&nl;
<xsl:apply-templates />
</xsl:template>
...
</xsl:stylesheet>
use a simple carriage return in a xsl:text element
<xsl:text>
</xsl:text>
Try this at the end of the line where you want the carriage return. It worked for me.
<xsl:text><![CDATA[<br />]]></xsl:text>
I was looking for a nice solution to this, as many would prefer, without embedding escape sequences directly in the expressions, or having weird line breaks inside of a variable. I found a hybrid of both these approaches actually works, by embedding a text node inside a variable like this:
<xsl:variable name="newline"><xsl:text>&#10;</xsl:text></xsl:variable>
<xsl:value-of select="concat(some_element, $newline)" />
Another nice side effect of this is that you can pass in whatever newline you want, be it just LF, CR, or both CRLF.
--Daniel
Here is an approach that uses a recursive template, which looks for &#xa; in the string from the database and then outputs the substring before. If there is a substring after &#xa; remaining, then the template calls itself until there is nothing left. In case &#xa; is not present then the text is simply output.
Here is the template call (just replace @ActivityExtDescription with your database field):
<xsl:call-template name="MultilineTextOutput">
<xsl:with-param name="text" select="@ActivityExtDescription" />
</xsl:call-template>
and here is the code for the template itself:
<xsl:template name="MultilineTextOutput">
<xsl:param name="text"/>
<xsl:choose>
<xsl:when test="contains($text, '&#xa;')">
<xsl:variable name="text-before-first-break">
<xsl:value-of select="substring-before($text, '&#xa;')" />
</xsl:variable>
<xsl:variable name="text-after-first-break">
<xsl:value-of select="substring-after($text, '&#xa;')" />
</xsl:variable>
<xsl:if test="not($text-before-first-break = '')">
<xsl:value-of select="$text-before-first-break" /><br />
</xsl:if>
<xsl:if test="not($text-after-first-break = '')">
<xsl:call-template name="MultilineTextOutput">
<xsl:with-param name="text" select="$text-after-first-break" />
</xsl:call-template>
</xsl:if>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text" /><br />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Works like a charm!!!
I believe that you can use the xsl:text tag for this, as in
<xsl:text>
</xsl:text>
Chances are that by putting the closing tag on a line of its own, the newline is part of the literal text and outputted as such.
I separated the values by Environment.NewLine and then used a pre tag in html to emulate the effect I was looking for
This is the only solution that worked for me. Except I was replacing &#xA; with \r\n