Prevent Narrow Non-Breaking Space (n-nbsp) in XSLT output - xslt

I have an XSLT transform that puts a narrow non-breaking space (U+202F, nnbsp) into my output. Here is one section that results in nnbsp:
<span>
<xsl:text>§ </xsl:text>
<xsl:value-of select="$firstsection"/>
<xsl:text> to </xsl:text>
<xsl:value-of select="$lastsection"/>
</span>
The nnbsp, in this case, comes in after the § and after the word "to".
<span>§ 1 to 8</span>
(interestingly, the space before the to turns out to be a regular full size space)
This occurs in my UTF-8 encoded output, as well as iso-8859-1 (latin1).
How can I avoid the nnbsp? While the narrow space is visually more appropriate, it doesn't work for all the devices that will read this document. I need a plain vanilla blank space.
Is there a transform setting? I use Saxon 9 at the command line.
Should I do another transform, using a replace template to replace the nnbsp?
Should I redo my templates like the one above? For example, would using concat() be a better coding practice?
UPDATE: For those who may find this question someday: as suggested by Michael Kay, I researched the issue further. Indeed, it turns out narrow NBSPs were in the source XML files (and bled into my templates via cut/paste). I did not know this, and it was hard to discover (hat tip to gVim hex view). The narrows don't exactly jump out at you in a GUI editor. I have no control over production of the source XML, so I had to find a way to 'deal with it.' Eric's answer below turned out to be my preferred way to scrub the narrow NBSP. sed editing was (and is) another option to consider, but I like keeping my production in XSLT when possible. So Eric's suggestion has worked well for me.

You could use the translate() function to replace your nnbsp with something else, but since you are using Saxon 9 you can rely on XSLT 2.0 features and use a character map, which does this kind of thing automatically at serialization time. For instance (assuming that you want to replace them with a non-breaking space):
<xsl:output use-character-maps="nnbsp"/>
<xsl:character-map name="nnbsp">
<xsl:output-character character="&#x202F;" string="&#xA0;"/>
</xsl:character-map>
Eric

The narrow non-breaking space is coming from somewhere: either the source document or the stylesheet. It's not being magically injected by the XSLT processor. If it's in the stylesheet, then get rid of it. If it's in the source document, then transform it away, for example by use of the translate() function.
In fact, pasting your code fragment into a text editor and looking at it in hex, I see that the 202F characters are right there in your code. I don't know how you got them into your stylesheet, but you should (a) remove them, and (b) work out how it happened so it doesn't happen again.
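Since U+202F is hard to spot in most editors, a small script can do the hex hunting. The following is a minimal sketch in Python, not part of the original answers; the function names find_nnbsp and scrub_nnbsp are illustrative, and it assumes the stylesheet or source text has already been read into a string.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def find_nnbsp(text):
    """Return 1-based (line, column) pairs for every U+202F in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NNBSP:
                hits.append((line_no, col))
    return hits


def scrub_nnbsp(text, replacement=" "):
    """Replace every U+202F with a plain space (or another replacement)."""
    return text.replace(NNBSP, replacement)


# Sample mimicking the stylesheet fragment from the question.
sample = "<xsl:text>\u00a7\u202f</xsl:text>\n<xsl:text> to\u202f</xsl:text>"
print(find_nnbsp(sample))
print(NNBSP in scrub_nnbsp(sample))
```

Running the locator over both the stylesheet and the source XML shows which of the two is carrying the character.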

Related

Duplicate line and replace string

I have an XML file that contains more than 10,000 items. Each item contains a line like this.
<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id>
For each item I need to add another line below like this:
<sku><![CDATA[FBM00101816]]></sku>
So I need to duplicate each g:id line, replace the g:id with sku and trim the value to delete all characters after the underscore (including it). The final result would be like this:
<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id>
<sku><![CDATA[FBM00101816]]></sku>
Any ideas how to accomplish this?
Thanks in advance.
In XSLT, it's
<xsl:template match="g:id">
<xsl:copy-of select="."/>
<sku><xsl:value-of select="substring-before(., '_')"/></sku>
</xsl:template>
Or using Saxon's Gizmo (https://www.saxonica.com/documentation11/index.html#!gizmo) it's
follow //g:id with <sku>{substring-before(., '_')}</sku>
Don't try to do this sort of thing in a text editor (or any other tool that doesn't involve a real XML parser) unless it's a one-off. Your code will be too sensitive to trivial variations in the way the source XML is written and will almost inevitably have bugs - which might not matter for a one-off, but do matter if it's going to be used repeatedly over a period of time.
Note also, the CDATA tags in your input (and output) are a waste of space. CDATA tags have no significance unless the element content includes special characters like < and &, which isn't the case in your examples.
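For readers without an XSLT processor at hand, the same parser-based idea can be sketched in Python with the standard library. This is only an illustration: the namespace URI bound to the g: prefix is an assumption (the question does not show it), and add_sku is a hypothetical helper name. Unlike substring-before(), split() returns the whole value when there is no underscore.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the g: prefix; use the one declared in your file.
G_NS = "http://base.google.com/ns/1.0"


def add_sku(xml_text):
    """Insert a <sku> element after each g:id, holding the text before '_'."""
    root = ET.fromstring(xml_text)
    g_id = f"{{{G_NS}}}id"
    # list() snapshots the tree so inserting elements does not disturb the walk.
    for parent in list(root.iter()):
        # Walk children backwards so earlier indexes stay valid after inserts.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == g_id:
                sku = ET.Element("sku")
                sku.text = (child.text or "").split("_", 1)[0]
                sku.tail = child.tail
                parent.insert(index + 1, sku)
    return root


sample = (
    '<items xmlns:g="http://base.google.com/ns/1.0">'
    "<item><g:id><![CDATA[FBM00101816_BLACK-L]]></g:id></item>"
    "</items>"
)
root = add_sku(sample)
print([sku.text for sku in root.iter("sku")])
```

ElementTree does not preserve CDATA sections when writing the result back out, which is harmless here for the reason given above.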
Okay, so after commenting, I couldn't help myself. This seemed to do what you asked for.
find: <g:id><!\[CDATA\[([^\_]+)?(.+)?\]></g:id>
replace: $0\n<sku><![CDATA[$1]]></sku>
I don't have BBEdit; I tested this in TextMate.

UltraEdit/Notepad - XML Remove nodes with empty properties

I'm currently facing an issue with software I'm working with. It receives several XML files from an external application, and we need to process them. Those files contain a lot of nodes that are totally useless and make the files really heavy, so our program is very slow to process each one. This should be changed in the future, and I'd like to prove that removing those nodes would improve our processing time a lot. As a first step I'd like to do this manually, using a sample XML file and a regex to remove all the nodes whose value attribute is empty. This is the expression I'm using now; with the replace function in Notepad++ I can remove those rows and then remove the empty lines:
<.*(\s\w+?[^=]*?="[^"]*?")*?\s+?value="[""]*?".*?>
Example
<TEST_NODE value="1"/>
<TEST_NODE value=""/>
<TEST_NODE value="0"/>
In my case nodes can be named differently and can have different attributes, but the ones I care about are those that contain something in the value attribute, so in this case I should remove the second row.
This appears to work fine, but with very large files (10 MB) the Notepad++ replace function seems to have issues and stops working properly, breaking a lot of tags.
I've tried another editor, UltraEdit, but the syntax there seems to be different: I can use regular expressions but need to select one of three engines (Perl, Unix, UltraEdit). Only with "Perl" am I able to do this replacement, but for big files it does not work there either and I get the following error:
The complexity of matching the expression has exceeded available resources..
Can anyone help me out with this? Unfortunately I'm not that good with regex, and I'm not sure whether the expression above is good or bad.
Try this:
<(?=[^><]*?value\s*=\s*"")[^><]*>
Replace with nothing.
This might be a case of catastrophic backtracking, caused by too many quantifiers applied to overly broad patterns such as . (dot).
The quantifiers in this answer are applied only to the "not < or >" class, which should stop the expression from backtracking across XML tags.
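To see the suggested expression at work on the sample from the question, here is a small sketch using Python's re module. That is an assumption of convenience: Notepad++ and UltraEdit use different regex engines, so behavior on edge cases may differ.

```python
import re

# The expression from the answer: every quantifier is confined to [^><],
# so a match attempt can never run past the end of the current tag.
EMPTY_VALUE_TAG = re.compile(r'<(?=[^><]*?value\s*=\s*"")[^><]*>')

sample = (
    '<TEST_NODE value="1"/>\n'
    '<TEST_NODE value=""/>\n'
    '<TEST_NODE value="0"/>'
)

cleaned = EMPTY_VALUE_TAG.sub("", sample)
print(cleaned.split("\n"))
```

The middle line comes back empty, which matches the two-step workflow in the question of removing the tags first and the blank lines afterwards.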
You're using the wrong tool for the job. If you're going to be manipulating XML then you need to add XSLT and/or XQuery to your tool kit. Using regular expressions for the job is slow and error-prone.
For example, here are just a few of the bugs in the answer that you accepted:
Elements that use single quotes (value='') won't be matched
Element with whitespace around the equals sign won't be matched
Elements with an attribute whose name ends in value (e.g. xvalue="") will be matched
value="" will be matched inside comment and CDATA nodes
value="" can be matched inside text nodes: <x>value=""</x>
Elements split across multiple lines won't be matched (I suspect)
In XSLT 3.0 this is simply
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="*[@value='']"/>
</xsl:transform>
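The same parser-based removal can be sketched in Python's standard library for anyone who wants to measure the effect before adopting XSLT. This is an illustration rather than an equivalent of the stylesheet: drop_empty_value_elements is a hypothetical name, the root element itself is never removed, and any text directly following a removed element is dropped with it.

```python
import xml.etree.ElementTree as ET


def drop_empty_value_elements(root):
    """Remove every descendant element whose value attribute is empty."""
    # list() snapshots the tree so removals do not disturb the walk.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("value") == "":
                parent.remove(child)
    return root


sample = """<ROOT>
  <TEST_NODE value='1'/>
  <TEST_NODE value = ""/>
  <TEST_NODE value="0"/>
  <x>value=""</x>
</ROOT>"""

root = drop_empty_value_elements(ET.fromstring(sample))
print([(child.tag, child.get("value")) for child in root])
```

Because the parser has already resolved quoting style, whitespace around the equals sign, and text content, the cases listed above need no special handling.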
Try this regular expression in Notepad++
<[^<]+value=""[^>]*>

How do I prohibit double quotes in an inputText in XPages?

I've been trying to prohibit users from entering double-quotes (") into some fields that are used in JSON strings, as they cause unexpected termination of values in the strings. Unfortunately, while the regex isn't hard to write, I can't get it to work within XPages.
I tried using both double-quotes alone and using the escape character. Both ways fail any string, not just ones including the double-quotes.
<xp:validateConstraint message="Please do not use double quotes in organization/vendor names">
<xp:this.regex><![CDATA['^[^\"]*$]]></xp:this.regex>
</xp:validateConstraint>
There must be a simple way around this issue.
I think you're running into issues with your regex property for your xp:validateConstraint validator. You seem to be attempting to strip the characters in the xp:this.regex as opposed to specifying what characters are allowed, as I believe the docs read. I might recommend checking out the xp:customConverter (bias: I'm more familiar with the customConverter) which gives you the ability to alter the getValueAsObject and getValueAsString methods; then you can escape the undesired characters.
Here's what I'm thinking of, to strip them out. If you plug this into an XPage, you'll find that when the value is pulled (e.g.- by the partial refresh), it converts the input content accordingly by stripping out quotes (both single and double, in my case).
<?xml version="1.0" encoding="UTF-8"?>
<xp:view xmlns:xp="http://www.ibm.com/xsp/core">
<xp:inputTextarea
id="inputTextarea1"
value="#{viewScope.myStuff}"
disableClientSideValidation="true">
<xp:this.converter>
<xp:customConverter>
<xp:this.getAsString><![CDATA[#{javascript:return value.replace(/["']/g, "");}]]></xp:this.getAsString>
<xp:this.getAsObject><![CDATA[#{javascript:return value.replace(/["']/g, "");}]]></xp:this.getAsObject>
</xp:customConverter>
</xp:this.converter>
</xp:inputTextarea>
<xp:button
value="Do Something"
id="button1">
<xp:eventHandler
event="onclick"
submit="true"
refreshMode="partial"
refreshId="computedField1" />
</xp:button>
<xp:text
escape="true"
id="computedField1"
value="#{viewScope.myStuff}" />
</xp:view>
My interaction with the above code yields:
Notice that for it to reflect in the refresh, I'm modifying both the getAsString and the getAsObject, since it's updating the viewScope'd object during the refresh (a fact I had to remind myself of), but saving to a text field in XPages will get the value by the getAsString (provided your data source knows it's a String-related field, e.g. NotesXspDocument as document1, with known Form, where the field is a Text field).
As the above comments alluded to, this filters the input values as opposed to escaping or validating them. You could also change my replace calls to insert an escape character instead: return value.replace(/"/g, "\\\"").replace(/'/g, "\\'");.
Is the simple answer just add a JavaScript function call on the submit button to remove the quote?
A more elegant solution would be to not allow typing of the quote at all, by checking the keydown event and blocking that character code. The user should not be able to type one thing and then have it changed on them in processing.
@Eric McCormick recommends a customConverter, which in my opinion is a neat solution that I would probably go for in many cases. Sometimes, however, we need to teach users to adhere to the rules, so we have to show them where they went wrong. That's when we may need a validator.
Playing around a bit the simplest solution I came up with is a xp:validateExpression simply looking for the first occurrence of a double quote within the String entered:
<xp:inputText
id="inputText1"
value="#{viewScope.testvalue}">
<xp:this.validators>
<xp:validateExpression
message="Hey, wait! Didn't I tell you not to use double quotes in here?">
<xp:this.expression><![CDATA[#{javascript:value.indexOf("\"")==-1}]]></xp:this.expression>
</xp:validateExpression>
</xp:this.validators>
</xp:inputText>
If that's a single occurrence in your application, that's it, really. If you need this and similar solutions all over the place, you might want to look into writing a small validator bean (Java), registering it via faces-config.xml and then using it everywhere in your application, e.g. by using an xp:validator instead.
As suggested by @Tomalik and @sidyll, this is an attempt to solve the wrong problem. While each of the answers supplied does solve the problem of preventing the user from entering undesirable characters, it is better to encode those characters to preserve the user's input. In this particular case, the intermediate step in providing the data to the user via a JSON string is to pull the value from a view.
So, all I had to do was change the column formula to encode the string using the UTF-8 character set and it displays the values with the "undesirable characters". The unencoded value is stored on the document so that Old Notes access won't create confusion.
@URLEncode("UTF-8"; vendorName)
In one case, the JSON is computed as part of the form design, but the same solution works.
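The general principle, encoding the value instead of forbidding characters, can be illustrated outside XPages. The sketch below uses Python's json module rather than @URLEncode, so it is not the poster's method; the vendor name is made-up sample data.

```python
import json

# Made-up sample value containing the "undesirable" double quotes.
vendor_name = 'Acme "Rocket" Supplies'

# A JSON serializer escapes the quotes, so the string no longer
# terminates early and the user's input is preserved.
payload = json.dumps({"vendorName": vendor_name})
print(payload)

# Decoding returns exactly what the user typed.
print(json.loads(payload)["vendorName"] == vendor_name)
```

Whichever encoding is used, the stored value stays untouched and only the representation sent inside the JSON changes.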

need to display char in xslt

Hi all
I am using XSLT 1.0. I have the character code F0A7, which has to be displayed as the corresponding character. My input is
<w:sym w:font="Wingdings" w:char="F0A7"/>
my xslt template is
<xsl:template match="w:sym">
<xsl:variable name="char" select="@w:char"/>
<span font-family="{@w:fonts}">
<xsl:value-of select="concat('&#x',$char,';')"/>
</span>
</xsl:template>
It shows this error: ERROR: 'A decimal representation must immediately follow the "&#" in a character reference.'
Please help me fix this. Thanks in advance.
This isn't possible in (reasonable) XSLT. You can work around it.
Your solution with concat is invalid: XSLT is not just a fancy string-concatenator, it really transforms the conceptual tree. An encoded character such as &#xF0A7; is a single character - if you were to somehow include the letters & # x f 0 a 7 ; then the XSLT processor would be required to include these letters in the XML data as literal text, not as a character reference. So that means it will escape them.
There's no feature in XSLT 1.0 that permits converting from a number to a character with that codepoint.
In XSLT 2.0, as Michael Kay points out, you can use codepoints-to-string() to achieve this.
There are two solutions. Firstly, you could use disable-output-escaping. This is rather nasty and not portable. Avoid this at all costs if you can - but it will probably work in your transformer, and it's probably the only general, simple solution, so you may not be able to avoid this.
The second solution would be to hardcode matches for each individual character. That's a mess generally, but quite possible if you're dealing with a limited set of possibilities - that depends on your specific problem.
Finally, I'd recommend not solving this problem in XSLT - this is typically something you can do in pre/post processing in another programming environment more appropriately. Most likely, you've an in-memory representation of the XML document to be able to use XSLT in the first place, in which case this won't even take much CPU time.
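As a rough sketch of what such pre- or post-processing could look like, here is the hex-to-character conversion in Python. The helper name char_from_hex is illustrative; how the w:char values are collected and written back depends on the surrounding pipeline and is not shown.

```python
def char_from_hex(hex_code):
    """Convert a hexadecimal code point such as 'F0A7' to a one-character string."""
    return chr(int(hex_code, 16))


def char_reference(hex_code):
    """Build the equivalent XML character reference as plain text."""
    return f"&#x{int(hex_code, 16):X};"


symbol = char_from_hex("F0A7")
print(len(symbol), f"U+{ord(symbol):04X}")
print(char_reference("f0a7"))
```

Note that U+F0A7 lies in the Private Use Area, so the result only renders as the intended symbol when the Wingdings font is applied.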
<span font-family="{@w:font}">
<xsl:value-of select="concat('&amp;#x', @w:char, ';')"
disable-output-escaping="yes"/>
</span>
Though check @Eamon Nerbonne's answer for why you shouldn't do it at all.
If you were using XSLT 2.0 (which you aren't), you could write a function to convert hex to decimal, and then use codepoints-to-string() on the result.
Use '&amp;' for '&' in output:
<xsl:value-of select="concat('&amp;#x',$char,';')"/>

Detecting Characters in an XSLT

I have encountered some odd characters that do not display properly in Internet Explorer, such as these: “, –, and ’. I think they're carried over from copy-and-paste Word content.
I am using XSLT to build the page content and it would be great to detect these characters in the XSLT and replace them with valid HTML codes. I already do string replacement in the style sheet, but I'm not sure how detect these encoded characters or whether it's possible.
What about simply changing the encoding of the stylesheet, as well as its output, to UTF-8? The characters you mention are “, – and ’. They are certainly not invalid, given the correct encoding (they are at least perfectly valid in code page 1252).
Using a good XML editor such as XMLSpy should highlight any errors in formatting your XSLT by validating at development time.
Jeni Tennison's Multiple string replacements may be a good starting point.
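If the replacement ends up happening outside the stylesheet, the "multiple string replacements" idea reduces to a lookup table. The following is a minimal sketch in Python, offered as an assumption about one possible post-processing step rather than the technique from the linked article; the table covers only the characters mentioned in the question plus the closing double quote.

```python
# Map each typographic character to a numeric HTML character reference.
REPLACEMENTS = {
    "\u201c": "&#8220;",  # left double quotation mark
    "\u201d": "&#8221;",  # right double quotation mark
    "\u2013": "&#8211;",  # en dash
    "\u2019": "&#8217;",  # right single quotation mark
}
TABLE = str.maketrans(REPLACEMENTS)


def to_html_refs(text):
    """Replace the mapped characters in one pass; everything else is untouched."""
    return text.translate(TABLE)


print(to_html_refs("\u201cIt\u2019s 9\u201310\u201d"))
```

Numeric references survive any output encoding, which sidesteps the mismatch between the page's declared encoding and its actual bytes.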