position()=1 working correctly, but not position()<5 - xslt

I'm new to XSLT, and I'm carrying out a few tests using w3schools "Try it yourself" pages. I'm using the following demo:
http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=tryxsl_choose
This contains the following line:
<xsl:for-each select="catalog/cd">
I'm testing filtering the rendered HTML by position(), but I'm having issues when using the < operator.
I've tried the following:
<xsl:for-each select="catalog/cd[position()=1]">
And this returns the first item from the XML data (as expected).
I then tried:
<xsl:for-each select="catalog/cd[position()<5]">
I was expecting this to return the first 4 items, but instead I get no results.
My guess is that perhaps position()=1 is doing a string comparison, which is why it returns the first item, but it cannot understand position()<5 as a string cannot be compared in this way?
Why is this happening, and what would be the correct syntax to get the results I wish to achieve?
Update: After reading @joocer's response and testing this myself, I found that the > operator does work, giving the opposite result:
<xsl:for-each select="catalog/cd[(position()>5)]">

It looks very much like a bug in the version of libxslt that w3schools is using.

Even inside quotes, you must type < as &lt; so it won't be confused with the start of an element tag. I think this was done to make it easier for tolerant parsers to recover from errors and for streaming parsers to skip content faster: they can always look for < outside CDATA and know that it starts markup such as an element start or end tag.
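A quick way to see this rule outside the browser is to feed both forms to a conforming XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration (it is not what w3schools runs, and select_attribute is a hypothetical helper): a literal < inside an attribute value is rejected as not well-formed, &lt; is accepted and handed back as <, and a literal > is allowed, which is consistent with position()>5 working in the update above.

```python
import xml.etree.ElementTree as ET

def select_attribute(markup: str):
    """Parse one element and return its select attribute, or None if not well-formed."""
    try:
        return ET.fromstring(markup).get("select")
    except ET.ParseError:
        return None

# A raw '<' inside an attribute value is a well-formedness error.
print(select_attribute('<for-each select="catalog/cd[position()<5]"/>'))     # None
# Escaped as &lt;, the parser delivers a plain '<' to the XPath engine.
print(select_attribute('<for-each select="catalog/cd[position()&lt;5]"/>'))  # catalog/cd[position()<5]
# A raw '>' is allowed in attribute values.
print(select_attribute('<for-each select="catalog/cd[position()>5]"/>'))     # catalog/cd[position()>5]
```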

I don't know why, but inverting the condition works, so instead of looking for less than 5, look for not more than 4
<xsl:for-each select="catalog/cd[not(position()>4)]">

Related

Duplicate line and replace string

I have an XML file that contains more than 10,000 items. Each item contains a line like this.
<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id>
For each item I need to add another line below like this:
<sku><![CDATA[FBM00101816]]></sku>
So I need to duplicate each g:id line, replace g:id with sku, and trim the value by deleting everything from the underscore onwards (the underscore included). The final result would look like this:
<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id>
<sku><![CDATA[FBM00101816]]></sku>
Any ideas how to accomplish this?
Thanks in advance.
In XSLT, it's
<xsl:template match="g:id">
<xsl:copy-of select="."/>
<sku><xsl:value-of select="substring-before(., '_')"/></sku>
</xsl:template>
Or using Saxon's Gizmo (https://www.saxonica.com/documentation11/index.html#!gizmo) it's
follow //g:id with <sku>{substring-before(., '_')}</sku>
Don't try to do this sort of thing in a text editor (or any other tool that doesn't involve a real XML parser) unless it's a one-off. Your code will be too sensitive to trivial variations in the way the source XML is written and will almost inevitably have bugs - which might not matter for a one-off, but do matter if it's going to be used repeatedly over a period of time.
Note also, the CDATA tags in your input (and output) are a waste of space. CDATA tags have no significance unless the element content includes special characters like < and &, which isn't the case in your examples.
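If you'd rather stay in a general-purpose language, the same parser-based approach can be sketched with Python's standard xml.etree.ElementTree. Assumptions: the g prefix is bound to http://base.google.com/ns/1.0 (the usual Google Merchant feed namespace; check the xmlns:g declaration in your file), and add_sku is a hypothetical helper name. ElementTree does not preserve CDATA wrappers, which, as noted above, doesn't change the content.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI bound to the "g" prefix in the feed.
G_NS = "http://base.google.com/ns/1.0"
ID_TAG = f"{{{G_NS}}}id"

def add_sku(xml_text: str) -> str:
    """Insert <sku> after every <g:id>, holding the id up to the first '_'."""
    ET.register_namespace("g", G_NS)  # serialize with the g: prefix, not ns0:
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):  # snapshot before mutating the tree
        # Walk children right to left so insert() doesn't shift unvisited indices.
        for index in range(len(parent) - 1, -1, -1):
            child = parent[index]
            if child.tag != ID_TAG:
                continue
            before, sep, _ = (child.text or "").partition("_")
            sku = ET.Element("sku")
            # Like XPath substring-before(): empty if there is no '_'.
            sku.text = before if sep else ""
            sku.tail = child.tail  # reuse the original line break/indentation
            parent.insert(index + 1, sku)
    return ET.tostring(root, encoding="unicode")

feed = ('<rss xmlns:g="http://base.google.com/ns/1.0"><item>'
        '<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id></item></rss>')
print(add_sku(feed))
```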
Okay, so after commenting, I couldn't help myself. This seemed to do what you asked for.
find: <g:id><!\[CDATA\[([^_\]]+)([^\]]*)\]\]></g:id>
replace: $0\n<sku><![CDATA[$1]]></sku>
I don't have BBEdit, but I tested it in TextMate.
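For checking the pattern outside an editor, here is the same idea as a Python re.sub sketch (add_sku_lines is a hypothetical name). The groups are kept from stepping past the closing ]]>, the new CDATA section is closed with ]]>, and Python writes the whole-match reference as \g<0> instead of $0. Like any regex approach, it assumes each g:id sits on one line in exactly this form.

```python
import re

# Group 1: id text before the first '_' (the whole id if there is no '_').
# Group 2: the rest, up to the closing ]]> (may be empty).
ID_LINE = re.compile(r"<g:id><!\[CDATA\[([^_\]]+)([^\]]*)\]\]></g:id>")

def add_sku_lines(text: str) -> str:
    # \g<0> re-emits the matched g:id line; \n then adds the sku line after it.
    return ID_LINE.sub(r"\g<0>\n<sku><![CDATA[\1]]></sku>", text)

print(add_sku_lines("<g:id><![CDATA[FBM00101816_BLACK-L]]></g:id>"))
```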

UltraEdit/Notepad - XML Remove nodes with empty properties

I'm currently facing an issue with a piece of software I'm working with. It receives several XML files from an external application, and we need to process them. The problem is that these XML files contain a lot of nodes that are completely useless and make the files really heavy; as a result, our program is very slow to process each one. This should be changed in the future, and I'd like to prove that removing those nodes would improve our processing time a lot. As a first step I'd like to do this manually, using a sample XML file and a regex that removes all the nodes whose value attribute is empty. This is the pattern I'm using now; with the Replace function in Notepad++ I'm able to remove those rows and then remove the empty lines:
<.*(\s\w+?[^=]*?="[^"]*?")*?\s+?value="[""]*?".*?>
Example
<TEST_NODE value="1"/>
<TEST_NODE value=""/>
<TEST_NODE value="0"/>
In my case, nodes can have different names and different attributes, but the only ones I care about are those that have something in the value attribute; so in this example I should remove the second row.
This seems to work fine; however, with very large files (10 MB) the Notepad++ replace function seems to have issues: it stops working properly and breaks a lot of tags.
I've tried another editor called UltraEdit, but there the syntax seems to be different: I can use regular expressions, but I have to choose one of three options (Perl, Unix, UltraEdit). Only with "Perl" am I able to do this replacement, but there too it fails on big files, and I get the following error:
The complexity of matching the expression has exceeded available resources..
Can anyone help me with this? Unfortunately I'm not very good with regex, and I'm not sure whether the pattern above is good or bad.
Try this:
<(?=[^><]*?value\s*=\s*"")[^><]*>
Replace with nothing.
This might be a case of catastrophic backtracking, caused by too many quantifiers applied to very broad character classes such as . (any character).
In this answer the quantifiers are only applied to the class [^><] (anything except < or >), which should stop the expression from backtracking across XML tags.
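To try the suggested pattern on the sample before running it on a 10 MB file, here is a small Python re.sub sketch (Python's re is also a backtracking engine, though not the same one as Notepad++ or UltraEdit). It removes the second sample line, leaving an empty line, and the last check illustrates one of the limitations raised in the next answer: an attribute such as xvalue="" also matches.

```python
import re

# [^><] keeps every quantifier inside a single tag, limiting backtracking.
EMPTY_VALUE_TAG = re.compile(r'<(?=[^><]*?value\s*=\s*"")[^><]*>')

sample = '<TEST_NODE value="1"/>\n<TEST_NODE value=""/>\n<TEST_NODE value="0"/>'
cleaned = EMPTY_VALUE_TAG.sub("", sample)
print(cleaned)  # the middle line is now empty

# A limitation: the lookahead doesn't check where the attribute name starts.
print(bool(EMPTY_VALUE_TAG.search('<TEST_NODE xvalue=""/>')))  # True
```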
You're using the wrong tool for the job. If you're going to be manipulating XML then you need to add XSLT and/or XQuery to your tool kit. Using regular expressions for the job is slow and error-prone.
For example, here are just a few of the bugs in the answer that you accepted:
Elements that use single quotes (value='') won't be matched
Elements with whitespace around the equals sign won't be matched
Elements with an attribute whose name ends in value (e.g. xvalue="") will be matched
value="" will be matched inside comment and CDATA nodes
value="" can be matched inside text nodes: <x>value=""</x>
Elements split across multiple lines won't be matched (I suspect)
In XSLT 3.0 this is simply
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="*[@value='']"/>
</xsl:transform>
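Without an XSLT 3.0 processor to hand, the same idea (copy everything except elements whose value attribute is empty) can be sketched with Python's standard ElementTree. Because it actually parses the XML, the quoting, whitespace, attribute-name, comment and text-node cases listed above behave correctly. drop_empty_value_nodes is a hypothetical helper name; note that ElementTree drops comments when parsing by default and, unlike the XSLT version, this sketch never removes the root element itself.

```python
import xml.etree.ElementTree as ET

def drop_empty_value_nodes(xml_text: str) -> str:
    """Remove every element (with its subtree) whose value attribute is ''."""
    root = ET.fromstring(xml_text)  # note: the root itself is never removed
    for parent in list(root.iter()):  # snapshot before mutating the tree
        # Right to left, so deleting a child doesn't shift unvisited indices.
        for index in range(len(parent) - 1, -1, -1):
            child = parent[index]
            if child.get("value") != "":  # attribute missing (None) or non-empty
                continue
            # Keep any text that followed the removed element (its tail).
            if child.tail:
                if index > 0:
                    prev = parent[index - 1]
                    prev.tail = (prev.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            del parent[index]
    return ET.tostring(root, encoding="unicode")

sample = '<root><TEST_NODE value="1"/><TEST_NODE value=""/><TEST_NODE value="0"/></root>'
print(drop_empty_value_nodes(sample))

# Cases from the list above: quotes, spacing, xvalue, comments, text nodes.
tricky = """<root><a value=''/><b value = ""/><c xvalue=""/><!-- value="" --><d>value=""</d></root>"""
print(drop_empty_value_nodes(tricky))
```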
Try this regular expression in Notepad++
<[^<]+value=""[^>]*>

TinyXML2: Replace Node function?

I am having a hard time using TinyXML2 (https://github.com/leethomason/tinyxml2) to write a C/C++ method that replaces a given node like:
<doc>
<replace>Foo</replace>
</doc>
...with another node:
<replacement>Bar</replacement>
...so that the outcome is:
<doc>
<replacement>Bar</replacement>
</doc>
However, the node to be replaced may appear multiple times, and I would like to keep the order in case I replace the second node with something else.
This should actually be straightforward, but I keep ending up in endless recursion.
Is there an example somewhere of how to do that? Any help would be greatly appreciated.
Do you have sample code?
You could try calling tinyxml2::XMLNode::InsertAfterChild to insert <replacement> followed by a deletion of <replace>.
This answer also seems related: Updating Data in tiny Xml element
I'd recommend copying the source XML to a new document using the visitor pattern, making substitutions as you go. Substituting in place is very likely to lead to broken chains and the endless loops that you're experiencing.
You can find an example of using the visitor pattern to make substitutions (in element attributes and text, but it's the same principle) here. See the xcopy function and associated code near the bottom.
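TinyXML2 itself is a C++ library, so take the following only as an illustration of two ideas from the answers above: insert-then-delete, and collecting the targets before mutating the tree. It uses Python's standard ElementTree, not TinyXML2, and replace_nodes / to_replacement are hypothetical names. Finding every target first means the loop never walks into nodes it has just inserted, and inserting each replacement directly after its old node keeps the document order.

```python
import xml.etree.ElementTree as ET
from typing import Callable

def replace_nodes(root: ET.Element, old_tag: str,
                  make_new: Callable[[ET.Element], ET.Element]) -> None:
    """Replace every <old_tag> below root with make_new(old); root itself is never replaced."""
    # 1. Collect (parent, child) pairs before touching the tree.
    targets = [(parent, child)
               for parent in root.iter()
               for child in parent
               if child.tag == old_tag]
    # 2. Insert each replacement right after its old node, then delete the old node.
    for parent, child in targets:
        index = list(parent).index(child)
        new = make_new(child)
        new.tail = child.tail
        parent.insert(index + 1, new)
        parent.remove(child)  # remove() compares by identity, not by content

doc = ET.fromstring("<doc><replace>Foo</replace><keep/><replace>Baz</replace></doc>")

def to_replacement(old: ET.Element) -> ET.Element:
    new = ET.Element("replacement")
    new.text = "Bar"
    return new

replace_nodes(doc, "replace", to_replacement)
print(ET.tostring(doc, encoding="unicode"))
```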

xslt test with 2 parameters

<xsl:if test="count($currentPage/..//$itemType) > 0">
I'm trying to use the if statement with two parameter values and I get the error:
"unexpected token '$' in the expression..."
Is it possible to do what I'm trying to do?
In XSLT, like in most programming languages (excluding macro languages), variables represent values, not fragments of expression text. I suspect $itemType holds an element name, and you are imagining that you can use it anywhere you could use an element name. If that's what you are trying to do, use ..//*[name()=$itemType].
This is invalid (and @Michael Kay explained it well):
//$varName
If I guess correctly what you are up to, then you may try this:
//*[name() = $varName]
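The same distinction holds outside XSLT: the element name lives in a variable and is compared as a value, instead of being pasted into the path. Below is a rough Python/ElementTree analogue of count(..//*[name() = $itemType]). It assumes unnamespaced elements (ElementTree stores namespaced tags as {uri}local, which is not what name() returns), takes the element that .. would select as an argument because ElementTree has no parent axis, and uses count_named_descendants as a hypothetical name.

```python
import xml.etree.ElementTree as ET

def count_named_descendants(parent: ET.Element, item_type: str) -> int:
    """Rough analogue of count($parent//*[name() = $itemType]) for unnamespaced XML."""
    # iter() includes parent itself; //* only selects descendants, so skip it.
    return sum(1 for el in parent.iter() if el is not parent and el.tag == item_type)

site = ET.fromstring("<site><page/><news><page/></news><blog/></site>")
print(count_named_descendants(site, "page") > 0)  # True, like the xsl:if test
```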

need to display char in xslt

Hi all
I am using XSLT 1.0. I have the character code F0A7, which has to be displayed as the corresponding character. My input is
<w:sym w:font="Wingdings" w:char="F0A7"/>
my xslt template is
<xsl:template match="w:sym">
<xsl:variable name="char" select="@w:char"/>
<span font-family="{@w:fonts}">
<xsl:value-of select="concat('&#x',$char,';')"/>
</span>
</xsl:template>
It shows the error: ERROR: 'A decimal representation must immediately follow the "&#" in a character reference.'
Please help me fix this. Thanks in advance.
This isn't possible in (reasonable) XSLT. You can work around it.
Your solution with concat is invalid: XSLT is not just a fancy string concatenator; it really transforms the conceptual tree. An encoded character such as &#xF0A7; is a single character. If you were to somehow include the letters & # x f 0 a 7 ; then the XSLT processor would be required to include those letters in the XML data as text, not the character they encode, so it will escape them.
There's no feature in XSLT 1.0 that permits converting from a number to a character with that codepoint.
In XSLT 2.0, as Michael Kay points out, you can use codepoints-to-string() to achieve this.
There are two solutions. Firstly, you could use disable-output-escaping. This is rather nasty and not portable. Avoid this at all costs if you can - but it will probably work in your transformer, and it's probably the only general, simple solution, so you may not be able to avoid this.
The second solution would be to hardcode matches for each individual character. That's a mess generally, but quite possible if you're dealing with a limited set of possibilities - that depends on your specific problem.
Finally, I'd recommend not solving this problem in XSLT - this is typically something you can do more appropriately in pre- or post-processing in another programming environment. Most likely you already have an in-memory representation of the XML document in order to use XSLT in the first place, in which case this won't even take much CPU time.
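As a sketch of that pre/post-processing route, here is how Python's standard library could turn the w:char value into the actual character before or after the transformation. It assumes the w prefix is bound to the WordprocessingML namespace http://schemas.openxmlformats.org/wordprocessingml/2006/main, as in a .docx document.xml; sym_to_char is a hypothetical name. U+F0A7 is a private-use code point, so whether it renders as the intended glyph still depends on the Wingdings font being applied.

```python
import xml.etree.ElementTree as ET

# Assumption: the usual WordprocessingML namespace for the w: prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def sym_to_char(sym: ET.Element) -> str:
    """Turn a w:sym element's hex w:char attribute into the character it names."""
    code = sym.get(f"{{{W_NS}}}char")  # e.g. "F0A7"
    return chr(int(code, 16))          # hex string -> code point -> character

sym = ET.fromstring(f'<w:sym xmlns:w="{W_NS}" w:font="Wingdings" w:char="F0A7"/>')
print(hex(ord(sym_to_char(sym))))  # 0xf0a7
```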
<span font-family="{@w:font}">
<xsl:value-of select="concat('&amp;#x', @w:char, ';')"
disable-output-escaping="yes"/>
</span>
But check @Eamon Nerbonne's answer for why you shouldn't do it at all.
If you were using XSLT 2.0 (which you aren't), you could write a function to convert hex to decimal, and then use codepoints-to-string() on the result.
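The hex-to-decimal step such a function would perform is a left-to-right fold over the digits (value = value * 16 + digit). It is sketched below in Python rather than XSLT 2.0, with hex_to_int as a hypothetical name; in XSLT 2.0 the loop would typically be written as a recursive xsl:function.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_int(text: str) -> int:
    """Convert a hex string such as 'F0A7' to an integer, one digit at a time."""
    value = 0
    for ch in text.upper():
        digit = HEX_DIGITS.find(ch)
        if digit < 0:
            raise ValueError(f"not a hex digit: {ch!r}")
        value = value * 16 + digit
    return value

code_point = hex_to_int("F0A7")
print(code_point)  # 61607, the number codepoints-to-string() would need
```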
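Use &amp; instead of & inside the string literal, and output it with disable-output-escaping so the serializer doesn't escape it again:
<xsl:value-of select="concat('&amp;#x',$char,';')" disable-output-escaping="yes"/>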