XSLT RSS FEED - Combine substring-before and substring-after - xslt

My apologies in advance if this question is really simple, but I can’t seem to find a way around this issue.
I need a way to combine the substring-before and substring-after function in xsl so I have a start and end point within a description element of an RSS feed.
In each description tag I want to extract everything from ‘Primary Title’ onwards, but stop as soon as it reaches the first <b> tag.
I tried the following xsl without much success
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="channel">
<xsl:for-each select="item">
<xsl:value-of select=substring-after(description, 'Primary Title:' />
<xsl:value-of select=substring-before(description, '&ltb&gt' />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Below is the XML I am currently working with.
<rss version="2.0">
<channel>
<item>
<title>Article_110224_081057</title>
<description>
<![CDATA[<div><b>Description:</b>This is my description<b>Primary Title:</b>This is my primary title<b>Second Title:</b>This is my second title title </div>
]]>
</description>
</item>
<item>
<title>Article_110224_081057</title>
<description>
<![CDATA[<div><b>Description:</b>This is my description<b>Other Title:</b>This is my other title<b>Second Title:</b>This is my second title titleb<b>Primary Title:</b>This is my primary title<b> more text </div>
]]>
</description>
</item>
</channel>
</rss>

If the <b> is a tag, you won't be able to find it using substring matching, because tags get turned into nodes by the parser. You'll only be able to match it as a substring if it isn't a tag, for example because it was contained in a CDATA section (which appears to be the case in your example).

Maybe this can help:
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="channel">
<xsl:for-each select="item">
<xsl:value-of select="
substring-after(
substring-before(
substring-after(description, 'Primary Title:'),
'&lt;b'
),
'b>'
)
"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Result against your sample is:
This is my primary titleThis is my primary title
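The nested call can be hard to read, so here is a sketch of the same extraction split into named steps. The variable names `tail` and `head` are made up for illustration, and the sketch assumes the description always contains 'Primary Title:' followed later by another `<b` (otherwise substring-before returns an empty string).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="rss/channel/item">
      <!-- Step 1: keep everything after the label (hypothetical variable name) -->
      <xsl:variable name="tail" select="substring-after(description, 'Primary Title:')"/>
      <!-- Step 2: cut at the next opening b tag; the escaped form is required inside an attribute -->
      <xsl:variable name="head" select="substring-before($tail, '&lt;b')"/>
      <!-- Step 3: drop the closing b tag left over from the label -->
      <xsl:value-of select="substring-after($head, 'b&gt;')"/>
      <!-- One result per line -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The only behavioural difference from the one-expression version is the newline written after each item.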

Related

Filter and Group in XSLT

I am trying to filter data and then group using XSLT. Here is my XML
<?xml version="1.0" encoding="UTF-8"?>
<AccumulatedOutput>
<root>
<Header>
<Add>true</Add>
<Name>Subscriber</Name>
<Value>SAC</Value>
</Header>
</root>
<root>
<Header>
<Add>true</Add>
<Name>System</Name>
<Value>CBP</Value>
</Header>
</root>
<root>
<Header>
<Add>false</Add>
<Name>Subscriber</Name>
<Value>SAC</Value>
</Header>
</root>
</AccumulatedOutput>
What I want to do is group based on Header/Name, but remove any group in which Header/Add is false.
So in the above example two groups will be created (one for Name=Subscriber and the other for Name=System), but since the first group (with Name=Subscriber) contains Add=false, I want to ignore it, and my output should contain only one node, like below:
<?xml version = "1.0" encoding = "UTF-8"?>
<root>
<Header>
<Name>System</Name>
<Value>CBP</Value>
<Add>true</Add>
</Header>
</root>
I tried using group by method but I can't figure out a way to filter it.
It will be a great help if someone can give me some pointers
Thanks
In XSLT 2.0, you could do:
<xsl:template match="/AccumulatedOutput">
<root>
<xsl:for-each-group select="root/Header" group-by="Name">
<xsl:if test="not(current-group()/Add='false')">
<xsl:copy-of select="current-group()"/>
</xsl:if>
</xsl:for-each-group>
</root>
</xsl:template>
An alternative with no explicit XSLT conditional instructions:
<xsl:template match="/*">
<root>
<xsl:for-each-group group-by="Name" select=
"root/Header[for $n in string(Name)
return every $h in /*/root/Header[Name eq $n]
satisfies not($h/Add eq 'false')]">
<xsl:sequence select="current-group()"/>
</xsl:for-each-group>
</root>
</xsl:template>
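Both answers above need an XSLT 2.0 processor. If only XSLT 1.0 is available, a rough equivalent can be sketched with a key; the key name `header-by-name` is an assumption made for this example. Note that it copies the surviving Header elements in document order rather than group by group, which gives the same result for the sample input but could order things differently when groups interleave.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Index every Header by its Name (hypothetical key name) -->
  <xsl:key name="header-by-name" match="Header" use="Name"/>
  <xsl:template match="/AccumulatedOutput">
    <root>
      <!-- Keep a Header only if no Header with the same Name has Add = 'false' -->
      <xsl:copy-of select="root/Header[not(key('header-by-name', Name)/Add = 'false')]"/>
    </root>
  </xsl:template>
</xsl:stylesheet>
```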

XSL cross reference

I am stuck with a XSLT 1.0 problem. I tried to find info on StackOverflow but I couldn't apply the examples.
Here is the structure of my XML:
<XML>
<PR>
<AS>
<ID_AS>AS-001</ID_AS>
<FIRST>
<ID_CATALOG>Id-001</ID_CATALOG>
<STATUS>NOK</STATUS>
</FIRST>
<SECOND>
<ID_CATALOG>Id-002</ID_CATALOG>
<STATUS>OK</STATUS>
</SECOND>
</AS>
<AS>
<ID_AS>AS-002</ID_AS>
<FIRST>
<ID_CATALOG>Id-003</ID_CATALOG>
<STATUS>OK</STATUS>
</FIRST>
<SECOND>
<ID_CATALOG>Id-004</ID_CATALOG>
<STATUS>OK</STATUS>
</SECOND>
</AS>
</PR>
<METADATA>
<ID_CATALOG>Id-001</ID_CATALOG>
<ANGLES>32.25</ANGLES>
</METADATA>
<METADATA>
<ID_CATALOG>Id-002</ID_CATALOG>
<ANGLES>18.75</ANGLES>
</METADATA>
<METADATA>
<ID_CATALOG>Id-003</ID_CATALOG>
<ANGLES>5.23</ANGLES>
</METADATA>
<METADATA>
<ID_CATALOG>Id-004</ID_CATALOG>
<ANGLES>12.41</ANGLES>
</METADATA>
</XML>
I want to display for each AS, the FIRST/ID_CATALOG, FIRST/STATUS and ANGLES corresponding to the ID_CATALOG, then SECOND/etc.
The output would be similar to:
AS-001
Id-001 NOK 32.25
Id-002 OK 18.75
AS-002
Id-003 OK 5.23
Id-004 OK 12.41
I tried the following XSL but I only get the ANGLES for the first item
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns="http://earth.google.com/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:hma="http://earth.esa.int/hma" xmlns:gml="http://www.opengis.net/gml" xmlns:xlink="http://www.w3.org/1999/xlink">
<xsl:output method="xml" indent="yes" encoding="ISO-8859-1"/>
<!--==================MAIN==================-->
<xsl:template match="/">
<html>
<body>
AS List:
<br/><br/>
<xsl:call-template name="ASandCo"/>
</body>
</html>
</xsl:template>
<!--==================TEMPLATES==================-->
<xsl:template name="ASandCo">
<AS>
<xsl:for-each select="XML/PR/AS">
<xsl:value-of select="ID_AS"/>
<br/>
<xsl:value-of select="FIRST/ID_CATALOG"/> - <xsl:value-of select="FIRST/STATUS"/> -
<xsl:if test="contains(/XML/METADATA/ID_CATALOG, FIRST/ID_CATALOG)">
<xsl:value-of select="/XML/METADATA/ANGLES"/>
</xsl:if>
<br/>
<xsl:value-of select="SECOND/ID_CATALOG"/> - <xsl:value-of select="SECOND/STATUS"/> -
<xsl:if test="contains(/XML/METADATA/ID_CATALOG, SECOND/ID_CATALOG)">
<xsl:value-of select="/XML/METADATA/ANGLES"/>
</xsl:if>
<br/><br/>
</xsl:for-each>
</AS>
</xsl:template>
</xsl:stylesheet>
This XSLT will be applied to very large XML files, so I am trying to find the most efficient way.
Thank you very much in advance!
It seems like you want to look up some metadata based on the ID_CATALOG value.
An efficient way to do this is by using a key. You can define a key on the top level:
<xsl:key name="metadata-by-id_catalog" match="METADATA" use="ID_CATALOG"/>
And then you can look up the ANGLES value using the key for a given ID_CATALOG value like this:
<xsl:value-of select="key('metadata-by-id_catalog', FIRST/ID_CATALOG)/ANGLES"/>
and this:
<xsl:value-of select="key('metadata-by-id_catalog', SECOND/ID_CATALOG)/ANGLES"/>
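Put together, a complete stylesheet using that key might look like the sketch below. It assumes plain text output in the shape shown in the question, and it loops over FIRST and SECOND in one inner for-each instead of repeating the lookup; neither choice comes from the original answer.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Index METADATA elements by their ID_CATALOG child -->
  <xsl:key name="metadata-by-id_catalog" match="METADATA" use="ID_CATALOG"/>
  <xsl:template match="/">
    <xsl:for-each select="XML/PR/AS">
      <xsl:value-of select="ID_AS"/>
      <xsl:text>&#10;</xsl:text>
      <!-- FIRST and SECOND are visited in document order -->
      <xsl:for-each select="FIRST | SECOND">
        <xsl:value-of select="concat(ID_CATALOG, ' ', STATUS, ' ',
                                     key('metadata-by-id_catalog', ID_CATALOG)/ANGLES,
                                     '&#10;')"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the key is defined once and then queried for each lookup, a processor can answer each lookup from an index instead of rescanning every METADATA element, which matters for very large files.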

Find and replace a string inside an image path

I have an RSS XML file that looks like the one below, with the <item> tag repeated many times. I would like to replace the letter 's' with 'm' in the url inside the <media:thumbnail> tag, so that http://farm4.staticflickr.com/3802/9593294742_a38fca47c7_s.jpg becomes http://farm4.staticflickr.com/3802/9593294742_a38fca47c7_m.jpg
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:creativeCommons="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html" xmlns:flickr="urn:flickr:user" version="2.0">
<channel>
<title>CCGalleria Pool</title>
<link>http://www.flickr.com/groups/ccgalleria/pool/</link>
<item>
<title>Hampton Court Palace Gardens</title>
<link>http://www.flickr.com/photos/dksesh/9593294742/in/pool-1540822@N20</link>
<description>
<p>
dksesh
has added a photo to the pool:</p>
<p>
<a href="http://www.flickr.com/photos/dksesh/9593294742/" title="Hampton Court Palace Gardens">
<img src="http://farm4.staticflickr.com/3802/9593294742_a38fca47c7_m.jpg" width="240" height="107" alt="Hampton Court Palace Gardens"/>
</a>
</p>
</description>
<pubDate>Sun, 25 Aug 2013 10:33:09 -0700</pubDate>
<dc:date.Taken>2013-08-10T16:49:05-08:00</dc:date.Taken>
<author flickr:profile="http://www.flickr.com/people/dksesh/">nobody@flickr.com (dksesh)</author>
<guid isPermaLink="false">tag:flickr.com,2004:/grouppool/1540822@N20/photo/9593294742</guid>
<media:content url="http://farm4.staticflickr.com/3802/9593294742_a38fca47c7_b.jpg" type="image/jpeg" height="456" width="1024"/>
<media:title>Hampton Court Palace Gardens</media:title>
<media:thumbnail url="http://farm4.staticflickr.com/3802/9593294742_a38fca47c7_s.jpg" HEIGHT="75" WIDTH="75"/>
<media:credit ROLE="photographer">dksesh</media:credit>
<creativeCommons:license>http://creativecommons.org/licenses/by-nd/2.0/deed.en_GB</creativeCommons:license>
</item>
</channel>
</rss>
I have code something like the below, but it didn't work. Can someone help? Thanks very much. Newly changed code below.
<?xml version='1.0'?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:media="http://search.yahoo.com/mrss/">
<xsl:output method="html" />
<xsl:template match="/">
<xsl:for-each select="rss/channel/item">
<xsl:variable name="newurl" select="replace(media:thumbnail/@url,'_s.jpg','_m.jpg')"/>
<img src="{$newurl}" style="margin:5px 5px" />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Your stylesheet sample shows the version as "2.0", which is good, because XSLT 2.0 has a replace function that will probably do the trick here:
<xsl:variable name="newurl" select="replace(media:thumbnail/@url,'_s.jpg','_m.jpg')"/>
<img src="{$newurl}" style="margin:5px 5px" />
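Note that the second argument of the replace function is a regular expression, so the '.' in '_s.jpg' matches any character. If you want more control over which 's' is replaced, you can escape the dot and anchor the pattern to the end of the string, for example '_s\.jpg$'.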
As an aside, do note the correct use of Attribute Value Templates here, signified by the curly braces { }. You use them in outputting the src attribute of the img element, but not within the translate/replace function.
Here is the full XSLT in this case:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:media="http://search.yahoo.com/mrss/">
<xsl:output method="html" />
<xsl:template match="/">
<xsl:for-each select="rss/channel/item">
<xsl:variable name="newurl" select="replace(media:thumbnail/@url,'_s.jpg','_m.jpg')"/>
<img src="{$newurl}" style="margin:5px 5px" />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
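If the processor only supports XSLT 1.0, replace() is not available. A possible fallback, sketched here under the assumption that '_s.jpg' always ends the URL, is to rebuild the string with substring-before and concat. The `url` variable name and the `exclude-result-prefixes` attribute are additions for this example, not part of the answer above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:media="http://search.yahoo.com/mrss/"
    exclude-result-prefixes="media">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <xsl:for-each select="rss/channel/item">
      <xsl:variable name="url" select="media:thumbnail/@url"/>
      <xsl:choose>
        <!-- Only rewrite URLs that actually contain the small-size suffix -->
        <xsl:when test="contains($url, '_s.jpg')">
          <img src="{concat(substring-before($url, '_s.jpg'), '_m.jpg')}" style="margin:5px 5px"/>
        </xsl:when>
        <!-- Otherwise emit the URL unchanged -->
        <xsl:otherwise>
          <img src="{$url}" style="margin:5px 5px"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The contains() test guards against substring-before returning an empty string when the suffix is missing; anything after '_s.jpg' in the URL would be dropped by this approach.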

XPath relative path in expression

I am in a 'group' node. From it, I want to find the 'item' node whose 'id' attribute equals the current 'group' node's 'ref_item_id' attribute value. So in my case, being in 'group' node B, I want 'item' node A as output. This works:
<xsl:value-of select="preceding-sibling::item[@id='1']/@description"/>
But this doesn't (gives nothing):
<xsl:value-of select="preceding-sibling::item[@id=@ref_item_id]/@description"/>
When I type:
<xsl:value-of select="@ref_item_id"/>
I get '1' as the result. So this attribute is definitely accessible, but I can't find the path to it from the XPath expression above. I tried many '../' combinations, but couldn't get it to work.
Code to test: http://www.xmlplayground.com/7l42fo
Full XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<item description="A" id="1"/>
<item description="C" id="2"/>
<group description="B" ref_item_id="1"/>
</root>
Full XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no"/>
<xsl:template match="root">
<xsl:for-each select="group">
<xsl:value-of select="preceding-sibling::item[@id=@ref_item_id]/@description"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This has to do with context. As soon as you enter a predicate, the context becomes the node currently being filtered by the predicate, and no longer the node matched by the template.
You have two options - use a variable to cache the outer scope data and reference that variable in your predicate
<xsl:variable name='ref_item_id' select='@ref_item_id' />
<xsl:value-of select="preceding-sibling::item[@id=$ref_item_id]/@description"/>
or make use of the current() function
<xsl:value-of select="preceding-sibling::item[@id=current()/@ref_item_id]/@description"/>
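As a sketch of the current() approach in a complete stylesheet, the version below moves the lookup into a template that matches group, so current() is the group being processed. It uses ../item instead of preceding-sibling::item, which is a change from the code above and assumes the item may sit on either side of the group.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="root">
    <xsl:apply-templates select="group"/>
  </xsl:template>
  <xsl:template match="group">
    <!-- Inside the predicate, @id belongs to the item being tested;
         current() still refers to the group matched by this template -->
    <xsl:value-of select="../item[@id = current()/@ref_item_id]/@description"/>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 1.0, value-of outputs only the first matching item in document order, so duplicate ids would need extra handling.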
Your expression searches for an item whose id attribute matches its own ref_item_id. You need to capture the current ref_item_id in an xsl:variable and refer to that xsl:variable in the expression.
One more possible solution using xsl:key
<xsl:key name="kItemId" match="item" use="@id" />
<xsl:template match="root">
<xsl:for-each select="group">
<xsl:value-of select="key('kItemId', @ref_item_id)[1]/@description"/>
</xsl:for-each>
</xsl:template>
Looking at the XML, I assume that <item> and <group> are siblings and can appear in any order.
A sample input XML would then look like the following.
<?xml version="1.0" encoding="UTF-8"?>
<root>
<item description="A" id="1"/>
<item description="C" id="2"/>
<group description="B" ref_item_id="1"/>
<item description="D" id="1"/>
<group description="E" ref_item_id="2"/>
</root>
Now, if the goal is to extract the description of all the <item> nodes whose id matches the ref_item_id of a corresponding <group> node, we can simply loop over only those <item> nodes and get their description.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no"/>
<xsl:template match="root">
<xsl:for-each select="//item[(./@id=following-sibling::group/@ref_item_id) or (./@id=preceding-sibling::group/@ref_item_id)]">
<xsl:value-of select="./@description"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Since you say that the <item> nodes have unique ids and all <item> nodes are placed before the <group> nodes,
I would recommend using the following XSL, which loops over the specific <item> nodes instead of the <group> nodes.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no"/>
<xsl:template match="root">
<xsl:for-each select="//item[./@id=following-sibling::group/@ref_item_id]">
<xsl:value-of select="./@description"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

How to restrict which nodes produce output in stylesheet

I am working on a transform. The goal is to transform nodes into key/value pairs. Found a great stylesheet recommendation on this forum but I could use some help to tweak it a bit. For any node that has no children, the node name should become the value of <name> and the value should become the value of <value>. The source document may have some hierarchical structure to it, but I want to ignore that and only return the bottom nodes, transformed of course.
Here is my source data:
<?xml version="1.0" encoding="UTF-8"?>
<objects>
<Technical_Spec__c>
<Id>a0e30000000vFmbAAE</Id>
<F247__c>4.0</F247__c>
<F248__c xsi:nil="true"/>
<F273__c>Bronx</F273__c>
...
</Technical_Spec__c>
</objects>
Here is the stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="*[count(*) = 0]">
<item>
<name>
<xsl:value-of select="name(.)" />
</name>
<value>
<xsl:value-of select="." />
</value>
</item>
</xsl:template>
<xsl:template match="*[count(*) > 0]">
<items>
<xsl:apply-templates/>
</items>
</xsl:template>
</xsl:stylesheet>
DESIRED OUTPUT - The stylesheet should transform these nodes to key value pairs like this:
<items>
<item>
<name>F247__c</name>
<value>4.0</value>
</item>
<item>
<name>F248__c</name>
<value></value>
</item>
<item>
<name>F273__c</name>
<value>Bronx</value>
</item>
...
</items>
CURRENT OUTPUT - But it creates nested 'items' elements like this:
<items>
<items>
<item><name></name><value></value></item>
...
</items>
</items>
I understand (I think) that it is matching all the parent nodes, including the top node 'objects', and nesting the 'matches count 0' template. So I tried altering the match attribute to exclude 'objects' and start at 'Technical_Spec__c' like this (just the template lines):
<xsl:template match="objects/Technical_Spec__c/*">
<xsl:template match="*[count(*) = 0]">
<xsl:template match="objects/*[count(*) > 0]">
In my mind this says "First (master) template only matches nodes with parents 'objects/Tech_Spec'. Second (inner) template matches any node with no children. Third (outer) template matches nodes with parent 'objects' " - which should limit me to one <items> element.
OUTPUT AFTER ALTERING MATCH - Here is what I get:
<?xml version="1.0" encoding="UTF-8"?>
- <items xmlns=""><?xml version="1.0"?>
<item><name>Id</name><value>a0e30000000vFmbAAE</value></item>
<item><name>F247__c</name><value>4.0</value></item>
...
</items>
The extra <items> block is gone but there is an extra <?xml> block stuck in the middle so it's not recognized as valid xml anymore.
Any ideas? Why the extra <?xml>; How to restrict template to particular parts of the tree?
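On restricting templates to part of the tree, one possible approach is sketched below; it is not taken from the stylesheet above. It assumes a single <items> wrapper is wanted for the whole document and that every childless element at any depth should become an <item>. The `priority` attribute is added only to avoid a conflict if the document element itself has no children.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- /* matches only the document element, so exactly one wrapper is written -->
  <xsl:template match="/*" priority="1">
    <items>
      <!-- Jump straight to the leaf elements; intermediate levels produce no output -->
      <xsl:apply-templates select=".//*[not(*)]"/>
    </items>
  </xsl:template>
  <!-- Any element without child elements becomes a name/value pair -->
  <xsl:template match="*[not(*)]">
    <item>
      <name><xsl:value-of select="name()"/></name>
      <value><xsl:value-of select="."/></value>
    </item>
  </xsl:template>
</xsl:stylesheet>
```

Selecting the leaves explicitly in apply-templates means elements such as Technical_Spec__c are never processed by a template of their own, so they cannot add a second wrapper.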
Through a great deal of trial and error, I stumbled on the following solution: I added a root anchor to the third template match criteria.
Instead of match="*[count(*) > 0]", I now have /*[count(*) > 0]. This appears to eliminate the outer <items> element. If anyone can tell me why, I'd appreciate it. Why would this be different than /objects/*[count(*) > 0] ?
I do think Dimitre is right about the processor (which is IBM Cast Iron) so I did open a ticket. I tested the same stylesheet from above on an online XSLT tester and did not get the extra <?xml ?> tag.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="*[count(*) = 0]">
<item>
<name>
<xsl:value-of select="name(.)" />
</name>
<value>
<xsl:value-of select="." />
</value>
</item>
</xsl:template>
<xsl:template match="/*[count(*) > 0]">
<items>
<xsl:apply-templates/>
</items>