Given the following XML document:
<books>
<book>
<name>The problem of the ages</name>
</book>
<book>
<name>Filtering the tap</name>
</book>
<book>
<name>Legend of Atlantis</name>
</book>
</books>
I want to take at most 2 words from the name of each book. Words can be assumed to be whitespace-separated sequences of characters. Example output:
<library>
<record>The problem</record>
<record>Filtering the</record>
<record>Legend of</record>
</library>
How would I achieve this using a single XSLT?
Try this (in XSLT 3.0, with expand-text enabled):
<xsl:template match="book/name">
<record>{tokenize(.) => subsequence(1, 2)}</record>
</xsl:template>
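For context, here is a minimal sketch of a complete stylesheet around that template. It assumes an XSLT 3.0 processor that also supports the one-argument form of tokenize() from XPath 3.1 (recent Saxon releases do). The template for books, which creates the library wrapper, is an addition based on the output shown in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed wrapper: turn the books root into the library element -->
  <xsl:template match="books">
    <library>
      <xsl:apply-templates select="book/name"/>
    </library>
  </xsl:template>

  <!-- tokenize(.) splits on whitespace; subsequence(1, 2) keeps at most
       the first two words; the text value template joins them with a space -->
  <xsl:template match="book/name">
    <record>{tokenize(.) => subsequence(1, 2)}</record>
  </xsl:template>

</xsl:stylesheet>
```

A name with only one word is simply emitted as is, because subsequence() returns whatever items exist in the requested range.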
Can anyone please explain how the XSL below works, with an example?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Remove empty elements or attributes -->
<xsl:template match="@*|node()">
<xsl:if test=". != '' or ./@* != ''">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
When I use the above XSL on the XML below, which is not indented (note: it is shown indented here, but consider it not indented; the input text box did not allow me to enter non-indented XML):
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<BOS>
<LIB>
<RequestorID ID="XXX" Type="10"/>
</LIB>
</BOS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Chapters>
<Chapter>
<Paragraphs>
<Paragraph NumberOfUnits="10" Lines="100">
<Rates>
<Rate EffectiveDate="2021-12-12" ExpireDate="2021-12-13" RateTimeUnit="Day" UnitMultiplier="1">
<Base AmountBeforeTax="145.90" CurrencyCode="USD"/>
</Rate>
</Rates>
</Paragraph>
</Paragraphs>
<Readers>
<Reader Age="10" Count="1"/>
</Readers>
<TimeSpan End="2021-12-13" Start="2021-12-12"/>
<BasicInfo BookCode="1310"/>
</Chapter>
</Chapters>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
<GlobalInfo>
<ReadIds>
<ReadId ReadID_Source="ZZZ" ReadID_Type="10" ReadID_Value="1234"/>
</ReadIds>
</GlobalInfo>
</Section>
</Sections>
</Book>
Then I get the output below:
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<BOS>
<LIB>
<RequestorID ID="XXX" Type="10"/>
</LIB>
</BOS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
</Section>
</Sections>
</Book>
As seen above, it removes everything inside the Chapters and GlobalInfo elements.
But if I use the same XSL on the XML below, which is indented:
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<POS>
<Source>
<RequestorID ID="XXX" Type="10"/>
</Source>
</POS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Chapters>
<Chapter>
<Paragraphs>
<Paragraph NumberOfUnits="10" Lines="100">
<Rates>
<Rate EffectiveDate="2021-12-12" ExpireDate="2021-12-13" RateTimeUnit="Day" UnitMultiplier="1">
<Base AmountBeforeTax="145.90" CurrencyCode="USD"/>
</Rate>
</Rates>
</Paragraph>
</Paragraphs>
<Readers>
<Reader Age="10" Count="1"/>
</Readers>
<TimeSpan End="2021-12-13" Start="2021-12-12"/>
<BasicInfo BookCode="1310"/>
</Chapter>
</Chapters>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
<GlobalInfo>
<ReadIds>
<ReadId ReadID_Source="ZZZ" ReadID_Type="10" ReadID_Value="1234"/>
</ReadIds>
</GlobalInfo>
</Section>
</Sections>
</Book>
Then I get the correct output:
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<BOS>
<LIB>
<RequestorID ID="XXX" Type="10"/>
</LIB>
</BOS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Chapters>
<Chapter>
<Paragraphs>
<Paragraph NumberOfUnits="10" Lines="100">
<Rates>
<Rate EffectiveDate="2021-12-12" ExpireDate="2021-12-13" RateTimeUnit="Day" UnitMultiplier="1">
<Base AmountBeforeTax="145.90" CurrencyCode="USD"/>
</Rate>
</Rates>
</Paragraph>
</Paragraphs>
<Readers>
<Reader Age="10" Count="1"/>
</Readers>
<TimeSpan End="2021-12-13" Start="2021-12-12"/>
<BasicInfo BookCode="1310"/>
</Chapter>
</Chapters>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
<GlobalInfo>
<ReadIds>
<ReadId ReadID_Source="ZZZ" ReadID_Type="10"
ReadID_Value="1234"/>
</ReadIds>
</GlobalInfo>
</Section>
</Sections>
</Book>
Can anyone explain why indentation gives different output when the XML documents are otherwise the same? Does indentation affect the XSL transformation?
The code has a single template rule which matches all element, text, comment, processing instruction, and attribute nodes. If the node has a non-empty string value, or has an attribute with a non-empty string value, then it shallow-copies the node and processes its attributes and children recursively.
The overall effect is to copy the entire document except for elements that have no content and no non-empty attributes (such as <br/>) - plus a few other exceptions such as empty comments.
The XPath expression . != '' or ./@* != '' in the test might not behave as you expect. The XPath 1.0 specification (XPath 1.0 is the version normally used with XSLT 1.0, as indicated by version="1.0") says at https://www.w3.org/TR/1999/REC-xpath-19991116/#dt-string-value
For every type of node, there is a way of determining a string-value
for a node of that type. For some types of node, the string-value is
part of the node; for other types of node, the string-value is
computed from the string-value of descendant nodes.
and down in section "5.2 Element Nodes"
The string-value of an element node is the concatenation of the
string-values of all text node descendants of the element node
in document order.
So for an ordinary non-indented source element, string conversion means recursively descending into the child elements and concatenating their text (attributes are ignored). In your case, an element that has no attributes of its own is eliminated together with its whole subtree (i.e. not copied to the output) if no element inside it has text content like <elem>value</elem>.
With indented source, you also have text nodes representing the whitespace between the elements. (XPath does not know that whitespace is irrelevant for you, and assumes mixed content.) The string conversion of an element with (indented) sub-elements therefore contains at least that whitespace, which makes the . != '' XPath expression true, so the element is copied.
I hope this explains why the result depends on the indentation of the source.
You might want to have a look at https://www.w3.org/TR/1999/REC-xpath-19991116/#function-normalize-space for trimming the conversion result. Note that this would potentially also affect honoring of "real values" if these can be all whitespace.
Edit: Depending on what you want to achieve, you might consider having white-space eliminated by the XSLT processor by using xsl:strip-space.
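As a rough sketch of both suggestions combined (not a drop-in fix, since it depends on which elements you actually want to keep), the stylesheet from the question could be adjusted as follows. With whitespace-only text nodes stripped and the string value normalized, indented and non-indented input should be treated alike.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes from the source tree, so that
       indentation no longer contributes to any element's string value -->
  <xsl:strip-space elements="*"/>

  <!-- Copy a node only if its normalized string value is non-empty,
       or if it has at least one attribute with a non-empty value -->
  <xsl:template match="@*|node()">
    <xsl:if test="normalize-space(.) != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Note that this makes the behaviour consistent rather than more lenient: an element whose subtree contains no non-whitespace text and which carries no non-empty attribute of its own is dropped regardless of indentation, even when its descendants have attributes.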
I have a complex XML file structured by book title. Something like this, but with hundreds of books and sometimes many authors per book.
<Book>
<Title>Ken Lum</Title>
<Author>
<GivenName>Grant</GivenName>
<Surname>Arnold</Surname>
</Author>
</Book>
<Book>
<Title>Shore, Forest and Beyond</Title>
<Author>
<GivenName>Ian M.</GivenName>
<Surname>Thom</Surname>
</Author>
<Author>
<GivenName>Grant</GivenName>
<Surname>Arnold</Surname>
</Author>
</Book>
What I need to output is an alphabetized list of authors, and then a list of every book they worked on, also alphabetized, something like:
Arnold, Grant — Ken Lum; Shore, Forest and Beyond
Thom, Ian M. — Shore, Forest and Beyond
I have a version of the code working fairly well, but it is very slow, so I'm trying to optimize my approach. I recently learned of the Muenchian method of grouping from another user here and I'm trying to apply that.
The part I'm specifically stuck on right now is getting the list of titles per author. This is what I have right now:
<xsl:key name="books-by-author" match="Book"
use="concat(Author/GivenName, Author/Surname)" />
…
<xsl:template match="Author">
…
<xsl:apply-templates mode="ByAuthor" select=
"key('books-by-author',
concat(GivenName, Surname)
)">
<xsl:sort select="Title/TitleText"/>
</xsl:apply-templates>
</xsl:template>
But it seems that this is only matching Books where the Author is the first one listed, like:
Arnold, Grant — Ken Lum
Thom, Ian M. — Shore, Forest and Beyond
I figure the xsl:key is only using the first Author element, rather than checking every author. Is it possible to check every Author like that? Or is there a better approach?
I would suggest you look at it this way:
XML
<Books>
<Book>
<Title>Ken Lum</Title>
<Author>
<GivenName>Grant</GivenName>
<Surname>Arnold</Surname>
</Author>
</Book>
<Book>
<Title>Shore, Forest and Beyond</Title>
<Author>
<GivenName>Ian M.</GivenName>
<Surname>Thom</Surname>
</Author>
<Author>
<GivenName>Grant</GivenName>
<Surname>Arnold</Surname>
</Author>
</Book>
</Books>
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="author" match="Author" use="concat(Surname, ', ', GivenName)" />
<xsl:template match="/Books">
<Authors>
<!-- for each unique author -->
<xsl:for-each select="Book/Author[count(. | key('author', concat(Surname, ', ', GivenName))[1]) = 1]">
<xsl:sort select="Surname"/>
<xsl:sort select="GivenName"/>
<Author>
<!-- author's details-->
<xsl:copy-of select="Surname | GivenName"/>
<!-- list author's books -->
<Books>
<xsl:for-each select="key('author', concat(Surname, ', ', GivenName))/parent::Book">
<xsl:sort select="Title"/>
<xsl:copy-of select="Title"/>
</xsl:for-each>
</Books>
</Author>
</xsl:for-each>
</Authors>
</xsl:template>
</xsl:stylesheet>
Result
<?xml version="1.0" encoding="UTF-8"?>
<Authors>
<Author>
<GivenName>Grant</GivenName>
<Surname>Arnold</Surname>
<Books>
<Title>Ken Lum</Title>
<Title>Shore, Forest and Beyond</Title>
</Books>
</Author>
<Author>
<GivenName>Ian M.</GivenName>
<Surname>Thom</Surname>
<Books>
<Title>Shore, Forest and Beyond</Title>
</Books>
</Author>
</Authors>
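As for why the key in the question only finds the first author: in XSLT 1.0, concat() converts a node-set argument to the string value of its first node in document order, so a key that matches Book and concatenates Author children only ever sees the first Author. Matching on Author, as above, avoids that. If you need the plain-text list from the question rather than XML, the same grouping can be sketched with text output. This is a variation on the stylesheet above under the same assumed Books wrapper, and the separators are taken from the sample output in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:key name="author" match="Author" use="concat(Surname, ', ', GivenName)"/>

  <xsl:template match="/Books">
    <!-- one line per unique author (Muenchian grouping) -->
    <xsl:for-each select="Book/Author[count(. | key('author', concat(Surname, ', ', GivenName))[1]) = 1]">
      <xsl:sort select="Surname"/>
      <xsl:sort select="GivenName"/>
      <!-- &#x2014; is the dash separator used in the question -->
      <xsl:value-of select="concat(Surname, ', ', GivenName, ' &#x2014; ')"/>
      <!-- that author's titles, sorted and separated by "; " -->
      <xsl:for-each select="key('author', concat(Surname, ', ', GivenName))/parent::Book">
        <xsl:sort select="Title"/>
        <xsl:value-of select="Title"/>
        <xsl:if test="position() != last()">; </xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```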
I have an XML document something like the one below.
<root>
<Book>
<Book_title/>
<author_name/>
</Book>
<Book>
<Book_title/>
<author_name/>
</Book>
<author_details>
<author_name/>
<author_DOB/>
</author_details>
</root>
So can we compare Book/author_name with author_details/author_name dynamically with XSLT?
Define a key
<xsl:key name="author" match="author_details" use="author_name"/>
then write a template for
<xsl:template match="Book/author_name">
<xsl:copy-of select=". | key('author', .)/author_DOB"/>
</xsl:template>
handle root and Book by the identity transformation (e.g. <xsl:mode on-no-match="shallow-copy"/> in XSLT 3) and add an empty
<xsl:template match="author_details"/>
to prevent copying/outputting those elements.
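Put together, the pieces above might look like the following sketch. It assumes an XSLT 3.0 processor; with XSLT 1.0 or 2.0 you would replace the xsl:mode declaration with an explicit identity template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity transformation for everything not matched below -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- index author_details elements by their author_name -->
  <xsl:key name="author" match="author_details" use="author_name"/>

  <!-- copy the book's author_name, followed by the author_DOB of the
       author_details entry with the same name, if there is one -->
  <xsl:template match="Book/author_name">
    <xsl:copy-of select=". | key('author', .)/author_DOB"/>
  </xsl:template>

  <!-- suppress the lookup data itself -->
  <xsl:template match="author_details"/>

</xsl:stylesheet>
```

If no author_details entry matches, key() returns an empty sequence and only the author_name is copied.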
I need to concatenate an arbitrary number (N) of XML files into a single XML file.
For example, I have this XML:
<BOOKS>
<BOOK>
<TITLE>Lord of the Rings</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2015</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
</BOOKS>
and this:
<BOOKS>
<BOOK>
<TITLE>A Clash of Kings</TITLE>
<AUTHORS>George R. R. Martin</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2016</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
</BOOKS>
I need to generate a new file like this:
<BOOKS>
<BOOK>
<TITLE>Lord of the Rings</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2015</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
<BOOK>
<TITLE>A Clash of Kings</TITLE>
<AUTHORS>George R. R. Martin</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2016</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
</BOOKS>
The XML files are in my directory E:\books. I want to concatenate all of them: if I have two files, the script should concatenate those two, and if I have three or more, it should concatenate them all as well. How do I do it?
In XSLT 2.0 with Saxon it's
<xsl:template name="main">
<BOOKS>
<xsl:sequence select="collection('dir?select=*.xml')/BOOKS/BOOK"/>
</BOOKS>
</xsl:template>
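A slightly fuller sketch of the same idea follows. The dir parameter and its default value (a file URI for the E:\books directory from the question) are assumptions for illustration; the ?select=*.xml query on the collection URI is Saxon-specific.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed parameter: URI of the directory holding the input files -->
  <xsl:param name="dir" select="'file:///E:/books'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry point; there is no principal source document -->
  <xsl:template name="main">
    <BOOKS>
      <!-- Saxon reads every *.xml file in the directory as a document -->
      <xsl:sequence select="collection(concat($dir, '?select=*.xml'))/BOOKS/BOOK"/>
    </BOOKS>
  </xsl:template>

</xsl:stylesheet>
```

With the Saxon command line you would start at the named template using -it:main instead of supplying a source file. Two things to watch for: every input file has to be well-formed XML, and if you write the result into the same directory with an .xml extension, it will be picked up as input on the next run.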
Here is my input XML:
<Books>
<Book>
<BookId>1</BookId>
<Des>Dumm1</Des>
<Comments/>
<OrderDateTime>04/06/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>2</BookId>
<Des>Dummy2</Des>
<Comments/>
<OrderDateTime>04/07/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>3</BookId>
<Des>Dumm12</Des>
<Comments/>
<OrderDateTime>05/06/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>4</BookId>
<Des>Dummy2</Des>
<Comments/>
<OrderDateTime>06/07/2009 12:37</OrderDateTime>
</Book>
</Books>
I pass an XML parameter to the stylesheet, and its value is:
<BookIDs>
<BookID>2</BookID>
<BookID>3</BookID>
</BookIDs>
My output should be like
<Books>
<Book>
<BookId>2</BookId>
<Des>Dummy2</Des>
<Comments/>
<OrderDateTime>04/07/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>3</BookId>
<Des>Dumm12</Des>
<Comments/>
<OrderDateTime>05/06/2009 12:37</OrderDateTime>
</Book>
</Books>
How do I accomplish this using XSLT?
This works in Saxon 6.5.5...
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.1">
<xsl:param name="nodeset">
<BookIDs><BookID>2</BookID><BookID>3</BookID></BookIDs>
</xsl:param>
<xsl:template match="/Books">
<Books>
<xsl:variable name="Copy">
<wrap>
<xsl:copy-of select="Book"/>
</wrap>
</xsl:variable>
<xsl:for-each select="$nodeset/BookIDs/BookID">
<xsl:copy-of select="$Copy/wrap/Book[BookId=current()]"/>
</xsl:for-each>
</Books>
</xsl:template>
</xsl:stylesheet>
A pure XSLT solution will be pretty brittle though. Sub-query predicates didn't work, neither did a key. It is dependent upon the param being recognized as a node-set--which I was unable to achieve with a dynamic value (as opposed to the default in my example), even with exsl:node-set. This is also wasteful in that it copies all the Book elements from the source document.
There may be a better solution in XSLT 2.0. Alternately, if you are initiating your transform with some other language/tool, there may be better approaches available there. Another possibility could include the use of exsl:document to load your source document or params.
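For what it's worth, here is a minimal sketch of how this might look in XSLT 2.0, where a parameter with element content is a temporary tree that can be navigated directly, so neither a node-set extension nor a copy of the source is needed. The parameter name ids and its default value are assumptions for illustration; how a node-valued parameter is supplied from outside depends on the processor's API.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed default value; normally supplied by the caller -->
  <xsl:param name="ids">
    <BookIDs><BookID>2</BookID><BookID>3</BookID></BookIDs>
  </xsl:param>

  <xsl:template match="/Books">
    <Books>
      <!-- The = comparison is true if BookId equals any BookID in $ids -->
      <xsl:copy-of select="Book[BookId = $ids/BookIDs/BookID]"/>
    </Books>
  </xsl:template>

</xsl:stylesheet>
```

The selected Book elements come out in source document order, not in the order of the IDs in the parameter.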