I am working on transforming a huge set of data (~1000k records) using XSLT 3.0. I am getting Java memory heap errors in my ERP system (Workday) since the input XML message is very big. I tried streaming xslt only, but couldn't make it work. Can someone please assist me in transforming the data memory efficient way.
<?xml version="1.0" encoding="UTF-8"?>
<a:Report_Data xmlns:a="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source">
<a:Report_Entry>
<a:Source_Currency>USD</a:Source_Currency>
<a:Target_Currency>INR</a:Target_Currency>
<a:Currency_Rate>76.33</a:Currency_Rate>
</a:Report_Entry>
<a:Report_Entry>
<a:Source_Currency>USD</a:Source_Currency>
<a:Target_Currency>CHN</a:Target_Currency>
<a:Currency_Rate>16.33</a:Currency_Rate>
</a:Report_Entry>
<a:Report_Entry>
<a:Source_Currency>CHN</a:Source_Currency>
<a:Target_Currency>INR</a:Target_Currency>
<a:Currency_Rate>26.33</a:Currency_Rate>
</a:Report_Entry>
</a:Report_Data>
XSLT code that I have tried:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:a="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
version="3.0">
<xsl:output method="xml" indent="no" omit-xml-declaration="yes" />
<xsl:mode streamable="yes" on-no-match="shallow-skip" />
<xsl:template match="a:Report_Data">
<RTMap>
<xsl:fork>
<xsl:for-each-group select="a:Report_Entry/copy-of()" group-by="a:Source_Currency">
<xsl:for-each-group select="current-group()" group-by="a:Target_Currency">
<Row>
<Map_Rate><xsl:value-of select="avg(current-group()/a:Currency_Rate)"/></Map_Rate>
<Map_From_Currency><xsl:value-of select="a:Source_Currency"/></Map_From_Currency>
<Map_Target_Currency><xsl:value-of select="a:Target_Currency"/></Map_Target_Currency>
</Row>
</xsl:for-each-group>
</xsl:for-each-group>
</xsl:fork>
</RTMap>
</xsl:template>
</xsl:stylesheet>
Thank you,
Jay
In XSLT 3 you can use a composite key and if you use copy-of() you don't need the xsl:fork:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xpath-default-namespace="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:template match="Report_Data">
<RTMap>
<xsl:for-each-group select="Report_Entry!copy-of()" composite="yes" group-by="Source_Currency, Target_Currency ">
<Row>
<Map_Rate>{avg(current-group()/Currency_Rate)}</Map_Rate>
<Map_From_Currency>{current-grouping-key()[1]}</Map_From_Currency>
<Map_Target_Currency>{current-grouping-key()[2]}</Map_Target_Currency>
</Row>
</xsl:for-each-group>
</RTMap>
</xsl:template>
<xsl:output method="xml" indent="yes"/>
<xsl:mode on-no-match="shallow-skip" streamable="yes"/>
</xsl:stylesheet>
But in the end any group-by needs to buffer groups as you don't know whether the last Report_Entry might belong into the first group so any grouping of that input based on those keys will consume memory. Streamed grouping with a low memory consumption works if you use group-starting-with or group-adjacent, if the input data and the requirements allow that, but group-by is always going to buffer groups.
Related
This is a follow up question to
how to get 'excel' new lines in spreadsheetML (MSXSLT)
but asked as a new question, to separate this into different issue, as the behaviour seems to be different between engines (I'll leave the specific context in the other question, this is purely how to achieve some functional result).
This XSLT (in saxon he) will create what I want.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<root>
<bar>
<xsl:text disable-output-escaping="yes"> </xsl:text>
</bar>
</root>
</xsl:template>
</xsl:stylesheet>
and gives the output
<root>
<bar>
</bar>
</root>
this one wont:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes"> </xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:copy-of select="exsl:node-set($foo)"/>
</root>
</xsl:template>
</xsl:stylesheet>
it gives
<bar> </bar>
(the question is about XSLT 1.0 but interestingly XSLT 3.0 can be made to work like this
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes"> </xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:sequence select="$foo"/>
</root>
</xsl:template>
</xsl:stylesheet>
whilst
<xsl:copy-of select="$foo"/>
doesnt. Even following the 'sequence' pattern, I don't seem to be able to preserve non escaping in anything but a non trivial xslt - I've got a complex transformation using call-templates/apply-templates etc, and I think understanding how nodes are interpreted and serialised is not trivial)
There's actually a long history to this question, which was known in the working group as the "sticky d-o-e problem" (d-o-e being disable-output-escaping). The question is, does d-o-e have any effect when writing to a temporary tree (an xsl:variable), or is it only effective when writing to serialized output?
The XSLT 1.0 specification is pretty clear on the matter:
It is an error for output escaping to be disabled for a text node that
is used for something other than a text node in the result tree. Thus,
it is an error to disable output escaping for an xsl:value-of or
xsl:text element that is used to generate the string-value of a
comment, processing instruction or attribute node; it is also an error
to convert a result tree fragment to a number or a string if the
result tree fragment contains a text node for which escaping was
disabled. In both cases, an XSLT processor may signal the error; if it
does not signal the error, it must recover by ignoring the
disable-output-escaping attribute.
XSLT 2.0 deprecated d-o-e, but retained the rule in a slightly different form:
This [property], however, can be set only within a final result tree
that is being passed to the serializer.
But in between those two versions, the working group dithered. The XSLT 1.1 working draft (which never became a recommendation, but was popularised by the first version of my XSLT book) says:
When a root node is copied using an xsl:copy-of element ... and
escaping was disabled for a text node descendant of that root node,
then escaping should also be disabled for the resulting copy of that
text node. For example
<xsl:variable name="x">
<xsl:text disable-output-escaping="yes"><</xsl:text>
</xsl:variable>
<xsl:copy-of select="$x"/>
This is the "sticky d-o-e" - the d-o-e property is attached to the text node in the temporary tree and springs into life when the text node is eventually serialized. So this behaviour was endorsed at some stage in the life of XSLT, and you may be using a processor that implements this version of the spec.
Generally, though, try to forget that d-o-e exists. Whatever the problem, it's not the best solution. It's an incredibly messy feature because it requires a breaking of the architectural boundary between the transformation processor and the serializer, and breaking this boundary leads to close coupling of the transformation and serialization, and prevents you reusing the same code in a different pipeline configuration.
I'm afraid that researching the history of the W3C spec on this is rather easier than researching exactly what was implemented in early versions of Saxon (which are now nearly a quarter of a century old).
So to take the information from Michael Kay's answer which explains how the specification for XSLT 1.0 handles this, then we CAN implement a solution, for this.
So we take a recap of the underlying issue.
Excel spreadsheetML requires data to be formatted with the specific chars "
" to interpret a line feed in a cell (but this solution applies generally).
<Cell>Alpha
Bravo
Charlie</Cell>
If we try to write an XSLT to generate this, lets say naively:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text> </xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text> </xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
our
will get delimited and we get this
<Cell>Alpha Bravo Charlie</Cell>
this (thanks to the answer on how to get 'excel' new lines in spreadsheetML (MSXSLT)) can be fixed by using
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes"> </xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes"> </xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
which produces this:
<Cell>Alpha
Bravo
Charlie</Cell>
unfortunately this 'breaks' if you process your output document via some intermediary internal document e.g. even this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes"> </xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes"> </xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<xsl:copy-of select="msxsl:node-set($output)"/>
</xsl:template>
</xsl:stylesheet>
reverts to:
<Cell>Alpha Bravo Charlie</Cell>
because (see Michael Hay's answer) the disable-output-escaping attribute gets ignored if its passed through some internal document (i.e. the variable).
So...how can you get around this?
If you generate a token for the LF, you can then construct your psuedo excel output almost in its entirety except you use a custom element to flag the LF char, and then you can process that DIRECTLY into the result tree and interpret the custom element as an unescaped "
"
so this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:kookerella="kookerella.com"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl kookerella">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<kookerella:LF/>
<xsl:text>Bravo</xsl:text>
<kookerella:LF/>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<!-- process data directly into the result tree only -->
<xsl:apply-templates select="msxsl:node-set($output)" mode="injectLF"/>
</xsl:template>
<!-- Inject LF -->
<xsl:template match="#* | node()" mode="injectLF">
<xsl:copy>
<xsl:apply-templates select="#* | node()" mode="injectLF"/>
</xsl:copy>
</xsl:template>
<xsl:template match="kookerella:LF" mode="injectLF">
<xsl:text disable-output-escaping="yes"> </xsl:text>
<xsl:apply-templates select="#* | node()" mode="injectLF"/>
</xsl:template>
</xsl:stylesheet>
now results in:
<Cell>Alpha
Bravo
Charlie</Cell>
P.S.
as an aside, this seems to work for me in both the various MSXSLT and Saxon HE, but I have had an instance of using the MSXSLT engine where even this doesnt work, presumably due to some configuration out output serialisation issue.
I have come across related dates and am trying to find some algorithm to calculate the dependency between transfer date and exercise date. If our transfer date before exercise date it should convert to new element with value Spot. And if transfer date equal or later then exercise date , convert to new element with value Forward
<main>
<Exercise>
<transferDate>2000-08-30</transferDate>
<exerciseDate>2001-08-28</exerciseDate>
</Exercise>
<Exercise>
<transferDate>2001-08-30</transferDate>
<exerciseDate>2001-08-28</exerciseDate>
</Exercise>
<Exercise>
<transferDate>2001-08-28</transferDate>
<exerciseDate>2001-08-28</exerciseDate>
</Exercise>
</main>
I'm expecting next .xml using xslt
<?xml version="1.0" encoding="UTF-8"?>
<Type>Spot</Type>
<Type>Forward</Type>
<Type>Forward</Type>`
You could use something like this :
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="Exercise">
<xsl:copy>
<xsl:apply-templates/>
<xsl:variable name="tDate" select="xs:date(transferDate)"/>
<xsl:variable name="eDate" select="xs:date(exerciseDate)"/>
<Type>
<xsl:value-of select="if($tDate lt $eDate)
then 'Spot'
else if($tDate ge $eDate)
then 'Forward'
else 'Error'"/>
</Type>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<EMPLOYEE_ASSIGNMENT>
<REFRESH_DATE>2022-03-10 10:55:35.000</REFRESH_DATE>
<PERSON_ID>11189</PERSON_ID>
<EMPLOYEE_ID>032656300</EMPLOYEE_ID>
<EFFECTIVE_START_DATE>2020-08-19 00:00:00.000</EFFECTIVE_START_DATE>
<EFFECTIVE_END_DATE>4712-12-31 00:00:00.000</EFFECTIVE_END_DATE>
<BUSINESS_PROCESS>Absence Return for XXXXXXXXX last day of absence on 08/18/2020, first day back at work on 08/19/2020</BUSINESS_PROCESS>
<ACT_ASSIGNMENT_STATUS_TYPE_ID>1</ACT_ASSIGNMENT_STATUS_TYPE_ID>
<ACT_ORGANIZATION_ID>601</ACT_ORGANIZATION_ID>
<ACT_JOB_QUINTIQ_POSITION>Trainee</ACT_JOB_QUINTIQ_POSITION>
<ACT_HOURS_PER_WEEK>37.5</ACT_HOURS_PER_WEEK>
<ACT_HOURS_FREQUENCY>W</ACT_HOURS_FREQUENCY>
<ACT_BARGAINING_UNIT_CODE>C</ACT_BARGAINING_UNIT_CODE>
<ACT_PRIMARY_PROVINCE>BC</ACT_PRIMARY_PROVINCE>
</EMPLOYEE_ASSIGNMENT>
<EMPLOYEE_ASSIGNMENT>
<REFRESH_DATE>2022-03-10 10:55:35.000</REFRESH_DATE>
<PERSON_ID>11189</PERSON_ID>
<EMPLOYEE_ID>032656300</EMPLOYEE_ID>
<EFFECTIVE_START_DATE>2020-08-19 00:00:00.000</EFFECTIVE_START_DATE>
<EFFECTIVE_END_DATE>4712-12-31 00:00:00.000</EFFECTIVE_END_DATE>
<BUSINESS_PROCESS>Data Change: XXXXXXXXXXXX</BUSINESS_PROCESS>
<ACT_ASSIGNMENT_STATUS_TYPE_ID>1</ACT_ASSIGNMENT_STATUS_TYPE_ID>
<ACT_ORGANIZATION_ID>856</ACT_ORGANIZATION_ID>
<ACT_JOB_QUINTIQ_POSITION>Employee</ACT_JOB_QUINTIQ_POSITION>
<ACT_HOURS_PER_WEEK>37.5</ACT_HOURS_PER_WEEK>
<ACT_HOURS_FREQUENCY>W</ACT_HOURS_FREQUENCY>
<ACT_BARGAINING_UNIT_CODE>C</ACT_BARGAINING_UNIT_CODE>
<ACT_PRIMARY_PROVINCE>MB</ACT_PRIMARY_PROVINCE>
</EMPLOYEE_ASSIGNMENT>
</root>
The two nodes have the same EFFECTIVE_START_DATE but different BUSINESS_PROCESS for a single EMPLOYEE_ID. I need to transform that XML in a way that: when Two (or more) BUSINESS_PROCESS are present for and EMPLOYEE_ID on the same EFFECTIVE_START_DATE it shows only the one that is of value Data Change: XXXXXXXXX.
I need to transform it to:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<EMPLOYEE_ASSIGNMENT>
<REFRESH_DATE>2022-03-10 10:55:35.000</REFRESH_DATE>
<PERSON_ID>11189</PERSON_ID>
<EMPLOYEE_ID>032656300</EMPLOYEE_ID>
<EFFECTIVE_START_DATE>2020-08-19 00:00:00.000</EFFECTIVE_START_DATE>
<EFFECTIVE_END_DATE>4712-12-31 00:00:00.000</EFFECTIVE_END_DATE>
<BUSINESS_PROCESS>Data Change: XXXXXXXXXXXX</BUSINESS_PROCESS>
<ACT_ASSIGNMENT_STATUS_TYPE_ID>1</ACT_ASSIGNMENT_STATUS_TYPE_ID>
<ACT_ORGANIZATION_ID>856</ACT_ORGANIZATION_ID>
<ACT_JOB_QUINTIQ_POSITION>Employee</ACT_JOB_QUINTIQ_POSITION>
<ACT_HOURS_PER_WEEK>37.5</ACT_HOURS_PER_WEEK>
<ACT_HOURS_FREQUENCY>W</ACT_HOURS_FREQUENCY>
<ACT_BARGAINING_UNIT_CODE>C</ACT_BARGAINING_UNIT_CODE>
<ACT_PRIMARY_PROVINCE>MB</ACT_PRIMARY_PROVINCE>
</EMPLOYEE_ASSIGNMENT>
</root>
Thanks a lot
This is difficult to follow. If (!) I understand your description correctly, you want to do:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/root">
<xsl:copy>
<xsl:for-each-group select="EMPLOYEE_ASSIGNMENT" group-by="concat(EMPLOYEE_ID,'|', EFFECTIVE_START_DATE)">
<xsl:copy-of select="current-group()[starts-with(BUSINESS_PROCESS, 'Data Change:')]"/>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I'm trying to figure out how to use XSLT Streaming (to reduce memory usage) in a scenario that requires grouping (with an arbitrary number of groups) and summing the group. So far I haven't been able to find any examples. Here's an example XML
<?xml version='1.0' encoding='UTF-8'?>
<Data>
<Entry>
<Genre>Fantasy</Genre>
<Condition>New</Condition>
<Format>Hardback</Format>
<Title>Birds</Title>
<Count>3</Count>
</Entry>
<Entry>
<Genre>Fantasy</Genre>
<Condition>New</Condition>
<Format>Hardback</Format>
<Title>Cats</Title>
<Count>2</Count>
</Entry>
<Entry>
<Genre>Non-Fiction</Genre>
<Condition>New</Condition>
<Format>Paperback</Format>
<Title>Dogs</Title>
<Count>4</Count>
</Entry>
</Data>
In XSLT 2.0 I would use this to group by Genre, Condition and Format and Sum the counts.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="yes" />
<xsl:template match="/">
<xsl:call-template name="body"/>
</xsl:template>
<xsl:template name="body">
<xsl:for-each-group select="Data/Entry" group-by="concat(Genre,Condition,Format)">
<xsl:value-of select="Genre"/>
<xsl:value-of select="Condition"/>
<xsl:value-of select="Format"/>
<xsl:value-of select="sum(current-group()/Count)"/>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
For output I would get two lines, a sum of 5 for Fantasy, New, Hardback and a sum of 4 for Non-Fiction, New, Paperback.
Obviously this won't work with Streaming because the sum accesses the whole group. I think I need to iterate through the document twice. The first time I could build a map of the groups (creating a new group if one doesn't exist yet). The second time The problem is I also need an accumulator for each group with a rule that matches the group, and it doesn't seem you can create dynamic accumulators.
Is there a way to create accumulators on the fly? Is there another/easier way to do this with Streaming?
To be able to use streamed grouping with XSLT 3.0 one option that I see is to first transform the element based data you have into attribute based data using a stylesheet like
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
exclude-result-prefixes="xs math"
version="3.0">
<xsl:mode streamable="yes" on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Entry/*">
<xsl:attribute name="{name()}" namespace="{namespace-uri()}" select="."/>
</xsl:template>
</xsl:stylesheet>
then you can perfectly used streamed grouping (as far as a streamed group-by is possible at all, as far as I understand there will be some buffering necessary) as follows:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
exclude-result-prefixes="xs math"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:fork>
<xsl:for-each-group select="Data/Entry" composite="yes" group-by="#Genre, #Condition, #Format">
<xsl:value-of select="current-grouping-key(), sum(current-group()/#Count)"/>
<xsl:text>
</xsl:text>
</xsl:for-each-group>
</xsl:fork>
</xsl:template>
</xsl:stylesheet>
I don't know whether first creating an attribute centric document is an option but I think it is better to share suggestions with code in an answer instead of trying to put them into a comment. And the answer in XSLT Streaming Chained Transform shows how to use Saxon 9 with Java or Scala to chain two streaming transformations without the need to write a temporary output file for the first transformation step.
As for doing it with copy-of on the original input format, Saxon 9.7 EE assesses the following as streamable and executes it with the right result:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs math"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each-group select="copy-of(Data/Entry)" composite="yes"
group-by="Genre, Condition, Format">
<xsl:value-of select="current-grouping-key(), sum(current-group()/Count)"/>
<xsl:text>
</xsl:text>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
I am not sure it consumes less memory however than normal, tree based grouping. Perhaps you can measure with your real input data.
As a third alternative, to use a map as you seemed to want to do, here is an xsl:iterate example that iterates through the Entry elements, collecting the accumulated Count value in a map:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map" exclude-result-prefixes="xs math map"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:iterate select="Data/Entry">
<xsl:param name="groups" as="map(xs:string, xs:integer)" select="map{}"/>
<xsl:on-completion>
<xsl:value-of select="map:keys($groups)!(. || ' ' || $groups(.))" separator="
"/>
</xsl:on-completion>
<xsl:variable name="current-entry" select="copy-of()"/>
<xsl:variable name="key"
select="string-join($current-entry/(Genre, Condition, Format), '|')"/>
<xsl:next-iteration>
<xsl:with-param name="groups"
select="
if (map:contains($groups, $key)) then
map:put($groups, $key, map:get($groups, $key) + xs:integer($current-entry/Count))
else
map:put($groups, $key, xs:integer($current-entry/Count))"
/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
</xsl:stylesheet>
While answering this question, it occurred to me that I know how to use the XSLT 3.0 (XPath 3.0) serialize() function, but that I do not know how to avoid serialization of namespaces that are in scope. Here is a minimal example:
XML Input
<?xml version="1.0" encoding="UTF-8" ?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
XSLT 3.0 Stylesheet
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
xmlns:ci="http://www.cichlids.com">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/ci:cichlids/cichlid">
<xsl:variable name="serial-params">
<output:serialization-parameters>
<output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>
</xsl:variable>
<xsl:value-of select="serialize(., $serial-params/*)"/>
</xsl:template>
</xsl:stylesheet>
Actual Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid xmlns:ci="http://www.cichlids.com" id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
The serialization process included the namespace declaration that is in scope for the cichlid element, although it is not used on this element. I would like to remove this declaration and make the output look like
Expected Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
I know how to modify the cichlid element, removing the namespaces in scope, and serialize this modified element instead. But this seems a rather cumbersome solution. My question is:
What is a canonical way to serialize an XML element using the serialize() function without also serializing unused namespace declarations that are in scope?
Testing with Saxon-EE 9.6.0.7 from within Oxygen.
Serialization will always give you a faithful representation of the data model that you are serializing. If you want to modify the data model, that's called transformation. Run a transformation to remove the unwanted namespaces, then serialize the result.
Michael Kay already gave the correct answer and I have accepted it. This is just to flesh out his comments. By
Run a transformation to remove the unwanted namespaces, then serialize the result.
he means applying a transformation like the following before calling serialize():
XSLT Stylesheet
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
xmlns:ci="http://www.cichlids.com">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:variable name="cichlid-without-namespace">
<xsl:copy-of copy-namespaces="no" select="/ci:cichlids/cichlid"/>
</xsl:variable>
<xsl:template match="/ci:cichlids/cichlid">
<xsl:variable name="serial-params">
<output:serialization-parameters>
<output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>
</xsl:variable>
<xsl:value-of select="serialize($cichlid-without-namespace, $serial-params/*)"/>
</xsl:template>
</xsl:stylesheet>
XML Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>