XSLT Navigation of Base64-Encoded XML

I am using XSLT to transform a SOAP response. The XML response has one node encoded in Base64. If we decode this node, it becomes XML text, and I need to perform additional operations on the XML decoded from the Base64.
I would like to do the entire transformation and decoding within the XSLT transform.
The XML response looks something like this:
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header/>
<env:Body>
<ns2:runReportResponse
xmlns:ns2="http://xmlns.oracle.com/oxp/service/PublicReportService">
<ns2:runReportReturn>
<ns2:reportBytes>PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCEtLUdl
bmVyYXRlZCBieSBP
cmFjbGUgQkkgUHVibGlzaGVyIC1EYXRhZW5naW5lLCBkYXRhbW9kZWw6X0N1c3RvbV9TdXBwbHlf
Q2hhaW5fTWFuYWdlbWVudF9JbnZlbnRvcnlfTWFuYWdlbWVudF9JdGVtX1ZlbnR1cmVmb3J0aF94
I can successfully call
select="sunBase64:base64Decode($base64Value)"
to decode the Base64 node in a
<xsl:variable or <xsl:value-of
tag, but I ultimately want to perform additional operations on the XML decoded from Base64, such as the following:
<xsl:for-each select
on the variable that was defined by calling base64Decode of com.sun.jersey.core.util.Base64.
The Base64 node can be decoded successfully, and the output looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Oracle BI Publisher -Dataengine,
datamodel:_Custom_Supply_Chain_Management_Inventory_Management_Item_xdm -->
<DATA_DS><P_ORGANIZATION_ID>300000002559348</P_ORGANIZATION_ID>
<G_1>
<ORGANIZATION_NAME>Inventory Organization</ORGANIZATION_NAME>
<ORGANIZATIONID>300000002559348</ORGANIZATIONID>
<ITEMDESCRIPTION>LUBRICANT|MACHINE LUBE|ML-514|AUTOMOTIVE GEAR OIL||EXTREME
PRESSURE|55 GAL/400 LB DRUM|85W-140 VISC|||||||||</ITEMDESCRIPTION>
<ITEMNUMBER>527293318</ITEMNUMBER><PRIMARYUOMVALUE>DR</PRIMARYUOMVALUE>
<ITEM_ID>300000009810631</ITEM_ID>
</G_1>
As I mentioned earlier, I want to use
<xsl:for-each select="$xmlReportBytes/DATA_DS" >
But when I try to run this operation, I get an error in the Java program that is running the XSLT transformation. Without the /DATA_DS, this operation runs fine, but I need to navigate into the XML that was originally encoded in Base64.
How can I resolve this error, or is there an alternative solution? I would like to do all of the operations within XSLT.

If you use an XSLT 3.0 processor with support for the EXPath Binary module then you can decode the base64 text using the bin:decode-string() function (see http://expath.org/spec/binary#decode-string) and then you can parse the resulting XML using the fn:parse-xml() function.
Without these functions, you'll need to call vendor-specific extension functions, and it then depends entirely on what XSLT processor you are using.
You can get these functions if you use Saxon-PE or Saxon-EE.
Note that decoding base64 produces a stream of octets, and to decode these octets as text you need to know the encoding of the text, e.g. iso-8859-1 or utf-8.
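Outside XSLT, the same steps (Base64 to octets, octets to text, text to a parsed tree) can be sketched with Python's standard library. This is only an illustration under assumptions: the sample payload is a hypothetical, shortened stand-in for the decoded report, the encoding is taken to be UTF-8 because that is what the decoded XML declaration states, and item_numbers is a made-up helper name.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the decoded report.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<DATA_DS><G_1><ITEMNUMBER>527293318</ITEMNUMBER></G_1></DATA_DS>"
)

# Stand-in for the text content of ns2:reportBytes. encodebytes() wraps the
# output in lines, like the line-wrapped value in the SOAP response.
REPORT_BYTES_B64 = base64.encodebytes(SAMPLE_XML.encode("utf-8")).decode("ascii")


def item_numbers(report_bytes_b64):
    """Decode a Base64 payload, parse it as XML, return every ITEMNUMBER."""
    # Step 1: Base64 -> octets. By default b64decode() discards characters
    # outside the Base64 alphabet, so the embedded line breaks are harmless.
    octets = base64.b64decode(report_bytes_b64)
    # Step 2: octets -> text. The encoding has to be known; UTF-8 is assumed.
    text = octets.decode("utf-8")
    # Step 3: text -> tree. Only now is there anything to navigate into.
    root = ET.fromstring(text)
    return [g.findtext("ITEMNUMBER") for g in root.iter("G_1")]


print(item_numbers(REPORT_BYTES_B64))
```

The point of the sketch is step 3: until the decoded string has been parsed, it is plain text, so there is no DATA_DS element to select.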

Related

Eliminate javascript from HTML with XSLT

I am trying to transform an HTML report into XML, but some javascript in the file is throwing errors, due to statements with a less-than character (e.g., for(var i=0; i<els.length;i++) ). I thought I could eliminate the javascript with the following template, which should remove entire script nodes:
<xsl:template match="script"/>
I assumed the XSLT processor would simply skip over the entire script nodes, but it's still throwing the same errors. I also tried adding this one:
<xsl:template match="script/text()"/>
No luck. If I manually remove all the javascript from the file, my transform works, but that's not practical as I need to create and run a daily automated process on these HTML files to extract some data in the HTML tables.
As a general rule, XSLT will only process well-formed XML input: it's not designed to process other formats like HTML.
However, XSLT will generally accept input from a parser that delivers a stream of events that looks sufficiently like an XML stream. This allows parsers like TagSoup and validator.nu to be used as a front-end to your XSLT processor.
Saxon packages this up with a parse-html() function that invokes TagSoup to parse HTML input and turn it into a DOM-like tree (actually an XDM tree) that it can process as if it came from XML.
validator.nu is a more up-to-date HTML parser than TagSoup, but you would have to do a little more work to integrate that.
Question was answered by Martin Honnen in the comments:
oxygenxml.com/doc/versions/18.1/ug-editor/tasks/… suggests there is an HTML import feature, so try whether that helps. Of course, there are standalone applications like HTML Tidy that I think you can use outside of the XSLT processing to first convert your HTML to XHTML.
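To illustrate why a tolerant HTML parser gets past the script problem where an XML parser fails, here is a minimal sketch using Python's standard html.parser as a stand-in for TagSoup or validator.nu (it is not either of those tools). The class name TableCellExtractor and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collect the text of <td>/<th> cells row by row, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the current row, or None outside <tr>
        self._cell = None  # text pieces of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row

    def handle_data(self, data):
        # Script bodies arrive here as plain data and are ignored,
        # because no table cell is open at that point.
        if self._cell is not None:
            self._cell.append(data)


SAMPLE_HTML = """
<html><head><script>
for (var i = 0; i < els.length; i++) { els[i].hidden = true; }
</script></head>
<body><table>
<tr><th>Item</th><th>Qty</th></tr>
<tr><td>Gear oil</td><td>3</td></tr>
</table></body></html>
"""

parser = TableCellExtractor()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.rows)
```

The bare < inside the script never reaches an XML parser here, which is the same reason an HTML front end in front of the XSLT processor avoids the error.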

how to parse ENCODED html within an XML document using XSLT

I'm trying to parse Feedburner's full text RSS feed (for example http://feeds.feedburner.com/IeeeSpectrumFullText) and the HTML content is in an element called "content:encoded", but it is encoded (the < symbol becomes &lt; etc.). I'm trying to figure out if it's possible to decode that content via an XSLT transformation. I know that within PHP I can decode and parse it, but I'm hoping there's a way to do this purely in XSLT so that I can only have one PHP process (not conditionally decoding the HTML as necessary).
Please let me know if you have any suggestions.
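For what it's worth, the decoding itself is already done by any XML parser: the text value of content:encoded is the HTML markup as a plain string, and what remains is a second parse of that string. A minimal sketch of the two passes in Python's standard library follows (not XSLT; the feed snippet, LinkCollector, and links_in_items are hypothetical names made up for illustration).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of the RSS "content" module that defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical feed snippet with escaped HTML inside content:encoded.
FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel><item>
<title>Example</title>
<content:encoded>&lt;p&gt;Hello &lt;a href="http://example.com/"&gt;world&lt;/a&gt;&lt;/p&gt;</content:encoded>
</item></channel></rss>"""


class LinkCollector(HTMLParser):
    """Second pass: collect href values from the embedded HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


def links_in_items(feed_xml):
    # First pass: the XML parser turns &lt; back into <, so .text is raw HTML.
    root = ET.fromstring(feed_xml)
    collector = LinkCollector()
    for encoded in root.iter("{%s}encoded" % CONTENT_NS):
        collector.feed(encoded.text or "")
    collector.close()
    return collector.links


print(links_in_items(FEED))
```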

XSLT 1.0: XML to XML conversion - how to transform Unicode to HTML entity?

I have an input XML that comes from a Unicode system. I have names for products
that look like this:
<MAKTX>Bear &#38; Friends</MAKTX>
My XSLT transforms XML to XML. In my output the above line looks like this:
<MAKTX>Bear<(>&<)>Friends</MAKTX>
But I expect
<MAKTX>Bear&amp;Friends</MAKTX>
I can't change the input XML from my source system.
How do I transform the Unicode &#38; to the HTML entity &amp;?
Firstly, there is no need to change anything: &#38; and &amp; are equivalent markup for a & in both XML and HTML.
If you use
<xsl:output method="html"/>
then some processors are more likely to use the named form, but this is not under your direct control from XSLT, just as you cannot directly control whether " or ' is used around attribute values. It is just a syntactic variation that no HTML system should care about.
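The equivalence is easy to check with any XML parser. Here is a minimal sketch in Python's standard library, assuming the input uses the numeric character reference &#38; (that exact spelling is an assumption about the source system's output):

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character in the markup.
numeric = ET.fromstring("<MAKTX>Bear &#38; Friends</MAKTX>")
named = ET.fromstring("<MAKTX>Bear &amp; Friends</MAKTX>")

# After parsing, both elements hold the identical text "Bear & Friends".
print(numeric.text == named.text)

# On output the serializer chooses the spelling, not the input document.
print(ET.tostring(numeric, encoding="unicode"))
```

Once parsed, the tree no longer records which spelling the input used, which is why the choice on output belongs to the serializer.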

use javascript (or JQuery) in a standalone HTML file to select an XML and transform

I need a way to transform XML to HTML (using XSL) but without a server. So, I want to create a standalone HTML file (with a hardcoded XSL path and name).
Allow the user to select an XML
Transform it with the XSL and display results in browser
Original XML cannot be changed (so cannot just embed XSL in XML)
Is this possible? Everything I found requires post, but I'm not using a server
Regards
Mark
Yes, it's possible. And you don't need javascript to do it, but you can use javascript if you want.
Just look at this previous XSLT question: https://stackoverflow.com/questions/12964917
Use a processing-instruction like...
<?xml-stylesheet type="text/xsl" href="soccer.xslt"?>
Refer:
Direct linkage through pi: http://www.w3.org/TR/xml-stylesheet/
Transform through javascript:
http://dev.ektron.com/kb_article.aspx?id=482
Calling XSLT from javascript

Transforming one XML document into another with C++

What would be a straightforward way to transform a source XML document into a destination XML document? There are only small differences between source and destination: specifically, I want to delete the first UnitIDRecord node within each UnitIDGroup node.
What would be the appropriate model for this task: DOM or SAX?
Which XML library would best fit this problem (one that guarantees that the source and destination differ only in the deleted nodes, with no missing namespaces, attributes, encoding, ...)?
I read about XSLT; could this be an option?
The XML document looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<ExPostInformationRealGeneration xmlns="http://schemas.seven2one.de/EEX/TransparencyPlatform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.seven2one.de/EEX/TransparencyPlatform EEXTransparencyPlatform.xsd">
<DispatcherID>XYZ</DispatcherID>
<CreationDateTime>2012-05-22T13:57:00Z</CreationDateTime>
<MessageText>1 - Positiv - Meldung mit Quality-Tag - L000</MessageText>
<UnitIDGroup>
<UnitID>E110200-001</UnitID>
<UnitIDRecord><Quantity>16.9</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.6</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.4</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<UnitIDGroup>
<UnitID>E110200-002</UnitID>
<UnitIDRecord><Quantity>16.9</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.6</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.4</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<UnitIDGroup>
<UnitID>E110201-001</UnitID>
<UnitIDRecord><Quantity>7.0</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>7.1</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>7.1</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<!-- other UnitIDGroup elements -->
</ExPostInformationRealGeneration>
I would consider reading the file in as strings and writing each string out to another file if it matches your criteria. That's a 5-line program and avoids any parsing. It will run quickly and is simple, but it is specific to this problem and not reusable. I therefore offer this as a suggestion, not as the correct solution!
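A rough sketch of that string-based idea, written in Python rather than C++ for brevity. It assumes, as in the sample above, that every UnitIDGroup start tag and every UnitIDRecord sits on its own line; if the layout differs, it silently does the wrong thing. The function name and the shortened sample are hypothetical.

```python
def drop_first_record_per_group(lines):
    """Yield every line except the first UnitIDRecord line of each UnitIDGroup."""
    dropped_in_group = True  # outside any group, nothing is dropped
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("<UnitIDGroup>"):
            dropped_in_group = False  # a new group: its first record is still pending
        elif stripped.startswith("<UnitIDRecord>") and not dropped_in_group:
            dropped_in_group = True
            continue  # skip this one line, keep all others byte for byte
        yield line


# Shortened, hypothetical sample in the same one-record-per-line layout.
SAMPLE = """<UnitIDGroup>
<UnitID>E110200-001</UnitID>
<UnitIDRecord><Quantity>16.9</Quantity></UnitIDRecord>
<UnitIDRecord><Quantity>16.6</Quantity></UnitIDRecord>
</UnitIDGroup>
<UnitIDGroup>
<UnitID>E110201-001</UnitID>
<UnitIDRecord><Quantity>7.0</Quantity></UnitIDRecord>
<UnitIDRecord><Quantity>7.1</Quantity></UnitIDRecord>
</UnitIDGroup>
"""

result = "".join(drop_first_record_per_group(SAMPLE.splitlines(keepends=True)))
print(result)
```

Because untouched lines are copied verbatim, namespaces, attributes and encoding of the rest of the document stay exactly as they were, which is the one real advantage of this approach over a parse-and-reserialize round trip.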