how to parse ENCODED html within an XML document using XSLT - xslt

I'm trying to parse Feedburner's full-text RSS feed (for example http://feeds.feedburner.com/IeeeSpectrumFullText). The HTML content is in an element called "content:encoded", but it is entity-encoded (the < symbol becomes &lt;, etc.). I'm trying to figure out whether it's possible to decode that content via an XSLT transformation. I know that I can decode and parse it in PHP, but I'm hoping there's a way to do this purely in XSLT so that I only need one PHP process (rather than conditionally decoding the HTML as necessary).
Please let me know if you have any suggestions.

Related

Eliminate javascript from HTML with XSLT

I am trying to transform an HTML report into XML, but some javascript in the file is throwing errors, due to statements with a less-than character (e.g., for(var i=0; i<els.length;i++) ). I thought I could eliminate the javascript with the following template, which should remove entire script nodes:
<xsl:template match="script"/>
I assumed the XSLT processor would simply skip over the entire script nodes, but it's still throwing the same errors. I also tried adding this one:
<xsl:template match="script/text()"/>
No luck. If I manually remove all the javascript from the file, my transform works, but that's not practical as I need to create and run a daily automated process on these HTML files to extract some data in the HTML tables.
As a general rule, XSLT will only process well-formed XML input: it's not designed to process other formats like HTML.
However, XSLT will generally accept input from a parser that delivers a stream of events that looks sufficiently like an XML stream. This allows parsers like TagSoup and validator.nu to be used as a front-end to your XSLT processor.
Saxon packages this up with a parse-html() function that invokes TagSoup to parse HTML input and turn it into a DOM-like tree (actually an XDM tree) that it can process as if it came from XML.
validator.nu is a more up-to-date HTML parser than TagSoup, but you would have to do a little more work to integrate that.
The question was answered by Martin Honnen in the comments:
oxygenxml.com/doc/versions/18.1/ug-editor/tasks/… suggests there is an HTML import feature, so try whether that helps. Of course, there are standalone applications like HTML Tidy that I think you can use outside of the XSLT processing to first convert your HTML to XHTML.
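If a real HTML parser (TagSoup, validator.nu, HTML Tidy) is not an option, a cruder pre-processing step is to cut the script blocks out of the raw text before the XML parser ever sees them. The following is only a minimal sketch of that idea: strip_script_blocks is a hypothetical helper name, it does plain text surgery rather than HTML parsing, it only recognizes a closing tag written exactly as </script> (in any letter case), and the rest of the file must still be well-formed XML for the transform to load it.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: remove every <script ...>...</script> block from raw
// HTML text, so that '<' characters inside JavaScript never reach the XML
// parser. This is text surgery, not HTML parsing.
std::string strip_script_blocks(const std::string& html) {
    // Lower-cased copy so that <SCRIPT> and <Script> are matched as well.
    std::string lower(html);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string open_tag = "<script";
    const std::string close_tag = "</script>";
    std::string out;
    std::size_t pos = 0;
    while (true) {
        const std::size_t start = lower.find(open_tag, pos);
        if (start == std::string::npos) {
            out.append(html, pos, std::string::npos);  // copy the remaining tail
            break;
        }
        out.append(html, pos, start - pos);  // copy the text before the block
        const std::size_t end = lower.find(close_tag, start);
        if (end == std::string::npos) {
            break;  // unterminated script block: drop the rest of the input
        }
        pos = end + close_tag.size();  // continue after the closing tag
    }
    return out;
}
```

The cleaned string would then be written to a temporary file (or passed to the processor directly) as the transform input. Inline event handlers and other non-XML constructs in the HTML are untouched by this sketch.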

Can I extract the columns/fields and logic used in an XSLT?

I am specifically looking to parse the XSLT to retrieve the fields it uses from the input XML, and also to extract the logic by which the output data is generated from the input.
I am not sure, but have I been given the task of creating an XSLT parser, something like a submodule in a browser?
It is more like reverse-engineering some code to get the source and map it to the destination data.

tibco xslt not accepting html script

I have to convert an XML document to an HTML page. I read the XML and mapped it to Transform XML, and I added HTML formatting tags in the XSLT, but they are not reflected in the page: I get the data from the XML on one line, side by side. The HTML code I provided is not working. Can anyone let me know how to transform XML to HTML, or whether there is another solution?
Yes, I have added "tibco xslt" to the title of this question.
When working in TIBCO BW, I have to convert XML into an HTML web page, so I used HTML code along with the XSLT transformation in the XSLT activity and also referenced it in the transform activity. But the resulting HTML is not as required: all the elements of the XML appear side by side in the HTML.
When I use the same HTML code outside of TIBCO, it works fine and shows a well-formatted table.
So my question is: will TIBCO XSLT execute HTML code or not?
I am not too sure about using the XSLT transform activity to turn XML into an HTML file.
Try parsing the XML and then using a Write File activity. You can alter the content of the file as required, with tags and data from the parsed XML file.

Ignore initial data when parsing XML with Xerces

I hope someone on here has some knowledge of using xerces-c. I have a string that contains a valid XML packet. It has, however, some leading data that has nothing to do with the XML. Is it possible to have the xerces-c SAXParser ignore any leading data and simply parse the first valid XML it finds? I am using an extremely simple setup without even the use of a DTD, as below:
SAXParser* lp_parser = new SAXParser();  // must point at a real parser before use
MySaxHandler l_handler;
lp_parser->setDocumentHandler((DocumentHandler *)&l_handler);
lp_parser->setDoNamespaces(false);
lp_parser->setDoSchema(false);
lp_parser->setValidationSchemaFullChecking(false);
// Wrap the in-memory string; "false" means the source does not take ownership of the buffer.
MemBufInputSource lp_membuf((const XMLByte*)l_data.c_str(),
                            l_data.size(),
                            "My XML request",
                            false);
lp_parser->parse(lp_membuf);
The l_data is a std::string containing my XML packet including the initial data and MySaxHandler is where I save the few tags I am interested in. I can of course skip until I find the start of the XML myself but that is not the answer I was hoping for.
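In case the parser cannot be made to skip the leading data, the manual fallback mentioned above can be sketched as follows. This is only an illustration under stated assumptions: find_xml_start is a hypothetical helper, it prefers an XML declaration when one is present and otherwise takes the first '<', so it will pick the wrong spot if the leading data itself contains a '<' and the packet has no declaration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: offset of the first plausible XML start in `data`,
// or std::string::npos if no candidate is found.
std::size_t find_xml_start(const std::string& data) {
    // Prefer an XML declaration when the packet carries one.
    const std::size_t decl = data.find("<?xml");
    if (decl != std::string::npos) {
        return decl;
    }
    // Otherwise fall back to the first '<' (assumes the junk contains none).
    return data.find('<');
}
```

With an offset from this helper, the buffer handed to MemBufInputSource would start at (const XMLByte*)l_data.c_str() + offset with a length of l_data.size() - offset, after checking that the offset is not std::string::npos.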

read file from url path and display in chart format

I am working on a project. I want to read a file whose path comes from a URL. The file contains XML data, and I have to show this data in chart format.
Basically, your steps may be these:
Validate the URL data (StructKeyExists + FileExists + isFile).
Read and parse the XML file; you can do this with XmlParse.
Convert the XML object into a query (see query functions).
Render the data using great charting tags.
If you want more detailed help -- please expand your question, to make it more specific.