sgml to xml conversion - xslt

I have a following sample sgml data from my .sgm file and I want convert this in to xml
<?dtd name="viewed">
<?XMLDOC>
<viewed >xyz
<cite>
<yr>2010
<pno cite="2010 abc 1188">10
<?/XMLDOC>
<?XMLDOC>
<viewed>abc.
<cite>
<yr>2010
<pno cite="2010 xyz 5133">9
<?/XMLDOC>
Output should be like this:
<index1>
<num viewed="xyz"/>
<heading>xyz</heading>
<index-refs>
<link caseno="2010 abc 1188</link>
</index-refs>
</index-1>
<index1>
<num viewed="abc"/>
<heading>abc</heading>
<index-refs>
<link caseno="2010 xyz 5133</link>
</index-refs>
</index-1>
Can this be done in c# or can we use xslt 2.0 to do this kind of conversion?

Others have already given some good advice. Here's one way of putting it all together by first converting the input SGML to well-formed XML and then using XSLT to transform that to the exact format you need.
Converting your SGML to well-formed XML
The osx tool from the OpenSP package suggested by mzjn is a good tool for this. Since your SGML markup omits end tags, you need to have a DTD from which the correct nesting of elements can be determined. If you don't have a DTD, you need to create one. For your example input, it could be as simple as this:
<!ELEMENT toplevel o o (viewed)+>
<!ELEMENT viewed - o (#PCDATA,cite)>
<!ELEMENT cite - o (yr,pno)>
<!ELEMENT yr - o (#PCDATA)>
<!ELEMENT pno - o (#PCDATA)>
<!ATTLIST pno cite CDATA #REQUIRED>
You also need to add a proper doctype declaration to the beginning of your SGML file. Assuming you have your DTD in file viewed.dtd.
<!DOCTYPE toplevel SYSTEM "viewed.dtd" >
With this addition, you should now be able use osx to convert the SGML to XML. (It won't be able to convert the processing instructions which start with a / as those are not allowed in XML, and will emit a warning about them.)
osx input.sgm > input.xml
Transforming the resulting XML to your desired format
For the above case, you could use something like the following XSLT stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="VIEWED">
<index1>
<num viewed="{normalize-space(text())}"/>
<heading>
<xsl:value-of select="normalize-space(text())"/>
</heading>
<index-refs>
<xsl:apply-templates select="CITE"/>
</index-refs>
</index1>
</xsl:template>
<xsl:template match="CITE">
<link caseno="{PNO/#CITE}"/>
</xsl:template>
</xsl:stylesheet>

Maybe you can use the osx SGML to XML converter. It is part of the OpenSP package (based on SP, originally written by James Clark).
http://openjade.sourceforge.net/doc/index.htm
http://www.jclark.com/sp/index.htm

Can the SGML-Reader, originally developed by Chris Lovett help in solving this problem?

Why XSLT? I doubt you can map SGML to XML Infoset or XDM...
I think that you should better use the language made for this task: DSSSL (Document Style Semantics and Specification Language)
This is the predecessor of XSLT. The author is James Clark. And this is the his site.

Related

merge multiple XML files using STAX parser

I have multiple XML files.all nodes are similar. Please provide an example how to merge XML files using STAX Parser and apply a stylesheet on it.
If you want to apply XSLT to several XML documents then (with pure XSLT, I don't know about Stax) you can simply use the document function (XSLT 1.0 and 2.0) or the collection function (with XSLT 2.0) e.g.
<xsl:template match="/">
<root>
<xsl:apply-templates select="document('file1.xml')/* | document('file2.xml')/* | document('file3.xml')/*"/>
</root>
</xsl:template>
then add templates matching the element names in the documents you want process.

XSLT text to large integer

I'm attempting to create an XSLT mapping that properly converts a fairly large integer value coming through in a text field into the appropriate integer value. The problem is that since 1.0 only supports converting to type number, I get a value like 1.234567890E9 back for input of "1234567890"
I'm using Altova MapForce with XSLT1.0 as the coding platform. XSLT2.0 doesn't appear to be an option, as the XSLT has to be processed using a pre-existing routine that only supports XSLT1.0
By default Mapforce generates
<xsl:value-of select="string(floor(number(string(.))))"/>
and I've tried every combination of functions I can think of, but always get a float for large values.
Further testing shows the problem lies in Mapforce, which insists on using the number() function when mapping from text to int.
Let me try and move this forward by answering a question that you did not ask, but perhaps should have. Suppose you have the following input:
XML
<input>
<value>1234567890000000.9</value>
<value>9876543210000000</value>
</input>
and you want to make sure that the input values (which are all numbers, but some of them are not integers) are converted to integers at the output, you could apply the following transformation:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<output>
<xsl:for-each select="input/value">
<value><xsl:value-of select="format-number(., '#')"/></value>
</xsl:for-each>
</output>
</xsl:template>
</xsl:stylesheet>
to obtain the following output:
<?xml version="1.0" encoding="UTF-8"?>
<output>
<value>1234567890000001</value>
<value>9876543210000000</value>
</output>
Note that the results here are rounded, not floored.
Are you sure that mapforce isn't using xslt-2.0?
If I do in XSLT-1.0 (with either saxon or Altova's processor):
<xsl:value-of select="number('1234567890')"/>
I get -> 1234567890
If I use XSLT-2.0 I get -> 1.23456789E9
So I think it is very strange that an XSLT 1 transformation supposedly returns you the floating point representation of the number.
Formatting the number with format-number(1.23456789E9,'#') will always give you 1234567890 in both XSLT-1.0 and 2.0. Edit: saxon will not convert 1.23456789E9 to number in xslt-1.0, altova's processor however will.
The problem lies within Mapforce, so I've decided to let mapforce generate it's code, then overwrite it for this one field that's causing all the trouble.
#Tobias #Michael Thanks to you both for your help. I've +1'ed both your answers and a few comments since your help led to the answer.

XSLT: How to remove synonymous namespaces

I have a large collection of XML files which I need to transform using XSLT. The problem is that many of these files were hand-written by different people and they do not use consistent names to refer to the schemas. For example, one file might use:
xmlns:itemType="http://example.com/ItemType/XSD"
where another might use the prefix "it" instead of "itemType":
xmlns:it="http://example.com/ItemType/XSD"
If that's not bad enough, there are several files which use two or three synonyms for the same thing!
<?xml version="1.0"?>
<Document
xmlns:it="http://example.com/ItemType/XSD"
xmlns:itemType="http://example.com/ItemType/XSD"
xmlns:ItemType="http://example.com/ItemType/XSD"
...
(there's clearly been a lot of cutting and pasting going on)
Now, because the pattern matching in the XSLT file appears to work on the namespace prefix (as opposed to the schema it relates to) the pattern only matches one of the variants. So if I write something like:
<xsl:template match="SomeNode[#xsi:type='itemType:SomeType']">
...
</xsl:template>
Then it only matches a subset of the cases that I want it to.
Question 1: Is there any way to get the XSLT to match all the variants?
Question 2: Is there any way to remove the duplicates so all the output files use consistent naming?
I naïvely tried using "namespace-alias" but I guess I've misunderstood what that does because I can't get it to do anything at all - either match all the variants or affect the output XML.
<?xsl:stylesheet
version="1.0"
...
xmlns:it="http://example.com/ItemType/XSD"
xmlns:itemType="http://example.com/ItemType/XSD"
xmlns:ItemType="http://example.com/ItemType/XSD"
...
<xsl:output method="xml" indent="yes"/>
<xsl:namespace-alias stylesheet-prefix="it" result-prefix="ItemType"/>
<xsl:namespace-alias stylesheet-prefix="itemType" result-prefix="ItemType"/>
Attribute values or text nodes won't be cast to QName unless you explicitly say so. Although this is only posible in XSLT/XPath 2.0
In XSLT/XPath 1.0 you must do this "manually":
<xsl:template match="SomeNode">
<xsl:variable name="vPrefix" select="substring-before(#xsi:type,':')"/>
<xsl:variable name="vNCName"
select="translate(substring-after(#xsi:type,$vPrefix),':','')"/>
<xsl:if test="namespace::*[
name()=$vPrefix
] = 'http://example.com/ItemType/XSD'
and
$vNCName = 'SomeType'">
<!-- Content Template -->
<xsl:if>
</xsl:template>
Edit: All in one pattern (less readable, maybe):
<xsl:template match="SomeNode[
namespace::*[
name()=substring-before(../#xsi:type,':')
] = 'http://example.com/ItemType/XSD'
and
substring(
concat(':',#xsi:type),
string-length(#xsi:type) - 7
) = ':SomeType'
]">
<!-- Content Template -->
</xsl:template>
In XSLT 2.0 (whether or not you use schema-awareness) you can write the predicate as [#xsi:type=xs:QName('it:SomeType')] where "it" is the prefix declared in the stylesheet for this namespace. It doesn't have to be the same as the prefix used in the source document.
Of course matching of element and attribute names (as distinct from QName-valued content) uses namespace URIs rather than prefixes in both XSLT 1.0 and XSLT 2.0.

Select the value of the attribute

<Surcharge>
<Rentalplus desc="Rental plus">75.00</Rentalplus>
<Gasket desc="Seals and gasket">50.00</Gasket>
<WearandTear desc"Wear and Tear">100.00</WearandTear>
</Surcharge>
from the above xml i want to extract the "desc". keep in mind i have different tag names under the node.
Thanks for the help
How about a minimalist solution ?
//#desc
Or more precise
/Surcharge//#desc
Or even more precise
/Surcharge/*[self::Rentalplus|self::Gasket|self::WearandTear]/#desc
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:apply-templates select="*/#desc"/>
</xsl:template>
</xsl:stylesheet>
Exploits built-in rules. Result will be:
Rental plusSeals and gasketWear and Tear
Use:
/*/*/#desc
This selects all desc attributes of all children of the top element of the XML document.
Never use the // abbreviation when the structure of the document is well-known. Using the // abbreviation may result in significantly slow evaluation, because it causes traversal of the whole XML document.
Should be something like this:
//#desc
See syntax from the w3schools site http://www.w3schools.com/xsl/xpath_syntax.asp

Ignore name space with t: prefix

We have XML file like below...
<?xml version='1.0'?>
<T0020 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.safersys.org/namespaces/T0020V1 T0020V1.xsd"
xmlns="http://www.safersys.org/namespaces/T0020V1">
<IRP_ACCOUNT>
<IRP_CARRIER_ID_NUMBER>1213561</IRP_CARRIER_ID_NUMBER>
<IRP_BASE_COUNTRY>US</IRP_BASE_COUNTRY>
<IRP_BASE_STATE>AL</IRP_BASE_STATE>
<IRP_ACCOUNT_NUMBER>15485</IRP_ACCOUNT_NUMBER>
<IRP_ACCOUNT_TYPE>I</IRP_ACCOUNT_TYPE>
<IRP_STATUS_CODE>0</IRP_STATUS_CODE>
<IRP_STATUS_DATE>2004-02-23</IRP_STATUS_DATE>
<IRP_UPDATE_DATE>2007-03-09</IRP_UPDATE_DATE>
<IRP_NAME>
<NAME_TYPE>LG</NAME_TYPE>
<NAME>WILLIAMS TODD</NAME>
<IRP_ADDRESS>
<ADDRESS_TYPE>MA</ADDRESS_TYPE>
<STREET_LINE_1>P O BOX 1210</STREET_LINE_1>
<STREET_LINE_2/>
<CITY>MARION</CITY>
<STATE>AL</STATE>
<ZIP_CODE>36756</ZIP_CODE>
<COUNTY/>
<COLONIA/>
<COUNTRY>US</COUNTRY>
</IRP_ADDRESS>
</IRP_NAME>
</IRP_ACCOUNT>
</T0020>
In order to Insert this XML data to database ,we have used two XSLT.
First XSLT will remove name space from XML file and convert this XML to some intermediate
XML(say Process.xml) file on some temporary location.
then we were taking that intermediate xml(without namespace lines) and applied another XSL
to map xml field to Database.
Then we have found solution and we have used only one XSLT which does bode [1] Remove namespace and [2] Mapping XML field to Database to insert data.
Our final style sheet contain following lines
xmlns:t="http://www.safersys.org/namespaces/T0020V1">
and we used following to map field to Database
<xsl:template match="/">
<xsl:element name="T0020">
<xsl:apply-templates select="t:T0020/t:IRP_ACCOUNT" />
</xsl:element>
</xsl:template>
how did our problem solved with this approach ?Any consequences with using this ?
I have searched about this but not getting the functionality.
Thanks in Advance..
I don't see any problems with your approach.
XSLT mandates a fully qualified name for a correct matching, so using a prefixed namespace in your XSLT is the right solution; this is why you solved your problem.