Remove XML declaration using msxml2 IXMLDOMDocument2 - c++

I want to remove only the XML declaration from an XML document using C++:
<?xml version="1.0" encoding="UTF-8" ?>
Then I want to add this line and resave the XML
<?xml version="1.0" encoding="ISO-8859-1" ?>
All I have, and know how to do, is load the XML document:
hr = IXMLDOMDocument->load(vstrfilename, &status);
using the IXMLDOMDocument2 interface of msxml2
How can I achieve this ?
My programming environment is borland c++ builder 6
Thank You

<? some text ?> is a processing instruction, and MSXML exposes the XML declaration as a node of type NODE_PROCESSING_INSTRUCTION.
Retrieve that node as the first child of the document (using get_childNodes or get_firstChild) and delete it with removeChild.
Then call createProcessingInstruction with the target "xml" and the data version="1.0" encoding="ISO-8859-1", and use insertBefore, passing the document's current first child as the reference node, to add it at the top of the document.

Related

Why does TinyXml2 put XMLDeclaration at the end?

I'm using TinyXml2 v8.0.0 to create an XML buffer to send to an API. The example includes a declaration. I'm implementing this with:
XMLDocument doc;
doc.InsertEndChild(doc.NewDeclaration());
XMLElement* pRoot = doc.NewElement("Stuff");
doc.InsertFirstChild(pRoot);
The documentation for NewDeclaration states:
If the text param is null, the standard declaration is used.:
<?xml version="1.0" encoding="UTF-8"?>
You can see this as a test in https://github.com/leethomason/tinyxml2/blob/master/xmltest.cpp#L1637
But when I print the buffer out the declaration has been placed at the end of the buffer after a newline:
<Stuff>
</Stuff>
<?xml version="1.0" encoding="UTF-8"?>
Does anyone know why this is happening? I'd expect it to be at the start of the buffer with no newline.
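Presumably, that's because you inserted the declaration with InsertEndChild and then inserted the Stuff element with InsertFirstChild, which puts the element ahead of the declaration. Insert the declaration with InsertFirstChild and the root element with InsertEndChild instead.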

How to process HTML entities in XSLT

I am trying to transform XHTML that contains the &nbsp; entity. Saxon complains that the entity is not defined. How can I define it?
Is it possible to add the entity definition at the beginning of the stylesheet, as suggested here:
http://s-n-ushakov.blogspot.com/2011/09/xslt-entities-java-xalan.html
or in this question: Using an HTML entity in XSLT (e.g. &nbsp;)
My puny attempt, ignored by Saxon, was to add the following to the beginning of the XSLT:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE stylesheet [
<!ENTITY nbsp "&#160;">
]>
I am using Saxon 9.9 PE.
The HTML I am trying to transform is a complete document, not just a fragment.
One possibility is to pass the URL of the XHTML to the stylesheet as a parameter. The stylesheet then reads the XHTML as text with the unparsed-text() function, expands the entity reference with replace(), and parses the result with parse-xml(), e.g.
<xsl:template name="xsl:initial-template">
<xsl:param name="source"/>
<xsl:apply-templates select="
$source
=> unparsed-text()
=> replace('&amp;nbsp;', '&#x000A0;')
=> parse-xml()
"/>
</xsl:template>
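The same read-as-text, replace, then parse idea can also be applied before the document ever reaches the XSLT processor. The following is only a minimal Java sketch of that pre-processing step; the class and method names (NbspPreprocess, parseWithNbsp) are made up for illustration, and it assumes the input has no DOCTYPE that would need to be fetched. Note that a blind string replacement also touches any occurrence of the entity inside comments or CDATA sections.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NbspPreprocess {
    // Hypothetical helper: expands the undeclared &nbsp; reference as plain
    // text, then parses the repaired string as XML.
    static Document parseWithNbsp(String xhtmlText) {
        // &#160; is a numeric character reference, so it needs no declaration.
        String fixed = xhtmlText.replace("&nbsp;", "&#160;");
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(fixed)));
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWithNbsp(
            "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>a&nbsp;b</p></body></html>");
        // The text content now holds a real U+00A0 character between a and b.
        System.out.println(doc.getDocumentElement().getTextContent().equals("a\u00A0b"));
    }
}
```

The resulting Document could then be handed to the transformation as a DOMSource.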
If the input document contains an entity reference that isn't declared in the DOCTYPE declaration, then it isn't a well-formed XML document, and therefore it isn't a well-formed XHTML document; and if it isn't well-formed, then Saxon can't handle it.
It would be best to look at the processing workflow that generated this ill-formed document and fix it so the documents it produces are well-formed.
If you can't do that, then you might be able to parse it as HTML. Saxon has an extension function saxon:parse-html(); or if your application is in Java then you could create a SAXSource that uses validator.nu as its XMLReader.
You should consider using the tool Tidy to convert the HTML files into XHTML. It corrects problems like this one.
Just run tidy with the argument -asxml.

Is there a way to make Xerces-C pretty-print XML without double-spaced lines?

I'm using xerces-c to write XML. By default it writes the entire DOM as a single line of text. I tried the pretty-print option, like below, and now it prints double-spaced lines - which isn't very pretty, in my opinion. Is there a way to avoid the double-spacing?
void configureWriter(DOMLSSerializer* writer) {
writer->getDomConfig()->setParameter(XMLUni::fgDOMWRTFormatPrettyPrint, true);
}
Output:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MyDocument>
<A>B</A>
<D>E</D>
</MyDocument>
OK, I found an answer. There's a different option called "fgDOMWRTXercesPrettyPrint", and if you also turn this off then there are no empty lines in the output.
void configureWriter(DOMLSSerializer* writer) {
writer->getDomConfig()->setParameter(XMLUni::fgDOMWRTFormatPrettyPrint, true);
writer->getDomConfig()->setParameter(XMLUni::fgDOMWRTXercesPrettyPrint, false);
}
Output:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MyDocument>
<A>B</A>
<D>E</D>
</MyDocument>
This was the mail thread which gave me the answer: http://mail-archives.apache.org/mod_mbox//xerces-c-users/200908.mbox/%3C4A7697C0.1000304@datadirect.com%3E

Please explain why I would get a no schema error when there is, in fact, an associated schema?

OK. So I am still learning the ins and outs of XSLT and associating schemas. In my company we use XSLT in a very specific way, to transform XML metadata from one schema to another (e.g. Dublin Core to PBCore, our house standard metadata to METS, etc.). I have a plain XML file with our standard metadata tags. I transform it using an XSLT that has these declarations:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
<xsl:template match="/">
<reVTMD xmlns="http://nwtssite.nwts.nara/schema/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.archives.gov/preservation/products/reVTMD.xsd"
recordCreation="2016-03-24T18:13:51.0Z" profile="profile1"
version="version1">
The output XML includes this:
<?xml version="1.0" encoding="UTF-8"?>
<reVTMD xmlns="http://nwtssite.nwts.nara/schema/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.archives.gov/preservation/products/reVTMD.xsd"
recordCreation="2016-03-24T18:13:51.0Z"
profile="profile1"
version="version1">
at the top of the document. But I still get a "There is no schema or DTD associated with the document." in Oxygen when I try to validate the document against the reVTMD schema. What am I doing wrong?
The value of the xsi:schemaLocation attribute must be a list of pairs, each associating a namespace with a schema location. Since the document is in the namespace http://nwtssite.nwts.nara/schema/ and the schema is at https://www.archives.gov/preservation/products/reVTMD.xsd, you need:
xsi:schemaLocation="http://nwtssite.nwts.nara/schema/ https://www.archives.gov/preservation/products/reVTMD.xsd"
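To make the pairing rule concrete, here is a small Java sketch that splits an xsi:schemaLocation value into namespace/location pairs the way the rule above describes. The class and method names (SchemaLocationPairs, parse) are invented for this illustration, and this is not how Oxygen or any particular validator implements it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaLocationPairs {
    // Hypothetical helper: reads the attribute value as whitespace-separated
    // tokens and pairs them up as (namespace URI, schema location).
    static Map<String, String> parse(String value) {
        String[] tokens = value.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            // A lone URI, as in the question, cannot form a pair.
            throw new IllegalArgumentException(
                "xsi:schemaLocation needs namespace/location pairs: " + value);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Corrected value: namespace first, then the schema location.
        System.out.println(parse("http://nwtssite.nwts.nara/schema/ "
            + "https://www.archives.gov/preservation/products/reVTMD.xsd"));
    }
}
```

Passing the original single-URI value to this helper raises the IllegalArgumentException, which mirrors why a validator finds no schema associated with the document's namespace.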

XSLT transformation passing parameters

I am trying to pass parameters during an XSLT transformation. Here is the xsl stylesheet.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="param1" select="'defaultval1'" />
<xsl:param name="param2" select="'defaultval2'" />
<xsl:template match="/">
<xslttest>
<tagg param1="{$param1}"><xsl:value-of select="$param2" /></tagg>
</xslttest>
</xsl:template>
</xsl:stylesheet>
The following is the Java code.
File xsltFile = new File("template.xsl");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document stylesheet = builder.parse("template.xsl");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer xsltTransformer = transformerFactory.newTransformer(new DOMSource(stylesheet));
//Transformer xsltTransformer = transformerFactory.newTransformer(new StreamSource(xsltFile));
xsltTransformer.setParameter("param1", "value1");
xsltTransformer.setParameter("param2", "value2");
StreamResult result = new StreamResult(System.out);
xsltTransformer.transform(new DOMSource(builder.newDocument()), result);
I get the following errors:
ERROR: 'Variable or parameter 'param1' is undefined.'
FATAL ERROR: 'Could not compile stylesheet'
However, if I use the following line to create the transformer, everything works fine.
Transformer xsltTransformer = transformerFactory.newTransformer(new StreamSource(xsltFile));
Q1. I just wanted to know what's wrong with using a DOMSource to create a Transformer.
Q2. Is this one of the ideal ways to substitute values for placeholders in an xml document? If my placeholders were in a source xml document is there any (straightforward) way to substitute them using style sheets (and passing parameters)?
Q1: This is a namespace awareness problem. You need to make the DocumentBuilderFactory namespace aware:
factory.setNamespaceAware(true);
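As a rough end-to-end check of that fix, the sketch below compiles the question's stylesheet from a DOMSource built by a namespace-aware factory and passes both parameters. The class name NamespaceAwareStylesheet and the inline stylesheet string are assumptions made to keep the example self-contained; the question loads template.xsl from disk instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceAwareStylesheet {
    // Inline copy of the question's stylesheet (an assumption of this sketch).
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='param1' select=\"'defaultval1'\"/>"
      + "<xsl:param name='param2' select=\"'defaultval2'\"/>"
      + "<xsl:template match='/'>"
      + "<xslttest><tagg param1='{$param1}'><xsl:value-of select='$param2'/></tagg></xslttest>"
      + "</xsl:template></xsl:stylesheet>";

    static String transform(String p1, String p2) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this call the DOM carries no namespace information,
            // so the xsl: elements are not recognised as XSLT instructions.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document stylesheet = builder.parse(new InputSource(new StringReader(XSLT)));

            Transformer transformer =
                TransformerFactory.newInstance().newTransformer(new DOMSource(stylesheet));
            transformer.setParameter("param1", p1);
            transformer.setParameter("param2", p2);

            StringWriter out = new StringWriter();
            // As in the question, an empty document serves as the input.
            transformer.transform(new DOMSource(builder.newDocument()), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("value1", "value2"));
    }
}
```

Commenting out the setNamespaceAware(true) line should reproduce the compile error from the question.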
Q2: There are several ways to get the values from an external xml file. One way to do this is with the document function and a top level variable in the document:
<!-- Loads a map relative to the template. -->
<xsl:variable name="map" select="document('map.xml')"/>
Then you can select the values out of the map. For instance, if map.xml was defined as:
<?xml version="1.0" encoding="UTF-8"?>
<mappings>
<mapping key="value1">value2</mapping>
</mappings>
You could remove the second parameter from your template, then look up the value using this line:
<tagg param1="{$param1}"><xsl:value-of select="$map/mappings/mapping[@key=$param1]"/></tagg>
Be aware that using relative document URIs will require that the stylesheet has a system id specified, so you will need to update the way you create your DOMSource:
DOMSource source = new DOMSource();
source.setNode(stylesheet);
source.setSystemId(xsltFile.toURI().toString()); // File.toURL() is deprecated; toURI() escapes special characters
In general, I suggest looking at all of the options that are available in Java's XML APIs. Assume that all of the features available are set wrong for what you are trying to do. I also suggest reading the XML Information Set. That specification will give you all of the definitions that the API authors are using.