Transforming one XML document into another with C++ - c++

What would be a straightforward way to transform a source XML document into a destination XML document. There are only small differences between source and destination: Specifically I want to delete the first UnitIDRecord-Node within each UnitIDGroup-Node.
What would be the appropriate model for this task DOM or SAX?
What XML-library would best fit this problem (which guarantees that the source and destination only differs in the deleted nodes, no missing namespace, attributes, encoding, ...)?
I read about XSLT, could this be an option?
The XML document looks like following:
<?xml version="1.0" encoding="UTF-8"?>
<ExPostInformationRealGeneration xmlns="http://schemas.seven2one.de/EEX/TransparencyPlatform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.seven2one.de/EEX/TransparencyPlatform EEXTransparencyPlatform.xsd">
<DispatcherID>XYZ</DispatcherID>
<CreationDateTime>2012-05-22T13:57:00Z</CreationDateTime>
<MessageText>1 - Positiv - Meldung mit Quality-Tag - L000</MessageText>
<UnitIDGroup>
<UnitID>E110200-001</UnitID>
<UnitIDRecord><Quantity>16.9</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.6</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.4</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<UnitIDGroup>
<UnitID>E110200-002</UnitID>
<UnitIDRecord><Quantity>16.9</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.6</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.4</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<UnitIDGroup>
<UnitID>E110201-001</UnitID>
<UnitIDRecord><Quantity>7.0</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>7.1</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>7.1</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<!-- other UnitIDGroup elements -->
</ExPostInformationRealGeneration>

I would consider the possibility of reading the file in as strings and writing the string out to another file if it matches your criteria. That's a 5 line program and avoids any parsing etc. It will run quickly and is simple. But, it is specific to this problem and not reusable. I offer this therefore as a suggestion not the correct solution!

Related

Eliminate javascript from HTML with XSLT

I am trying to transform an HTML report into XML, but some javascript in the file is throwing errors, due to statements with a less-than character (e.g., for(var i=0; i<els.length;i++) ). I thought I could eliminate the javascript with the following template, which should remove entire script nodes:
<xsl:template match="script"/>
I assumed the XSLT processor would simply skip over the entire script nodes, but it's still throwing the same errors. I also tried adding this one:
<xsl:template match="script/text()"/>
No luck. If I manually remove all the javascript from the file, my transform works, but that's not practical as I need to create and run a daily automated process on these HTML files to extract some data in the HTML tables.
As a general rule, XSLT will only process well-formed XML input: it's not designed to process other formats like HTML.
However, XSLT will generally accept input from a parser that delivers a stream of events that looks sufficiently like an XML stream. This allows parsers like TagSoup and validator.nu to be used as a front-end to your XSLT processor.
Saxon packages this up with a parse-html() function that invokes TagSoup to parse HTML input and turn it into a DOM-like tree (actually an XDM tree) that it can process as if it came from XML.
validator.nu is a more up-to-date HTML parser than TagSoup, but you would have to do a little more work to integrate that.
Question was answered by Martin Honnen in the comments:
oxygenxml.com/doc/versions/18.1/ug-editor/tasks/… suggests there is an HTML import feature so try whether that helps. Of course there are standalone applications like HTML Tidy I think you can use outside of the XSLT processsing to first convert your HTML to XHTML.

RegEx to remove specific XML elements

I'm using Kate to process text to create an XML file but I've hit a roadblock. The text now contains additional data that I need to remove based on its content.
To be specific, I have an XML element called <officers> that contains 0 or more <officer> elements, which contain further elements such as <title>, <name>, etc.. While I probably could exclude these at run time using XSL, the file also drives another process that I don't want to touch - it's a general purpose data importer for Scribus so I don't want to touch the coding.
What I want to do is remove an <officer> element if the <title> content isn't what I want. For example, I don't want the First VP, so I'd like to remove:
<officer>
<title>First VP</title>
<incumbent>Joe Somebody</incumbent>
<address>....</address>
<address>....</address>
......
</officer>
I don't know how many lines will be in any <officer> element nor what positions they will in within the <officers> element.
The easy part it getting to the start of the content I want removed. The hard part is getting to the </officer> end tag. All the solutions I've found so far just result in Kate deciding that the RegEx is invalid.
Any suggestions are appreciated.
Regex is the wrong tool for this job; never process XML without a proper parser, except possibly for a one-off job on a single document where you will throw the code away after running it and checking the results by hand. You might find a regex that works on one sample document, but you'll never get it to work properly on a well-designed set of 100 test documents.
And it's easily done using XSLT. It's a stylesheet with two template rules: a default "identity template" rule to copy elements unchanged, and a second rule to delete the elements you don't want. In fact in XSLT 3.0 it gets even simpler:
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="officer[title='First VP']"/>

I want to try a tag not closed with xpath

I want to try a tag not closed with xpath like this:
<figure class="img"><img class="immagine-in-linea-senza-cornice" width="16%" src="images/schema_1_fmt.jpeg" alt=""/>
I want to close the tag with a xslt transformation.
XPath does not work directly on the input document, but on an abstract, tree-like representation of the document (e.g. XDM or DOM). In this model, opening and closing tags of an element are not represented at all. Instead, an element appears as a single entity in the tree.
Therefore, manipulating < and /> is completely out of the question for languages like XPath, because the concept of opening and closing tags is simply not implemented. I would argue that this abstraction is an advantage of the models, though.
Also, XSLT transformations normally take as input XML documents. If your document has unclosed elements, it will be rejected by any application that is only prepared to process XML.
In short, fix the XML document with a language other than the combination of XSLT and XPath (see e.g. here), and think about XSLT as soon as you have well-formed XML as input.

Adding/removing specific elements from xml file, in Qt?

I have a XML Document, like this:
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item s_no="1">
<title>title_1</title>
<path>path1</path>
<desc>descriptoion1</desc>
</item>
<item s_no="2">
<title>title_2</title>
<path>path2</path>
<desc>descriptoion2</desc>
</item>
This is generated from QXmlStreamWriter in Qt. I want a function to add <item> tag with all elements like <title>, <path> etc. and I want a function to remove an item tag by identifying s_no attributes. All this should be done, without affecting any other content in the file.
I've searched a lot,I know there are similar questions, I've tried some code but it didn't worked. Are there any functions that do this, in QDomDocument?
When I have looked into doing this in the past, it hasn't really been a trivial thing.
QDomDocument and QDomNode
I think you should be able to do it with QDomDocument and QDomNode. Sometimes it is hard to see all the possible functions just on the main page for the documentation of the class, because it can get so much from the abstract classes it is derived from... clicking "lists of all members" shows a complete list.
http://doc.qt.io/qt-5/qdomdocument-members.html
Some calls that look promising include: childNodes elementById elementsByTagName createNode insertBefore insertAfter removeChild.
UPDATE: A working example that shows a straight forward way how to delete and insert nodes on a QDomDocument.
https://github.com/peteristhegreat/xml_insert_remove
Note, that when adding QDomNodes/QDomElements, etc, every element needs to be created on the document, otherwise it doesn't stay in scope when you leave a function.
QXmlStreamReader and QXmlStreamWriter
A few documents I've seen (a few years ago) said that they highly recommend using the QXmlStream* classes since they are better supported, or have been maintained more recently. I think it has some better error handling and doesn't have to load the whole document to be useful.
So as far as editing the document and resaving it, the most direct way that I know of is to read in everything, and store it as nested C++ classes and then write them out.
QJson Example (similar to QXmlStream*
There is a similar example with Json, that really shows off the power of subclassing a read and a write function into your model.
http://doc.qt.io/qt-5/qtcore-json-savegame-example.html
I think a similar approach could be done with the stream reader and writer class for XML.
Hope that helps.

XSLT: How do I trigger a template when there is no input file?

I'm creating a template which produces output based on a single string, passed via parameter, and does not use an input XML document. xsltproc seems to happily run with a single parameter specifying the stylesheet, but I don't see a way to trigger a template without an input file (no parameter to xsltproc to run a named template, for example).
I'd like to be able to run:
xsltproc --stringparam bar baz foo.xsl
But I'm currently having to run, with the "main" template matching "/":
echo '<xml/>' | xsltproc --stringparam bar baz foo.xsl -
How can I get this to work? I'm sure I've seen other templates in the past which were meant to be run without an input document, but I don't remember how they worked or where to find them again. :-)
Actually, this has been done quite often.
In XSLT 2.0 it is defined in the Spec. that providing an initial context node is optional.
If no initial context node is provided (no source XML document), then it is important to provide the name of a named template which is to be executed as the entry point to the transformation.
In XSLT 1.0 one can provide to the transformation its own primary stylesheet module (file) as the source XML document, and of course, the transformation can completely ignore this source XML document. This technique has long ago been demonstrated and used by Jeni Tennison.
For example:
<?xml-stylesheet type="text/xsl" href="example.xml"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<p>Hello, world!</p>
</xsl:template>
</xsl:stylesheet>
When the above code is saved in a file named "example.xml" and then the folder contents is displayed with Windows Explorer, double-clicking on the file "example.xml" will open IE and produce:
Hello, world!
In general, you cannot do this with XSLT - specification requires there to be an input document, and for the processing to start with applying any available templates to its root node. Some XSLT processors might give a way to do what you want (e.g. execute a named template) as an extension, but I don't know any such, and it doesn't seem that xsltproc is one of them, judging from its man page.
In fact, this sounds pretty dubious in general, as the purpose of using XSLT to produce some output from a plain string input is unclear - it's not the kind of task it's generally good at.