Adding/removing specific elements from an XML file in Qt? - c++

I have an XML document like this:
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item s_no="1">
<title>title_1</title>
<path>path1</path>
<desc>descriptoion1</desc>
</item>
<item s_no="2">
<title>title_2</title>
<path>path2</path>
<desc>descriptoion2</desc>
</item>
</items>
This is generated by QXmlStreamWriter in Qt. I want one function that adds an <item> tag with all of its child elements (<title>, <path>, etc.) and another that removes an <item> tag identified by its s_no attribute. Both should leave all other content in the file untouched.
I've searched a lot and I know there are similar questions. I've tried some code, but it didn't work. Are there functions in QDomDocument that do this?

When I have looked into doing this in the past, it hasn't really been a trivial thing.
QDomDocument and QDomNode
I think you should be able to do it with QDomDocument and QDomNode. It can be hard to see all the available functions on the main documentation page of a class, because many of them are inherited from its base classes (QDomDocument derives from QDomNode). Clicking "List of all members" shows the complete list.
http://doc.qt.io/qt-5/qdomdocument-members.html
Some calls that look promising include: childNodes, elementById, elementsByTagName, createElement, insertBefore, insertAfter, removeChild.
UPDATE: A working example that shows a straightforward way to delete and insert nodes in a QDomDocument:
https://github.com/peteristhegreat/xml_insert_remove
Note that when adding QDomNodes, QDomElements, etc., every node needs to be created through the document (for example with createElement); otherwise it doesn't stay in scope when you leave a function.
QXmlStreamReader and QXmlStreamWriter
Some documentation I read a few years ago highly recommended the QXmlStream* classes, since they are better supported or more recently maintained. I think they have better error handling and don't have to load the whole document to be useful.
As far as editing the document and saving it again goes, the most direct way I know of is to read everything in, store it as nested C++ classes, and then write it back out.
QJson Example (similar to QXmlStream*)
There is a similar example with JSON that really shows off the power of building a read function and a write function into your model.
http://doc.qt.io/qt-5/qtcore-json-savegame-example.html
I think a similar approach could be done with the stream reader and writer class for XML.
Hope that helps.

Related

Hide a topic from PDF output at xsl level

I have a topic which contains only some metadata of the documentation (children of prolog and some custom elements too). The content of these elements is displayed in headers and footers in the actual PDF output.
My problem: the referenced topic itself is now included in the PDF as an empty chapter.
Setting the processing-role to resource-only or filtering the topic does not solve the problem, as the content of the elements is needed in later steps of the transformation (headers, footers, etc.).
My best guess is to somehow exclude this one topic and the needless page sequence based on its ID with..
.. adding some attributes in a custom xsl template?
.. modification of topic processing?
.. an obvious method that didn’t occur to me?
but I’m a beginner, so a little guidance would be nice.
Currently using:
DITA-OT 2.1; Oxygen 17.1; Bookmap spec.; XSL FO based transformations;
Thanks in advance!
Maybe instead of keeping that content inside the topic, you could keep it inside the main DITA Map, maybe using some DITA "data" elements like:
<map>
<title></title>
<topicmeta>
<data name="d1" value="v1"/>
</topicmeta>
Anyway, if you plan to keep a separate topic, you could set an outputclass='hidden' attribute on that topic and then use Oxygen's Find/Replace in Files to search the folder "DITA-OT/plugins/org.dita.pdf2" for "bookmap/chapter". You probably need to find the XSLT templates which process DITA "chapter" elements for the table of contents, the bookmarks area and the main document, and add a [not(@outputclass='hidden')] condition to them so that they skip that topic.

RegEx to remove specific XML elements

I'm using Kate to process text to create an XML file but I've hit a roadblock. The text now contains additional data that I need to remove based on its content.
To be specific, I have an XML element called <officers> that contains 0 or more <officer> elements, which contain further elements such as <title>, <name>, etc. While I probably could exclude these at run time using XSL, the file also drives another process that I don't want to touch: it's a general-purpose data importer for Scribus, so I don't want to touch the coding.
What I want to do is remove an <officer> element if the <title> content isn't what I want. For example, I don't want the First VP, so I'd like to remove:
<officer>
<title>First VP</title>
<incumbent>Joe Somebody</incumbent>
<address>....</address>
<address>....</address>
......
</officer>
I don't know how many lines will be in any <officer> element, nor what position it will be in within the <officers> element.
The easy part is getting to the start of the content I want removed. The hard part is getting to the </officer> end tag. All the solutions I've found so far just result in Kate deciding that the RegEx is invalid.
Any suggestions are appreciated.
Regex is the wrong tool for this job; never process XML without a proper parser, except possibly for a one-off job on a single document where you will throw the code away after running it and checking the results by hand. You might find a regex that works on one sample document, but you'll never get it to work properly on a well-designed set of 100 test documents.
And it's easily done using XSLT. It's a stylesheet with two template rules: a default "identity template" rule to copy elements unchanged, and a second rule to delete the elements you don't want. In fact in XSLT 3.0 it gets even simpler:
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="officer[title='First VP']"/>

Architecture for a c++ XML-parser with a HTML-reportgenerator

I want a program that parses an XML file, builds a structure with the tags I need and finally prints an HTML report using HTML templates with keywords that get replaced by the data from the XML files.
Since I'm not (yet) really into OO programming, I hoped to get some tips and advice on how to structure a program like this.
I thought that two classes should be enough: a parser class and a data class.
The first one goes through the XML file and reports every tag I want to store to a data object, which stores all the tags in hierarchical order. After that I want to call a print function which prints everything as an HTML report.
I'm not sure how to report the tags to the data object.
Could I store the tags in one object which stores a tree of structs, or would it be better to store each tag in a separate object?
Any help would be greatly appreciated!
You don't mention Qt in your question, but as you added it as a tag: there is QtXML, which will give a way to parse and generate XML documents, and will also work for HTML output. XML is typically handled either via DOM or SAX. With DOM, the documents are parsed into a tree structure, and you will work on the tree as your central data element. With SAX, you use callback functions that are called for the different XML elements while parsing the XML input.
There is a lot about DOM and SAX on the internet, Wikipedia is a good starting point. There is also a lot of documentation on QtXML on-line.
Using DOM and/or SAX will give a nice architecture for solving the problem.
I solved my problem and want to share my architecture.
I made a Parser class to parse the elements and report the tags to an HTMLHandler class, which has subclasses like Header, Content and Sub-content. These store the data and all have write() methods to print themselves out.
It works fine for me and is quite simple :)

Transforming one XML document into another with C++

What would be a straightforward way to transform a source XML document into a destination XML document? There are only small differences between source and destination: specifically, I want to delete the first UnitIDRecord node within each UnitIDGroup node.
What would be the appropriate model for this task, DOM or SAX?
What XML library would best fit this problem (one which guarantees that the source and destination differ only in the deleted nodes, with no missing namespaces, attributes, encoding, ...)?
I read about XSLT, could this be an option?
The XML document looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<ExPostInformationRealGeneration xmlns="http://schemas.seven2one.de/EEX/TransparencyPlatform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.seven2one.de/EEX/TransparencyPlatform EEXTransparencyPlatform.xsd">
<DispatcherID>XYZ</DispatcherID>
<CreationDateTime>2012-05-22T13:57:00Z</CreationDateTime>
<MessageText>1 - Positiv - Meldung mit Quality-Tag - L000</MessageText>
<UnitIDGroup>
<UnitID>E110200-001</UnitID>
<UnitIDRecord><Quantity>16.9</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.6</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.4</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<UnitIDGroup>
<UnitID>E110200-002</UnitID>
<UnitIDRecord><Quantity>16.9</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.6</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>16.4</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<UnitIDGroup>
<UnitID>E110201-001</UnitID>
<UnitIDRecord><Quantity>7.0</Quantity><Starttime>2008-04-30T22:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>7.1</Quantity><Starttime>2008-04-30T23:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
<UnitIDRecord><Quantity>7.1</Quantity><Starttime>2008-05-01T00:00:00Z</Starttime><Period>PT1H</Period><MessageText></MessageText></UnitIDRecord>
</UnitIDGroup>
<!-- other UnitIDGroup elements -->
</ExPostInformationRealGeneration>
I would consider the possibility of reading the file in as strings and writing the string out to another file if it matches your criteria. That's a 5 line program and avoids any parsing etc. It will run quickly and is simple. But, it is specific to this problem and not reusable. I offer this therefore as a suggestion not the correct solution!

web service pattern that supports lazy loading of all properties

I am trying to design an endpoint template for a web service. My main requirement is that the caller is able to specify which properties should be populated in the returned result set.
My service returns large lists (up to 1M records) of partial objects as well as individual full objects such as (rough example XML, sorry it's a little verbose)
List:
<items>
<item>
<a>aaa</a>
<b>bbb</b>
</item>
<item>
<a>aaaA</a>
<b>bbbB</b>
</item>
</items>
Detail:
<item>
<a>aaa</a>
<b>bbb</b>
<c>ccc</c>
...
<w>
<x>xxx</x>
<y>yyy</y>
</w>
<z>zzz</z>
</item>
I have considered the following ideas:
1. Returning the full detail items in the list
2. Creating a 'list' item type that is shorter
3. Passing a string array of property names that the caller wants to be returned
I am leaning towards the 3rd option, but as it stands it doesn't support sub-objects, so I want something a little different. I have considered passing the XML schema of what you want returned instead of an array.
I would like the API to support lazy loading which is why the 3rd way seems viable as well.
Here's an example of what a function for 3. would look like:
public User GetUser(long ID, string[] properties)
And then the caller could just go:
User.Email = GetUser(User.ID, new[] { "Email" }).Email;
Through extensive use of default values and hiding nulls, the returned XML for that would be:
<User>
<ID>123</ID>
<Email>example@example.com</Email>
</User>
Now the problem as mentioned above is trying to make it play nice with things like <w> far above, which itself has sub items as well as the possibility for lists to have sub items.
As I have far too many properties, I cannot have just a ws method for each property.
I am considering option 3. but using an xml schema instead of a string[].. But I can't think of an easy way to define this, I would also like to not have to use String names for properties such as "Email".
The final plan is to have a series of pre-defined schemas that are used commonly and only in advanced cases would we need to actually define the requested properties. But I have no idea of all the systems that will be talking to my API, let alone what properties they might each want (it's not going to be feasible for us to tailor the API for every caller).
Or am I over complicating everything too much?
I found the documentation for the Google APIs on Partial Responses and Partial Updates:
http://googlecode.blogspot.com/2011/07/lightning-fast-performance-tips-for.html
This seems to answer my question.
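As a rough illustration of how field selection can cover sub-objects, here is a small sketch in C++. It assumes properties are stored under slash-separated paths such as "w/x", so that asking for "w" returns the whole sub-object. The Properties alias, the selectFields name and the path convention are assumptions for illustration; this is not the syntax of the Google fields parameter.

```cpp
#include <map>
#include <string>
#include <vector>

// Property path -> value, e.g. "a" or "w/x" for a nested property.
using Properties = std::map<std::string, std::string>;

// Returns only the properties the caller asked for. A requested field
// matches a property with the same path, or any property nested below it.
Properties selectFields(const Properties &all,
                        const std::vector<std::string> &fields)
{
    Properties result;
    for (const auto &entry : all) {
        for (const auto &field : fields) {
            const std::string &path = entry.first;
            const bool exact = path == field;
            const bool nested = path.size() > field.size() &&
                                path.compare(0, field.size(), field) == 0 &&
                                path[field.size()] == '/';
            if (exact || nested) {
                result.insert(entry);
                break;          // already selected, check the next property
            }
        }
    }
    return result;
}
```

Commonly used field lists could then be kept as named presets, with explicit lists only needed in the advanced cases.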