Format XML file in c++ or Qt - c++

I have an XML file whose output is not formatted: everything is on a single line, but I want to break it up tag by tag.
For example:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><Analyser> <JointDetails> <Details><StdThickness> T </StdThickness><Thickness_num> 0.032 </Thickness_num></Details> </JointDetails></Analyser>
But I want it to look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Analyser>
<JointDetails>
<Details>
<StdThickness> T </StdThickness>
<Thickness_num> 0.032 </Thickness_num>
</Details>
</JointDetails>
</Analyser>
Please don't suggest doing this while writing the XML file. The file already exists, and I now have to format it as shown above.

Using a QXmlStreamReader and QXmlStreamWriter should do what you want. QXmlStreamWriter::setAutoFormatting(true) will format the XML on different lines and use the correct indentation. With QXmlStreamReader::isWhitespace() you can filter out superfluous whitespace between tags.
QString xmlIn = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\" ?>"
"<Analyser><JointDetails> <Details><StdThickness>"
" T </StdThickness><Thickness_num> 0.032 </Thickness_num>"
"</Details> </JointDetails></Analyser>";
QString xmlOut;
QXmlStreamReader reader(xmlIn);
QXmlStreamWriter writer(&xmlOut);
writer.setAutoFormatting(true);
while (!reader.atEnd()) {
reader.readNext();
if (!reader.isWhitespace()) {
writer.writeCurrentToken(reader);
}
}
qDebug() << xmlOut;

If you're using Qt, you can read it with QXmlStreamReader and write it with QXmlStreamWriter, or parse it as QDomDocument and convert that back to QString. Both QXmlStreamWriter and QDomDocument support formatting.

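void format()
{
QFile inFile("D:/input.xml");
QFile outFile("D:/output.xml");
// Bail out if either file cannot be opened.
if (!inFile.open(QIODevice::ReadOnly | QIODevice::Text)) return;
if (!outFile.open(QIODevice::WriteOnly | QIODevice::Text)) return;
QDomDocument doc;
// Parse the whole input file; stop if it is not well-formed XML.
if (!doc.setContent(&inFile)) return;
QTextStream stream(&outFile);
doc.save(stream, 2); // indent nested elements by 2 spaces
}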

If you want a simple, robust solution that does not rely on Qt, you can use libxml2. (If you are using Qt anyway, just use what Frank Osterfeld said.)
// XML_PARSE_NOBLANKS drops whitespace-only nodes so the formatter can re-indent.
xmlDoc* xdoc = xmlReadFile("myfile.xml", NULL, XML_PARSE_NOBLANKS);
xmlSaveFormatFile("myfilef.xml", xdoc, 1);
xmlFreeDoc(xdoc);
Can I interest you in my C++ wrapper of libxml2?
Edit:
If you happen to have the XML string in memory, you may also use xmlReadDoc... But it doesn't stop there.

Using C++, you can add a single character between each instance of >< in the output:
by changing >< to >\n< (this inserts a newline character), each tag will print on a new line. There are API-based ways to do this, as mentioned above, but as a simple way to get what you describe for console output, or so that the XML flows onto a new line per tag in something like a text editor, the \n should work fine.
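A minimal sketch of that replacement, using only the standard library. The function name breakTags is made up for illustration, and it only handles tags that are directly adjacent (><), not tags separated by spaces:

```cpp
#include <string>

// Hypothetical helper: inserts a newline between every adjacent "><" pair.
std::string breakTags(std::string xml) {
    std::size_t pos = 0;
    while ((pos = xml.find("><", pos)) != std::string::npos) {
        xml.insert(pos + 1, 1, '\n');  // turns "><" into ">\n<"
        pos += 2;                      // continue searching from the '<'
    }
    return xml;
}
```

Note that in the sample from the question some tags are separated by a space (> <), so those would stay on the same line unless the whitespace is stripped first.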
If you need more elegant output, you can write a method yourself that uses \n (newline) and \t (tab) to lay out the output, or use an API if you require a more elaborate representation.
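As a rough illustration of such a hand-written method, the sketch below puts each tag on its own line, indents with one tab per nesting level, and keeps text content on the same line as its element. The name indentXml is hypothetical, and the code assumes simple input: no comments, CDATA sections, or attribute values containing > characters. For anything beyond that, a real parser is the safer choice.

```cpp
#include <string>

// Hypothetical helper: naive tag-by-tag indenter for simple XML strings.
std::string indentXml(const std::string& in) {
    std::string out;
    std::size_t depth = 0;
    bool afterText = false;  // true when text was just written on this line
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '<') {
            const std::size_t end = in.find('>', i);
            if (end == std::string::npos) {  // malformed tail: copy as-is
                out += in.substr(i);
                break;
            }
            const std::string tag = in.substr(i, end - i + 1);
            const bool closing = tag[1] == '/';
            // Declarations, comments/doctype and self-closing tags do not nest.
            const bool standalone =
                tag[1] == '?' || tag[1] == '!' || tag[tag.size() - 2] == '/';
            if (closing && depth > 0) --depth;
            if (!afterText) {
                if (!out.empty()) out += '\n';
                out.append(depth, '\t');
            }
            out += tag;
            afterText = false;
            if (!closing && !standalone) ++depth;
            i = end + 1;
        } else {
            const std::size_t next = in.find('<', i);
            const std::string text = in.substr(i, next - i);
            // Whitespace-only runs between tags are dropped.
            if (text.find_first_not_of(" \t\r\n") != std::string::npos) {
                out += text;
                afterText = true;
            }
            i = (next == std::string::npos) ? in.size() : next;
        }
    }
    return out;
}
```

Calling indentXml on the single-line document from the question and printing the result should give one tag per line, with StdThickness and Thickness_num each kept on one line together with their text.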

Related

Add XML contained in string as XML nodes to existing pugixml tree

I have a configuration file saver/loader. In addition to the expected data, there is a <CustomData> node. When saving the node, we'd simply have a std::string _customData and add it to the node, like this:
pugi::xml_document doc;
pugi::xml_node config = doc.append_child("OurConfig");
// save custom data
pugi::xml_node customData = config.append_child("CustomData");
customData.append_child(pugi::node_pcdata).set_value(_customData.c_str());
Our _customData was base64-encoded XML. It is provided by another part of the application. It must be a string, since the other part of the application uses a different programming language (C#). As you can imagine, that became annoying, because it wasn't human-readable. The first step to fix this was simply to get rid of base64 in the app that provides _customData. So now we have a readable version, which looks like this:
<?xml version="1.0"?>
<OurConfig>
<CustomData><CfgRoot>
<SomeValue name="External setting for foo" value="Foo"/>
<SomeValue name="External setting for bar" value="Bar"/>
</CfgRoot></CustomData>
</OurConfig>
But it would probably be better if the custom data were appended directly to the XML tree instead of as a string value. How can I append an XML string to a pugixml tree as XML and not as a string?
I.e. the output I'd like:
<?xml version="1.0"?>
<OurConfig>
<CustomData>
<CfgRoot>
<SomeValue name="External setting for foo" value="Foo"/>
<SomeValue name="External setting for bar" value="Bar"/>
</CfgRoot>
</CustomData>
</OurConfig>
In the docs, there are three methods listed. I used the first one, making a convenience function like this:
bool AppendXMLString(pugi::xml_node target, const std::string& srcString)
{
// parse XML string as document
pugi::xml_document doc;
if (!doc.load_buffer(srcString.c_str(), srcString.length()))
return false;
for (pugi::xml_node child = doc.first_child(); child; child = child.next_sibling())
target.append_copy(child);
return true;
}

How to solve the error FODC0002 when using QXmlFormatter?

I'm trying to use QXmlQuery to get some elements from an XML file. Everything works fine (I'm able to validate the source XML file, etc.) until I get to the part where I try to use QXmlFormatter to write the results to another XML file. At that point, the following error is shown: Error FODC0002 in tag:trolltech.com,2007:QtXmlPatterns:QIODeviceVariable:inputDocument, at line 1, column 0: Premature end of document.
The code is based on the "Recipes" project available as an example in Qt. The only difference here is that I made a simpler version of the "cookbook" XML file. I've tried to use QBuffer (the approach implemented in the example) instead of a file but, as expected, got the same result.
Here is the source XML, called temp2_xml.xml
<?xml version="1.0" encoding="UTF-8"?>
<cookbook>
<recipe>
<title>Quick and Easy Mushroom Soup</title>
<title>Cheese on Toast</title>
</recipe>
</cookbook>
Here is the XQuery file, called allRecipes.xq:
(: Select all recipes. :)
declare variable $inputDocument external;
doc($inputDocument)/cookbook/recipe/<p>{string(title)}</p>
And here's the code:
QFile aqr_xq("C:/test_xml/allRecipes.xq");
aqr_xq.open(QIODevice::ReadOnly);
QFile file("C:/test_xml/temp_xml.xml");
file.open(QIODevice::ReadWrite);
QFile aqr_r;
aqr_r.setFileName("C:/test_xml/temp2_xml.xml");
aqr_r.open(QIODevice::ReadOnly);
QTextStream in(&aqr_r);
QString inputDocument = in.readAll();
const QString str_query(QString::fromLatin1(aqr_xq.readAll()));
QXmlQuery query;
query.bindVariable("inputDocument",&aqr_r);
query.setQuery(str_query);
bool debug_xml = false;
debug_xml = query.isValid();
QXmlFormatter ser(query, &file);
query.evaluateTo(&ser);
Any ideas about what's causing the problem and how to solve it?
I think the problem is indeed that the text stream consumes the opened file before the query gets to read it. If I drop the text stream and simply use the code
QFile aqr_xq(queryFile);
aqr_xq.open(QIODevice::ReadOnly);
QFile file(outputFile);
file.open(QIODevice::ReadWrite);
QFile aqr_r;
aqr_r.setFileName(inputFile);
aqr_r.open(QIODevice::ReadOnly);
const QString str_query(QString::fromLatin1(aqr_xq.readAll()));
QXmlQuery query;
query.bindVariable("inputDocument",&aqr_r);
query.setQuery(str_query);
bool debug_xml = false;
debug_xml = query.isValid();
QXmlFormatter ser(query, &file);
query.evaluateTo(&ser);
then the remaining error is in the XQuery itself and is raised as
Error XPTY0004: Required cardinality is zero or one("?"); got cardinality one or more("+").
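The expression string(title) accepts at most one title per recipe, but the sample recipe contains two. You haven't said which output you want to create, but if I change the XQuery to e.g.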
declare variable $inputDocument external;
doc($inputDocument)/cookbook/recipe/title/<p>{string()}</p>
then the C++ code runs fine.
Note also that you can load the XQuery directly from a file by using
query.setQuery(QUrl(queryFile));

Tinyxml2 append function

I have been looking for a way to append to my XML file using tinyxml2 but couldn't find anything. I would appreciate any help.
Here is my code:
void savedata() {
XMLNode * pRoot = xmlDoc.NewElement("Cars");
xmlDoc.InsertFirstChild(pRoot);
XMLElement * pElement = xmlDoc.NewElement("Brand");
pElement->SetText("Audi");
pRoot->InsertEndChild(pElement);
pElement = xmlDoc.NewElement("type");
pElement->SetText("4x4");
pRoot->InsertEndChild(pElement);
pElement = xmlDoc.NewElement("Date");
pElement->SetAttribute("day", 26);
pElement->SetAttribute("month", "April");
pElement->SetAttribute("Year", 2015);
pElement->SetAttribute("dateFormat", "26/04/2015");
pRoot->InsertEndChild(pElement);
XMLError eResult = xmlDoc.SaveFile("SavedData1.xml");
XMLCheckResult(eResult);
}
Every time I run the function, the XML is overwritten, and I want to append to the existing file.
My xml file:
<Cars>
<Brand>Audi</Brand>
<Whatever>anothercrap</Whatever>
<Date day="26" month="April" Year="2015" dateFormat="26/04/2015"/>
</Cars>
My root is <Cars> and I want to append to the existing file. For example,
<Cars>
<Brand>Audi</Brand>
<type>4x4</type>
<Date day="26" month="April" Year="2015" dateFormat="26/04/2015"/>
<Brand>BMWM</Brand>
<type>truck</type>
<Date day="26" month="April" Year="2015" dateFormat="26/04/2015"/>
</Cars>
XML is structured data so a textual append would be tricky and possibly error-prone, as you would have to make sure you don't add the root node twice, and that you maintain indentation etc.
What might be easier is to load the XML, parse it with TinyXML, and write it back.
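For comparison, the purely textual approach could be sketched as below. This is not TinyXML code: the helper name appendInsideRoot is made up, and it works on the file contents as a plain string. It avoids a second root node by inserting the new elements just before the root's closing tag, but it does no validation and ignores indentation, which is exactly why the parse-and-rewrite route is usually the safer one.

```cpp
#include <string>

// Hypothetical helper: inserts `fragment` just before the last closing tag of
// the root element named `rootName`. Returns the input unchanged if that
// closing tag is not found.
std::string appendInsideRoot(std::string xml, const std::string& rootName,
                             const std::string& fragment) {
    const std::string closeTag = "</" + rootName + ">";
    const std::size_t pos = xml.rfind(closeTag);
    if (pos == std::string::npos)
        return xml;
    xml.insert(pos, fragment);
    return xml;
}
```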
You can append if you use the FILE* overload of xmlDoc.SaveFile.
FILE* file = fopen("myfile.xml","a");
xmlDoc.SaveFile(file);
fclose(file);
You just have to be careful when doing this since it will mess up the doc if you're printing multiple root nodes. If you're doing this for logging purposes I would just leave out the root node entirely and just have whatever is reading the log back know to append them or just not even care about proper xml format.

Generic solution for removing the XML declaration using Perl

Hi, I want to remove the declaration from my XML file. The problem is that the declaration is sometimes on the same line as the root element.
The XML looks as follows:
Case1:
<?xml version="1.0" encoding="UTF-8"?> <document> This is a document root
<child>----</child>
</document>
Case 2:
<?xml version="1.0" encoding="UTF-8"?>
<document> This is a document root
<child>----</child>
</document>
The function should also work when the root node is on the next line.
My function works only for case 2.
sub getXMLData {
my ($xml) = @_;
my @data = ();
open(FILE,"<$xml");
while(<FILE>) {
chomp;
if(/\<\?xml\sversion/) {next;}
push(@data, $_);
}
close(FILE);
return join("\n",@data);
}
*** Please note that the encoding is not always the same.
OK, so the problem here is - you're trying to parse XML line based, and that DOESN'T WORK. You should avoid doing it, because it makes brittle code, which will one day break - as you've noted - thanks to perfectly valid changes to the source XML. Both your documents are semantically identical, so the fact your code handles one and not the other is an example of exactly why doing XML this way is a bad idea.
More importantly though - why are you trying to remove the XML declaration from your XML? What are you trying to accomplish?
Generically reformatting XML can be done like this:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new(
pretty_print => 'indented',
);
$twig->parsefile('your_xml_file');
$twig->print;
This will parse your XML and reformat it in one of the valid ways XML may be formatted. However I would strongly urge you not to just discard your XML declaration, and instead carry on with something like XML::Twig to process it. (Open a new question with what you're trying to accomplish, and I'll happily give you a solution that doesn't trip up with different valid formats of XML).
When it comes to merging XML documents, XML::Twig can do this too - and still check and validate your XML as it goes.
So you might do something like (extending from the above):
foreach my $file ( @file_list ) {
my $child = XML::Twig -> new ();
$child -> parsefile ( $file );
my $child_doc = $child -> root -> cut;
$child_doc -> paste ( $twig -> root );
}
$twig -> print;
Exactly what you'd need to do depends a little on your desired output structure. You'd need to wrap the result in a root element anyway. Open a new question with some sample input and desired output, and I'll happily take a crack at it.
As an example - if you feed the above your sample input twice, you get:
<?xml version="1.0" encoding="UTF-8"?>
<document><document> This is a document root
<child>----</child></document> This is a document root
<child>----</child></document>
Which I know isn't likely to be what you want, but hopefully illustrates a parser based way of XML restructuring.

How to replace text in content control after XML binding using docx4j

I am using docx4j 2.8.1 with Content Controls in my .docx file. I can replace the CustomXML part by injecting my own XML and then calling BindingHandler.applyBindings after supplying the input XML. I can add a token in my XML such as ¶ then I would like to replace that token in the MainDocumentPart, but using that approach, when I iterate through the content in the MainDocumentPart with this (link) method none of my text from my XML is even in the collection extracted from the MainDocumentPart. I am thinking that even after binding the XML, it remains separate from the MainDocumentPart (??)
I haven't tried this with anything more than a little test doc yet. My token is the Pilcrow: ¶. Since it's a single character, it won't be split in separate runs. My code is:
private void injectXml (WordprocessingMLPackage wordMLPackage) throws JAXBException {
MainDocumentPart part = wordMLPackage.getMainDocumentPart();
String xml = XmlUtils.marshaltoString(part.getJaxbElement(), true);
xml = xml.replaceAll("¶", "</w:t><w:br/><w:t>");
Object obj = XmlUtils.unmarshalString(xml);
part.setJaxbElement((Document) obj);
}
The pilcrow character comes from the XML and is injected by applying the XML bindings to the content controls. The problem is that the content from the XML does not seem to be in the MainDocumentPart so the replace doesn't work.
(Using docx4j 2.8.1)