Pretty printing XML in wxWidgets - c++

I'm writing a class derived from wxStyledTextCtrl and I want it to prettify given XML without adding anything other than whitespaces. I cannot find simple working solution. I can only use wxStyledTextCtrl, wxXmlDocument and libxml2.
The result I'm aiming for is that after calling SetText with wxString containing following text
<!-- comment1 --> <!-- comment2 --> <node><emptynode/> <othernode>value</othernode></node>
the control should show
<!-- comment1 -->
<!-- comment2 -->
<node>
<emptynode/>
<othernode>value</othernode>
</node>
using libxml2 I managed to almost achieve this, but it also prints XML declaration (eg. <?xml version="1.0" encoding="UTF-8"?>) and I don't want this.
inb4, I'm looking for simple and clean solution - i don't want to manually remove first line of formatted XML
Is there any simple solution to this using given tools? I feel like I'm missing something.

Is there a simple solution? No. But if you want to write you're own pretty print function, you basically need to make a depth first iteration over the xml document tree, printing it as you go. There's a slight complication in that you also need some way of knowing when to close a tag.
Here's an incomplete example of one way to do this using only wxWidgets xml classes. Currently, it doesn't handle attributes, self closing elements (such as '' in your sample text), or any other special element types. A complete pretty printer would need to add those things.
#include <stack>
#include <set>
#include <wx/xml/xml.h>
#include <wx/sstream.h>
wxString PrettyPrint(const wxString& in)
{
wxStringInputStream string_stream(in);
wxXmlDocument doc(string_stream);
wxString pretty_print;
if (doc.IsOk())
{
std::stack<wxXmlNode*> nodes_in_progress;
std::set<wxXmlNode*> visited_nodes;
nodes_in_progress.push(doc.GetDocumentNode());
while (!nodes_in_progress.empty())
{
wxXmlNode* cur_node = nodes_in_progress.top();
nodes_in_progress.pop();
int depth = cur_node->GetDepth();
for (int i=1;i<depth;++i)
{
pretty_print << "\t";
}
if (visited_nodes.find(cur_node)!=visited_nodes.end())
{
pretty_print << "</" << cur_node->GetName() << ">\n";
}
else if ( !cur_node->GetNodeContent().IsEmpty() )
{
//If the node has content, just print it now
pretty_print << "<" << cur_node->GetName() << ">";
pretty_print << cur_node->GetNodeContent() ;
pretty_print << "</" << cur_node->GetName() << ">\n";
}
else if (cur_node==doc.GetDocumentNode())
{
std::stack<wxXmlNode *> nodes_to_add;
wxXmlNode *child = cur_node->GetChildren();
while (child)
{
nodes_to_add.push(child);
child = child->GetNext();
}
while (!nodes_to_add.empty())
{
nodes_in_progress.push(nodes_to_add.top());
nodes_to_add.pop();
}
}
else if (cur_node->GetType()==wxXML_COMMENT_NODE)
{
pretty_print << "<!-- " << cur_node->GetContent() << " -->\n";
}
//insert checks for other types of nodes with special
//printing requirements here
else
{
//otherwise, mark the node as visited and then put it back
visited_nodes.insert(cur_node);
nodes_in_progress.push(cur_node);
//If we push the children in order, they'll be popped
//in reverse order.
std::stack<wxXmlNode *> nodes_to_add;
wxXmlNode *child = cur_node->GetChildren();
while (child)
{
nodes_to_add.push(child);
child = child->GetNext();
}
while (!nodes_to_add.empty())
{
nodes_in_progress.push(nodes_to_add.top());
nodes_to_add.pop();
}
pretty_print <<"<" << cur_node->GetName() << ">\n";
}
}
}
return pretty_print;
}

Related

Can you change a bsoncxx object (document/value/element)?

I'm using the mongocxx driver and I am considering keeping the query results given in BSON as a data holder in a couple of objects instead of parsing the BSON to retrieve the values and then discard it.
This would make some sense "if" I can edit the BSON on the fly. I couldn't find anything in the bsoncxx driver documentation besides the builder that would allow me to manipulate a bsoncxx document/value/view/element after it's been constructed.
As an example, imagine that I have something like this
fruit["orange"];
where fruit is a bsoncxx::document::element
I can get the value by using one of the .get_xxx operators.
What I can't find is something like
fruit["orange"] = "ripe";
Is there a way of doing this, or the idea behind the builder is "just" to create a query to give to the database?
There was a question with same theme, see here
So, bsoncxx objects seem to be immutable, and we have to re-create them if we need to edit them.. :(
I've written a really bad solution which re-creates document from scratch
But this is a solution, I guess.
std::string bsoncxx_string_viewToString(core::v1::string_view gotStringView) {
std::stringstream convertingStream;
convertingStream << gotStringView;
return std::move(convertingStream.str());
}
std::string b_utf8ToString(bsoncxx::types::b_utf8 gotB_utf8) {
return std::move(bsoncxx_string_viewToString(core::v1::string_view(gotB_utf8)));
}
template <typename T>
bsoncxx::document::value editBsoncxx(bsoncxx::document::view documentToEdit, std::string keyToEdit, T newValue, bool appendValueIfKeyNotExist = true) {
auto doc = bsoncxx::builder::stream::document{};
std::string currentKey;
for (auto i : documentToEdit) {
currentKey = bsoncxx_string_viewToString(i.key());
if (currentKey == keyToEdit) {
doc << keyToEdit << newValue;
appendValueIfKeyNotExist = false;
} else {
doc << currentKey << i.get_value();
}
}
if (appendValueIfKeyNotExist) // Maybe this would be better with documentToEdit.find(key), but I don't know how to check if iterator is past-the-end
//If there is a way to check if bsoncxx contains key, we can achieve ~o(log(n)) [depending on 'find key' implementation] which is better than o(n)
doc << keyToEdit << newValue;
return doc.extract();
}
Usage:
auto doc = document{} << "foo0" << "bar0" << "foo1" << 1 << "foo2" << 314 << finalize;
std::cout << bsoncxx::to_json(doc) << std::endl << std::endl;
doc = editBsoncxx<std::string> (doc.view(), "foo1", "edited"); //replace "foo1" with string "edited"
doc = editBsoncxx<int>(doc.view(), "baz_noappend", 123, false); //do nothing if key "baz_noappend" is not found. <- if key-existance algorythm will be applied, we'd spend about o(lob(n)) here, not o(n)
doc = editBsoncxx<int>(doc.view(), "baz_append", 123, true); //key will not be found => it'll be appended which is default behaviour
std::cout << bsoncxx::to_json(doc) << std::endl;
Result:
{
"foo0" : "bar0",
"foo1" : 1,
"foo2" : 314
}
{
"foo0" : "bar0",
"foo1" : "edited",
"foo2" : 314,
"baz_append" : 123
}
So, in your case you can use
fruit = editBsoncxx<std::string>(fruit.view(), "orange", "ripe");
But, again, see already-mentioned related question you're right when saying that
the idea behind the builder is "just" to create a query to give to the database?
I think, the solution will be "do not edit documents".
also you can write something like type-converter from bsoncxx to other json storing fomat (for example, rapidjson)
Beware of {value:"valid_json"}: bsoncxx::to_json does not add backslashes to quote signs in values => injection can be made.

Find Key in XML with Boost

I am using boost for the first time within an old code base that we have
iptree children = pt.get_child(fieldName);
for (const auto& kv : children) {
boost::property_tree::iptree subtree = (boost::property_tree::iptree) kv.second ;
//Recursive call
}
My problem is sometimes the fieldName doesn`t exist in the XML file and I have an exception
I tried :
boost::property_tree::iptree::assoc_iterator it = pt.find(fieldName);
but I dont know how to use the it I can`t use: if (it != null)
Any help please will be appreciated
I am using VS 2012
If it`s very complicated is there any other way to read a XML with nested nodes? I am working on that since 3 days
This is an Example of the XML
<?xml version="1.0" encoding="utf-8"?>
<nodeA xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<nodeA.1>This is the Adresse</nodeA.1>
<nodeA.2>
<node1>
<node1.1>
<node1.1.1>Female</node1.1.1>
<node1.1.2>23</node1.1.2>
<node1.1.3>Engineer</node1.1.3>
</node1.1>
<node1.2>
<node1.2.1>Female</node1.2.1>
<node1.2.2>35</node1.2.2>
<node1.2.3>Doctors</node1.2.3>
</node1.2>
</node1>
</nodeA.2>
<nodeA.3>Car 1</nodeA.3>
</nodeA>
Use pt.get_child_optional(...) to prevent an exception. pt.find(...) returns an iterator which compares true to pt.not_found() on failure.
EDIT: How to use boost::optional<--->
boost::optional< iptree & > chl = pt.get_child_optional(fieldname);
if(chl) {
for( auto a : *chl )
std::cerr << ":" << a.first << ":" << std::endl;
}

Why is tinyxml's FirstAttribute() return null in simple function?

Running on: Linux Mint 16, QtCreator 3.0.0 with Qt 5.2.0. tinyXML last version.
XML working with:
<users>
<user>
<username>testadmin</username>
<password>testpwd</password>
<privileges>2</privileges>
</user>
<user>
<username>testuser</username>
<password>testpwd</password>
<privileges>1</privileges>
</user>
Code that's not working:
void Server::loadUsersFile()
{
TiXmlDocument usersDoc("users.xml");
bool loadOk = usersDoc.LoadFile();
if(!loadOk) {
cout << "Error opening users file" << endl;
return;
}
TiXmlElement* rootChild = usersDoc.FirstChildElement("users");
TiXmlElement* userChild = rootChild->FirstChildElement("user");
TiXmlAttribute* pAttrib=userChild->FirstAttribute();
cout << pAttrib->Name();
}
pAttrib is NULL and I can't understand why. Maybe I didn't get the Child/Attribute relationship. Help appreciated.
Your first user element doesn't have any attributes. Attributes are data defined in the start tag of an element, so for the user element to have an attribute it would have to look something like
<user type="posix">
...
</user>
where type="posix" is an attribute of the user element.
To get the username element in an element, perhaps you want
TiXmlElement* username = userChild->FirstChildElement();
or more safely
TiXmlElement* username = userChild->FirstChildElement("username");
Your XML doesn't have any attributes. It has text within elements, but no attributes.
If your XML had something like:
<user name="foo" />
then it would give you an attribute. And you could have multiple attributes of course:
<user username="testadmin" password="testpassword" privileges="2" />
You could either change your code, or change your XML to use attributes. If you do, I'd recommend that you ask for specific attributes rather than requiring that the username is the first attribute, for example.
If you're going to work with XML, it's important to know the terminology, otherwise you'll get very confused. It might be worth looking through an XML tutorial before going much further.
I decided to stick to the "text inside elments" version because it seems easier to read. Thanks for the help! So, the code is:
void Server::loadUsersFile()
{
cout << "Loading users list..."<<endl;
TiXmlDocument usersDoc(this->fileUsersStr);
bool loadOk = usersDoc.LoadFile();
if(!loadOk) {
cout << "Error opening users file" << endl;
return;
}
TiXmlElement* rootChild = usersDoc.FirstChildElement("users");
TiXmlElement* userChild = rootChild->FirstChildElement("user");
TiXmlElement *usernameElem;
TiXmlElement *passwordElem;
TiXmlElement *privilegesElem;
while(userChild)
{
usernameElem = userChild->FirstChildElement("username");
passwordElem = userChild->FirstChildElement("password");
privilegesElem = userChild->FirstChildElement("privileges");
cout << "username: " << usernameElem->GetText() << endl;
cout << "password: " << passwordElem->GetText() << endl;
cout << "privilege:" << atoi(privilegesElem->GetText()) << endl;
userChild = userChild->NextSiblingElement();
}
cout << "Finished loading."<<endl;
}

XslCompiledTransform.Transform replaces """ with real quotes

Doing something like this:
using (XmlWriter myMamlHelpWriter = XmlWriter.Create(myFileStream, XmlHelpExToMamlXslTransform.OutputSettings))
{
XmlHelpExToMamlXslTransform.Transform(myMsHelpExTopicFilePath, null, myMamlHelpWriter);
}
where
private static XslCompiledTransform XmlHelpExToMamlXslTransform
{
get
{
if (fMsHelpExToMamlXslTransform == null)
{
// Create the XslCompiledTransform and load the stylesheet.
fMsHelpExToMamlXslTransform = new XslCompiledTransform();
using (Stream myStream = typeof(XmlHelpBuilder).Assembly.GetManifestResourceStream(
typeof(XmlHelpBuilder),
MamlXmlTopicConsts.cMsHelpExToMamlTransformationResourceName))
{
XmlTextReader myReader = new XmlTextReader(myStream);
fMsHelpExToMamlXslTransform.Load(myReader, null, null);
}
}
return fMsHelpExToMamlXslTransform;
}
}
And every time the string """ is replaced with real quotes in the result file.
Cannot understand why this happens...
The reason is that in the XSLT's internal representation, " is exactly the same characer as ". They both represent the ascii code point 0x34. It would seem that when the XslCompiledTransform produces its output, it uses " where it's legal to do so. I would imagine that it would still output " inside an attribute value.
Is it a problem for you that " is produced as " in the output?
I just ran the following XSLT in Visual Studio using an arbitrary input file:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/*">
<xml>
<xsl:variable name="chars">"&apos;<>&</xsl:variable>
<node a='{$chars}' b="{$chars}">
<xsl:value-of select="$chars"/>
</node>
</xml>
</xsl:template>
</xsl:stylesheet>
The output was:
<xml>
<node a=""'<>&" b=""'<>&">"'<>&</node>
</xml>
As you can see, even though all five characters were represented as entities originally, the apostrophies are produced as ' everywhere, and quotation marks are produced as " in text nodes. Furthermore, the a attribute which had ' delimiters uses " delimiters in the output. As I said, as far as the XSLT cares, a quotation mark is just a quotation mark, and an attribute is just an attribute. How those are produced in the output is up to the XSLT processor.
Edit: The root cause of this behavior appears to be the behavior of the XmlWriter class. It looks like the general suggestion for those who want more customized escaping is to extend the XmlTextWriter class. This page has an implementation that looks fairly promising:
public class KeepEntityXmlTextWriter : XmlTextWriter
{
private static readonly string[] ENTITY_SUBS = new string[] { "&apos;", """ };
private static readonly char[] REPLACE_CHARS = new char[] { '\'', '"' };
public KeepEntityXmlTextWriter(string filename) : base(filename, null) { ; }
private void WriteStringWithReplace(string text)
{
string[] textSegments = text.Split(KeepEntityXmlTextWriter.REPLACE_CHARS);
if (textSegments.Length > 1)
{
for (int pos = -1, i = 0; i < textSegments.Length; ++i)
{
base.WriteString(textSegments[i]);
pos += textSegments[i].Length + 1;
// Assertion: Replace the following if-else when the number of
// replacement characters and substitute entities has grown
// greater than 2.
Debug.Assert(2 == KeepEntityXmlTextWriter.REPLACE_CHARS.Length);
if (pos != text.Length)
{
if (text[pos] == KeepEntityXmlTextWriter.REPLACE_CHARS[0])
base.WriteRaw(KeepEntityXmlTextWriter.ENTITY_SUBS[0]);
else
base.WriteRaw(KeepEntityXmlTextWriter.ENTITY_SUBS[1]);
}
}
}
else base.WriteString(text);
}
public override void WriteString( string text)
{
this.WriteStringWithReplace(text);
}
}
On the other hand, the MSDN documentation recommends using XmlWriter.Create() rather than instantiating XmlTextWriters directly.
In the .NET Framework 2.0 release, the recommended practice is to create XmlWriter instances using the XmlWriter.Create method and the XmlWriterSettings class. This allows you to take full advantage of all the new features introduced in this release. For more information, see Creating XML Writers.
One way around that would be to use the same logic as above, but put it in a class that wraps an XmlWriter. This page has a ready-made implementation of an XmlWrappingWriter, that you can modify as needed.
To use the above code with the XmlWrappingWriter, you would subclass the wrapping writer, like this:
public class KeepEntityWrapper : XmlWrappingWriter
{
public KeepEntityWrapper(XmlWriter baseWriter)
: base(baseWriter)
{
}
private static readonly string[] ENTITY_SUBS = new string[] { "&apos;", """ };
private static readonly char[] REPLACE_CHARS = new char[] { '\'', '"' };
private void WriteStringWithReplace(string text)
{
string[] textSegments = text.Split(REPLACE_CHARS);
if (textSegments.Length > 1)
{
for (int pos = -1, i = 0; i < textSegments.Length; ++i)
{
base.WriteString(textSegments[i]);
pos += textSegments[i].Length + 1;
// Assertion: Replace the following if-else when the number of
// replacement characters and substitute entities has grown
// greater than 2.
Debug.Assert(2 == REPLACE_CHARS.Length);
if (pos != text.Length)
{
if (text[pos] == REPLACE_CHARS[0])
base.WriteRaw(ENTITY_SUBS[0]);
else
base.WriteRaw(ENTITY_SUBS[1]);
}
}
}
else base.WriteString(text);
}
public override void WriteString(string text)
{
this.WriteStringWithReplace(text);
}
}
Note this essentially the same code as the KeepEntityXmlTextWriter, but using XmlWrappingWriter as the base class and with a different constructor.
I don't recognize the Guard that the XmlWrappingWriter code is using in two places, but given that you'll be consuming the code yourself, it should be pretty safe to delete the lines like this. They just ensure that a null value isn't passed to the constructor or the (in the above case inaccessible) BaseWriter property:
Guard.ArgumentNotNull(baseWriter, "baseWriter");
To create an instance of the XmlWrappingWriter, you would create an XmlWriter however you need to, and then use:
KeepEntityWrapper wrap = new KeepEntityWrapper(writer);
And then you'd use this wrap variable as the XmlWriter you pass to your XSL transform.
The XSLT processor doesn't know whether a character was represented by a character entity or not. This is because the XML parser substitutes any character entity with its code-value.
Therefore, the XSLT processor would see exactly the same character, regardless whether it was represented as " or as " or as " or as ".
What you want can be achieved in XSLT 2.0 by using the so called "character maps".
Here is the trick you wanted:
replace all & with &
perform XSLT
replace all & with &

Parsin XML file using pugixml

Hi
I want to use XML file as a config file, from which I will read parameters for my application. I came across on PugiXML library, however I have problem with getting values of attributes.
My XML file looks like that
<?xml version="1.0"?>
<settings>
<deltaDistance> </deltaDistance>
<deltaConvergence>0.25 </deltaConvergence>
<deltaMerging>1.0 </deltaMerging>
<m> 2</m>
<multiplicativeFactor>0.7 </multiplicativeFactor>
<rhoGood> 0.7 </rhoGood>
<rhoMin>0.3 </rhoMin>
<rhoSelect>0.6 </rhoSelect>
<stuckProbability>0.2 </stuckProbability>
<zoneOfInfluenceMin>2.25 </zoneOfInfluenceMin>
</settings>
To pare XML file I use this code
void ReadConfig(char* file)
{
pugi::xml_document doc;
if (!doc.load_file(file)) return false;
pugi::xml_node tools = doc.child("settings");
//[code_traverse_iter
for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
cout<<it->name() << " " << it->attribute(it->name()).as_double();
}
}
and I also was trying to use this
void ReadConfig(char* file)
{
pugi::xml_document doc;
if (!doc.load_file(file)) return false;
pugi::xml_node tools = doc.child("settings");
//[code_traverse_iter
for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
cout<<it->name() << " " << it->value();
}
}
Attributes are loaded corectly , however all values are equals 0. Could somebody tell me what I do wrong ?
I think your problem is that you're expecting the value to be stored in the node itself, but it's really in a CHILD text node. A quick scan of the documentation showed that you might need
it->child_value()
instead of
it->value()
Are you trying to get all the attributes for a given node or do you want to get the attributes by name?
For the first case, you should be able to use this code:
unsigned int numAttributes = node.attributes();
for (unsigned int nAttribute = 0; nAttribute < numAtributes; ++nAttribute)
{
pug::xml_attribute attrib = node.attribute(nAttribute);
if (!attrib.empty())
{
// process here
}
}
For the second case:
LPCTSTR GetAttribute(pug::xml_node & node, LPCTSTR szAttribName)
{
if (szAttribName == NULL)
return NULL;
pug::xml_attribute attrib = node.attribute(szAttribName);
if (attrib.empty())
return NULL; // or empty string
return attrib.value();
}
If you want stock plain text data into the nodes like
<name> My Name</name>
You need to make it like
rootNode.append_child("name").append_child(node_pcdata).set_value("My name");
If you want to store datatypes, you need to set an attribute. I think what you want is to be able to read the value directly right?
When you are writing the node,
rootNode.append_child("version").append_attribute("value").set_value(0.11)
When you want to read it,
rootNode.child("version").attribute("version").as_double()
At least that's my way of doing it!