rapidXML, corrupted memory when traversing DOM tree - c++

I don't understand what is going on with attribute memory in rapidXML.
A function encapsulates the XML parsing and, on success, returns a pointer to the root node. When I call TraverseDOMTree inside this function, it prints the correct data from the XML file.
typedef rapidxml::xml_node<>* Node;
...
Node Load()
{
    Node pRootNode = NULL;
    // read file stream in bytes
    ...
    std::vector<char> xmlCopy(bytes.begin(), bytes.end());
    xmlCopy.push_back('\0');
    rapidxml::xml_document<> doc;
    try
    {
        doc.parse<rapidxml::parse_declaration_node | rapidxml::parse_no_data_nodes>(&bytes[0]);
        pRootNode = doc.first_node();
        ...
        TraverseDOMTree(pRootNode);
    }
    return pRootNode;
}
TraverseDOMTree prints all attributes and node names as expected.
Later, outside the scope of Load, pRootNode is used to query values from the DOM tree, and this doesn't work.
For testing purposes I call TraverseDOMTree again; it worked perfectly inside Load, but now it prints garbage for the attribute values. The DOM tree seems to still be there, with the same hierarchy of nodes as in the first call, but the attribute values are messed up.
I tried making the rapidxml::xml_document<> doc global and also adding the parse_non_destructive flag; neither makes a difference.
If it matters, the client calling Load runs in the same thread. What could be wrong?

std::vector<char> xmlCopy(bytes.begin(), bytes.end());
xmlCopy, the copy of the serialized XML document, is local to Load and is destroyed when the function returns. I would bet that rapidXML makes no copy of the attribute values, but rather stores pointers into that character sequence, which dangle once the buffer is gone. You could check that by comparing the addresses of the attribute values with the address range of your document copy.
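To illustrate the lifetime problem without depending on rapidXML, here is a minimal stdlib-only sketch. ParseInSitu, AttributeView and LoadedConfig are hypothetical names made up for this example, and the "key=value;" format is only a stand-in for XML. The point is that an in-situ parser hands out pointers into the caller's buffer, so the buffer has to live as long as anything that refers to it. One way to get that is to keep the buffer and the parsed views together in a single owning object and return that object instead of a bare pointer.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an in-situ parser result: no text is copied,
// both pointers refer to memory inside the caller's buffer.
struct AttributeView {
    const char* name;
    const char* value;
};

// Parses "key=value;key=value;" in place. Each '=' and ';' is overwritten
// with '\0' so every name and value becomes a C string inside the buffer.
std::vector<AttributeView> ParseInSitu(char* text) {
    std::vector<AttributeView> result;
    while (*text != '\0') {
        char* eq = std::strchr(text, '=');
        if (eq == nullptr) break;
        char* end = std::strchr(eq + 1, ';');
        if (end == nullptr) break;
        *eq = '\0';
        *end = '\0';
        result.push_back({text, eq + 1});
        text = end + 1;
    }
    return result;
}

// Owning wrapper: the buffer and the views into it share one lifetime.
class LoadedConfig {
public:
    explicit LoadedConfig(const std::string& source)
        : buffer_(source.begin(), source.end()) {
        buffer_.push_back('\0');
        attributes_ = ParseInSitu(buffer_.data());
    }
    // A copy would keep pointing into the original object's buffer.
    LoadedConfig(const LoadedConfig&) = delete;
    LoadedConfig& operator=(const LoadedConfig&) = delete;
    // Moving a std::vector transfers its heap storage, so the views stay valid.
    LoadedConfig(LoadedConfig&&) = default;
    LoadedConfig& operator=(LoadedConfig&&) = default;

    const std::vector<AttributeView>& attributes() const { return attributes_; }

private:
    std::vector<char> buffer_;               // must outlive attributes_
    std::vector<AttributeView> attributes_;  // pointers into buffer_
};

// Returns the owner, not a pointer into a local that is about to die.
LoadedConfig Load(const std::string& source) { return LoadedConfig(source); }
```

With rapidXML the same idea would mean keeping both the character buffer and the xml_document alive (for example as members of one object) for as long as any node pointer is in use, since the nodes themselves are allocated from the document's memory pool.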

Related

Can I set the child of an object to be an object defined separately in PugiXML?

I want to configure a device using an XML file and was thinking that I can make the individual pugi::xml_nodes first with the values I need and later on make them children of a document or some parent node. However, I seem to be doing something wrong.
Example that works:
#include "pugixml.hpp"
int main(){
    pugi::xml_document xml;
    pugi::xml_node configRecord = xml.append_child("configrecord");
    pugi::xml_node configGroup = configRecord.append_child("configgroup");
    configGroup.append_attribute("name") = "ftp server";
}
This works because I first create the parent document and then branch out by adding children. I was thinking that I could first create the node objects, store them in an array, and then iterate over that array to add them to the document. But this doesn't work.
#include "pugixml.hpp"
int main(){
    pugi::xml_node myNode;
    myNode.set_name("value");
    myNode.append_child(pugi::node_pcdata).set_value("enable");
    pugi::xml_document docu;
    docu.set_name("document");
    docu.child(myNode); // <- error here, cannot add child to document
}
Can I somehow use the strategy I was planning to or am I constrained to only adding children to an existing parent?
pugixml documentation states that pugi::xml_node is a non-owning pointer to actual node data stored in a pugi::xml_document object:
xml_node is the handle to document node; it can point to any node in
the document, including the document node itself. There is a common
interface for nodes of all types; the actual node type can be queried
via the xml_node::type() method. Note that xml_node is only a handle
to the actual node, not the node itself
Nodes and attributes do not exist without a document tree, so you
can’t create them without adding them to some document.
It seems to me that your code doesn't throw errors when you try to manipulate myNode because default-constructed "null" nodes silently consume operations on them to make chaining easier:
all operations are defined on empty nodes; generally the operations
don’t do anything and return empty nodes/attributes or empty strings
as their result [...] This is useful for chaining calls
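The handle behaviour described above can be mimicked with a small stdlib-only sketch. This is not pugixml code: NodeData, NodeHandle and Document are hypothetical names, and the sketch only assumes the two properties quoted from the documentation, namely that a handle owns nothing and that operations on an empty handle quietly do nothing.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// The real node storage. In this sketch it is always owned by a Document.
struct NodeData {
    std::string name;
    std::vector<std::unique_ptr<NodeData>> children;
};

// Non-owning handle. A default-constructed handle is "empty".
class NodeHandle {
public:
    NodeHandle() = default;
    explicit NodeHandle(NodeData* data) : data_(data) {}

    bool empty() const { return data_ == nullptr; }

    // On an empty handle this changes nothing and reports failure.
    bool set_name(const std::string& name) {
        if (data_ == nullptr) return false;
        data_->name = name;
        return true;
    }

    // On an empty handle this creates nothing and returns another empty handle.
    NodeHandle append_child(const std::string& name) {
        if (data_ == nullptr) return NodeHandle();
        auto child = std::make_unique<NodeData>();
        child->name = name;
        NodeData* raw = child.get();
        data_->children.push_back(std::move(child));
        return NodeHandle(raw);
    }

    std::string name() const { return data_ ? data_->name : std::string(); }

private:
    NodeData* data_ = nullptr;
};

// The only owner of node storage.
class Document {
public:
    NodeHandle root() { return NodeHandle(&root_); }

private:
    NodeData root_;
};
```

In this model the second program from the question never creates a node at all: every call on the default-constructed handle is a silent no-op. If building pieces separately is still the goal, one option in pugixml is to build them inside a scratch pugi::xml_document and copy them into the target with xml_node::append_copy.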

Possible to get QDomElement from QXmlStreamWriter?

I'm writing a small XMPP server using qxmpp. Now I want to create a QXmppStanza and present it (as if a client had sent it) to the server and my plugins using
void QXmppServer::handleElement(const QDomElement &element)
This function requires a QDomElement and not a QXmppStanza. The only XML-related function I found in QXmppStanza and its derived classes (besides parse(...)) is the function
void toXml(QXmlStreamWriter *writer)
I don't have experience with XML handling in Qt yet, so is there a more performant way than writing the XML to a string/QByteArray, using that as input to create a new QDomDocument, and returning its documentElement?
After doing some further research I have to accept it is not possible.
As stated in QDomDocument's documentation I always require a QDomDocument in order to work with a QDomElement (and other nodes):
Since elements, text nodes, comments, processing instructions, etc., cannot exist outside the context of a document (...)
The QXmlStreamWriter doesn't have a QDomDocument, so I really have to create a QDomDocument (which of course must live as long as I want to work with the element) and then parse the text (QDomDocument::setContent).
I had a similar issue and was able to convert from a stream to a DOM element by doing something similar to what is shown below.
The first step is to stream to a byte array.
QByteArray data;
QXmlStreamWriter writer(&data);
object->toXml(&writer);
The second step is to set the content of a DOM document. The document's document element should be the DOM element you need.
QDomDocument temp;
if (temp.setContent(data)) {
    QDomElement element = temp.documentElement();
    // do whatever you want with this element
}

Xerces, xpaths, and XML namespaces

I'm trying to use xerces-c in order to parse a rather massive XML document generated from StarUML in order to change some things, but I'm running into issues getting the xpath query to work because it keeps crashing.
To simplify things I split out part of the file into a smaller XML file for testing, which looks like this:
<?xml version="1.0" encoding="utf-8"?>
<XPD:UNIT xmlns:XPD="http://www.staruml.com" version="1">
<XPD:HEADER>
<XPD:SUBUNITS>
</XPD:SUBUNITS>
</XPD:HEADER>
<XPD:BODY>
<XPD:OBJ name="Attributes[3]" type="UMLAttribute" guid="onMjrHQ0rUaSkyFAWtLzKwAA">
<XPD:ATTR name="StereotypeName" type="string">ConditionInteraction</XPD:ATTR>
</XPD:OBJ>
</XPD:BODY>
</XPD:UNIT>
All I'm trying to do for this example is to find all of the XPD:OBJ elements, of which there is only one. The problem seems to stem from trying to query with the namespace. When I pass a very simple xpath query of XPD:OBJ it will crash, but if I pass just OBJ it won't crash but it won't find the XPD:OBJ element.
I assume there's some important property or setting that I'm missing during initialization that I need to set but I have no idea what it might be. I looked up all of the properties of the parser having to do with namespace and enabled the ones I could but it didn't help at all so I'm completely stuck. The initialization code looks something like this, with lots of things removed obviously:
const tXercesXMLCh tXMLManager::kDOMImplementationFeatures[] =
{
    static_cast<tXercesXMLCh>('L'),
    static_cast<tXercesXMLCh>('S'),
    static_cast<tXercesXMLCh>('\0')
};
// Instantiate the DOM parser.
fImplementation = static_cast<tXercesDOMImplementationLS *>(tXercesDOMImplementationRegistry::getDOMImplementation(kDOMImplementationFeatures));
if (fImplementation != nullptr)
{
    fParser = fImplementation->createLSParser(tXercesDOMImplementationLS::MODE_SYNCHRONOUS, nullptr);
    fConfig = fParser->getDomConfig();
    // Let the validation process do its datatype normalization that is defined in the used schema language.
    //fConfig->setParameter(tXercesXMLUni::fgDOMDatatypeNormalization, true);
    // Ignore comments and whitespace so we don't get extra nodes to process that just waste time.
    fConfig->setParameter(tXercesXMLUni::fgDOMComments, false);
    fConfig->setParameter(tXercesXMLUni::fgDOMElementContentWhitespace, false);
    // Setup some properties that look like they might be required to get namespaces to work but doesn't seem to help at all.
    fConfig->setParameter(tXercesXMLUni::fgXercesUseCachedGrammarInParse, true);
    fConfig->setParameter(tXercesXMLUni::fgDOMNamespaces, true);
    fConfig->setParameter(tXercesXMLUni::fgDOMNamespaceDeclarations, true);
    // Install our custom error handler.
    fConfig->setParameter(tXercesXMLUni::fgDOMErrorHandler, &fErrorHandler);
}
Then later on I parse the document, find the root node, and then run the xpath query to find the node I want. I'll leave out the bulk of that and just show you where I'm running the xpath query in case there's something obviously wrong there:
tXercesDOMDocument * doc; // Comes from parsing the file.
tXercesDOMNode * contextNode; // This is the root node retrieved from the document.
tXercesDOMXPathResult * xPathResult;
doc->evaluate("XPD:OBJ", contextNode, nullptr, tXercesDOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE, xPathResult);
The call to evaluate() is where it crashes somewhere deep inside xerces that I can't see very clearly, but from what I can see there are a lot of things that look deleted or uninitialized so I'm not sure what's causing the crash exactly.
So is there anything here that looks obviously wrong or missing that is required to make xerces work with XML namespaces?
The solution was right in front of my face the whole time. The problem was that you need to create and pass a resolver to the evaluate() call or else it will not be able to figure out any of the namespaces and will throw an exception. The crash seems to be a bug in xerces since it's crashing on trying to throw the exception when it can't resolve the namespace. I had to debug deep into the xerces code to find it, which gave me the solution.
So to fix the problem I changed the call to evaluate() slightly to create a resolver with the root node and now it works perfectly:
tXercesDOMDocument * doc; // Comes from parsing the file.
tXercesDOMNode * contextNode; // This is the root node retrieved from the document.
tXercesDOMXPathResult * xPathResult;
// Create the resolver with the root node, which contains the namespace definition.
tXercesDOMXPathNSResolver * resolver(doc->createNSResolver(contextNode));
doc->evaluate("XPD:OBJ", contextNode, resolver, tXercesDOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE, xPathResult);
// Make sure to release the resolver since anything created from a `create___()`
// function has to be manually released.
resolver->release();
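For readers wondering what a namespace resolver actually contributes, here is a rough stdlib-only sketch of the lookup it performs. It is not Xerces code: NamespaceMap and ResolveQName are hypothetical names, and the map is simply assumed to hold the xmlns:prefix declarations found on the root element. A prefixed name such as XPD:OBJ can only be matched once the prefix has been mapped to a namespace URI, and without any mapping the lookup has nothing to return, which is the situation that ended in the exception described above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Prefix -> namespace URI, as declared by xmlns:prefix="uri" attributes.
using NamespaceMap = std::map<std::string, std::string>;

// Splits a qualified name into (namespace URI, local name).
// A name without a prefix gets an empty URI in this sketch.
// An unbound prefix is reported by throwing std::invalid_argument.
std::pair<std::string, std::string> ResolveQName(const std::string& qname,
                                                 const NamespaceMap& declared) {
    const std::string::size_type colon = qname.find(':');
    if (colon == std::string::npos) {
        return {std::string(), qname};
    }
    const std::string prefix = qname.substr(0, colon);
    const auto it = declared.find(prefix);
    if (it == declared.end()) {
        throw std::invalid_argument("unbound namespace prefix: " + prefix);
    }
    return {it->second, qname.substr(colon + 1)};
}
```

With the sample file, the map would contain the single entry XPD -> http://www.staruml.com, so XPD:OBJ resolves to the local name OBJ in that namespace. The plain query OBJ resolves to no namespace at all, which fits the observation that it did not crash but also did not match the XPD:OBJ element.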

Replace root node for Document

I found a memory leak in my application, which uses libxml++, caused by an XML document whose root node I replace. I took care to remove all child nodes, but the xmlpp::Document interface offers no way that I can find to replace the root node.
This is a sample of the offending code:
xmlpp::Document Doc;
Doc.create_root_node("root");
// Populate the document
// [...]
void ReplaceRootNode(const xmlpp::Element* NewRootNode)
{
    // Remove all root node children
    xmlpp::Element* RootNode = Doc.get_root_node();
    const xmlpp::Node::NodeList Children = RootNode->get_children();
    xmlpp::Node::NodeList::const_iterator itChild = Children.begin();
    while (itChild != Children.end()) {
        RootNode->remove_child(*itChild++);
    }
    // Replace root node
    Doc.create_root_node_by_import(NewRootNode); // Leak: memory for previous root node is not freed
}
The solution I have come up with so far is to edit the document's root node to change its name and attributes. Is there a simpler way to avoid this leak that does not involve editing the previous root node's name and attributes?
I work around this by setting the document to an empty Document object (Doc = xmlpp::Document()) before calling create_root_node_by_import instead of removing the root's child nodes explicitly. This appears to cause the previous contents of Doc to be freed.
I first encountered this problem several years ago, and it still does not appear to be fixed in recent versions of libxml++. Surely they must be aware of it. Could this case somehow be using create_root_node_by_import in an unintended fashion? I would not have thought so, but OTOH this seems too important not to fix.
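As a point of comparison, this is roughly the ownership behaviour one would hope for when a root node is replaced. It is a stdlib-only sketch and says nothing about how libxml++ is implemented: TreeNode, TreeDocument and the live counter are hypothetical and exist only to make the freeing observable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A tree node that counts how many instances are currently alive.
struct TreeNode {
    static int live;
    std::string name;
    std::vector<std::unique_ptr<TreeNode>> children;

    explicit TreeNode(std::string n) : name(std::move(n)) { ++live; }
    ~TreeNode() { --live; }
    TreeNode(const TreeNode&) = delete;
    TreeNode& operator=(const TreeNode&) = delete;
};
int TreeNode::live = 0;

// The document owns its root through a unique_ptr.
class TreeDocument {
public:
    // Assigning a new root destroys the previous root and its whole subtree.
    TreeNode* create_root(const std::string& name) {
        root_ = std::make_unique<TreeNode>(name);
        return root_.get();
    }

private:
    std::unique_ptr<TreeNode> root_;
};
```

Here replacing the root needs no manual removal of children, because the old subtree is released as a unit. Assigning a fresh Document, as in the workaround above, gets a similar effect by discarding the old document's contents wholesale.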

Passing from a DOMNode* to a DOMElement* in Xerces-C

I have a C++ application that manipulates XML.
At a certain point in my application I get a DOMNode* and then attach it to an element as a child.
The problem is that I would like to add attributes to that node, but it is a node and not an element, and only elements have attributes.
This is my code:
xercesc::DOMNode* node = 0;
std::string xml = from_an_obj_of_mine.GetXml(); /* A string with xml inside, the xml is sure an element having something inside */
xercesc::MemBufInputSource xml_buf((const XMLByte*)xml.c_str(), xml.size(), "dummy");
xercesc::XercesDOMParser* parser = new xercesc::XercesDOMParser();
parser->parse(xml_buf); /* parser will contain a DOMDocument well parsed from the string, I get here the node i want to attach */
node = my_pointer_to_a_preexisting_domdocument->GetXmlDocument()->importNode(parser->getDocument()->getDocumentElement(), true); /* I want to attach the node in parser to a node of my_pointer_to_an_el_of_my_preexisting_domdocument, it is a different tree, so I must import the node to attach it later */
my_pointer_to_an_el_of_my_preexisting_domdocument->appendChild(node);
As you can see I want to create a node from a string, I create it through a parse and then need to import the node to create a new identical node belonging to the dom tree where I want to attach the new node.
My steps are:
Get the xml string to attach to a pre-existing dom (stored as a domdocument somewhere)
Create a parser
Using the parser create a dom tree from the string
From my pre-existing dom (where I want to attach my new node), call the import and clone the node so that it can be attached to the pre-existing dom.
Attach it
The problem is that importNode returns a node, and I want an element to attach.
I use appendChild to append elements too. Of course the method wants a DOMNode*, but giving it a DOMElement* (which inherits from DOMNode) is OK.
How can I get an element from a node?
delete wd_parser;
OK, I figured it out.
Just cast the node to an element and it is done. DOMNode is an abstract class and the parent of DOMElement, so the downcast is valid and it is also the logical way to do things.
DOMElement* element = dynamic_cast<DOMElement*>(node);
:)
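One caveat worth spelling out: dynamic_cast returns a null pointer when the node is not actually an element (a text node or a comment, for instance), so the result should be checked before it is used. The sketch below shows the pattern with stand-in classes rather than the Xerces ones. BaseNode, ElementNode, TextNode and TrySetAttribute are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Stand-ins for a DOM class hierarchy. The virtual destructor makes
// BaseNode polymorphic, which dynamic_cast requires.
struct BaseNode {
    virtual ~BaseNode() = default;
};

struct ElementNode : BaseNode {
    std::string attribute;
    void setAttribute(const std::string& value) { attribute = value; }
};

struct TextNode : BaseNode {};

// Sets the attribute only if the node really is an element.
// Returns false for other node kinds and for a null pointer.
bool TrySetAttribute(BaseNode* node, const std::string& value) {
    ElementNode* element = dynamic_cast<ElementNode*>(node);
    if (element == nullptr) {
        return false;
    }
    element->setAttribute(value);
    return true;
}
```

In the Xerces code above the cast should always succeed, because the imported node is a copy of getDocumentElement(), but the null check costs little and guards against other node types turning up later.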