Attempting to use libxml2 to validate an XML file with an XSD file - C++

I am creating a tool that reads an XML file, generated by some other program, containing its input data. Before I can use this input data, I need to validate the XML file to make sure all the data is there. I am trying to create an XSD file for the validation (using the libxml2 library).
The C++ code that attempts the validation is quite rudimentary for now:
XmlValidator inputXmlValidator{};
res = inputXmlValidator.ConfigureValidationSchema("../temp_test_inputs/xml_validation/input.xsd");
if (!res)
return -1;
res = xmlValidator.ValidateXml("../temp_test_inputs/xml_validation/input.xml");
if (!res)
return -1;
bool XmlValidator::ConfigureValidationSchema(const char* xsdFilename)
{
if (schemaValidationContext != nullptr)
{
F1_ERROR("This instance already has a validation scheme configured!");
return false; // there is already a schema present...
}
xmlSchemaParserCtxtPtr schemaParserContext = nullptr;
schemaParserContext = xmlSchemaNewParserCtxt(xsdFilename);
if (schemaParserContext)
{
xmlSchemaPtr parsedSchema = nullptr;
xmlSchemaSetParserStructuredErrors(schemaParserContext, ProcessParsingError, nullptr);
parsedSchema = xmlSchemaParse(schemaParserContext);
xmlSchemaFreeParserCtxt(schemaParserContext);
if (parsedSchema)
schemaValidationContext = xmlSchemaNewValidCtxt(parsedSchema);
}
return schemaValidationContext != nullptr;
}
bool XmlValidator::ValidateXml(const char* xmlFilename)
{
if (schemaValidationContext == nullptr)
{
F1_ERROR("No validation scheme configured! Failed to validate '{0}'.", xmlFilename);
return false; // there is no validation schema present...
}
// read the xml file
xmlTextReaderPtr xmlTextReader = xmlReaderForFile(xmlFilename, NULL, 0);
if (xmlTextReader == nullptr)
{
F1_ERROR("Failed to open '{0}'.", xmlFilename);
return false; // failed to read xml file...
}
// configure schema validation
int hasSchemeErrors = 0;
xmlTextReaderSchemaValidateCtxt(xmlTextReader, schemaValidationContext, 0);
xmlSchemaSetValidStructuredErrors(schemaValidationContext, ProcessValidatorError, &hasSchemeErrors);
// process the xml file
int hasValidationErrors = 0;
do
{
hasValidationErrors = xmlTextReaderRead(xmlTextReader);
} while (hasValidationErrors == 1 && !hasSchemeErrors);
// process errors
//if (hasValidationErrors != 0)
//{
// xmlErrorPtr err = xmlGetLastError();
// F1_ERROR("Failed to parse '{0}' at line {1}, col {2}! Error {3}: {4}", err->file, err->line, err->int2, err->code, err->message);
//}
// free up the text reader memory
xmlFreeTextReader(xmlTextReader);
return hasValidationErrors == 0; // return true if no errors found
}
I am quite sure this code works, as I tried it with the shiporder example. Now I am trying to move away from the example and adapt it to my own XML file.
I started with the most basic XML file (only the root element and one attribute):
<?xml version="1.0" encoding="UTF-8"?>
<root timestamp="20220714 1324">
</root>
But I cannot create a suitable XSD file that validates it:
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:element name="root">
<xs:complexType>
<xs:attribute type="xs:string" name="timestamp"/>
</xs:complexType>
</xs:element>
</xs:schema>
I keep getting the error:
Failed to validate '../temp_test_inputs/xml_validation/input.xml' at line 2, col 0! Error 1845: Element 'root': No matching global declaration available for the validation root.
After 3 days of searching, I could not find any solution that got me past this error... Any ideas? If needed, I could alter the XML file, but I would prefer to only tweak the XSD file (if possible). I'll take any solution at this point...

I found the mistake... And of course it's a stupid one...
My main function should have looked like this:
F1_TEST_COMMON::XmlValidator xmlValidator{};
auto res = xmlValidator.ConfigureValidationSchema("../temp_test_inputs/xml_validation/shiporder.xsd");
if (!res)
return -1;
res = xmlValidator.ValidateXml("../temp_test_inputs/xml_validation/shiporder.xml");
if (!res)
return -1;
F1_TEST_COMMON::XmlValidator inputXmlValidator{};
res = inputXmlValidator.ConfigureValidationSchema("../temp_test_inputs/xml_validation/input.xsd");
if (!res)
return -1;
res = inputXmlValidator.ValidateXml("../temp_test_inputs/xml_validation/input.xml");
if (!res)
return -1;
However, it had a 'minor' mistake: the second validation was using shiporder.xsd (xmlValidator) rather than input.xsd (inputXmlValidator)...
xmlValidator.ValidateXml("../temp_test_inputs/xml_validation/input.xml");
Just one wrong variable while copy-pasting... and over a week wasted on debugging...

Related

Xerces XPath causes seg fault when path doesn't exist

I can successfully use the Xerces XPath feature to query information from an XML file with the following XML and C++ code.
XML
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<ApplicationSettings>
hello universe
</ApplicationSettings>
</root>
C++
#include <iostream>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/TransService.hpp>
#include <xercesc/util/XMLString.hpp>
using namespace xercesc;
int main()
{
XMLPlatformUtils::Initialize();
// create the DOM parser
XercesDOMParser *parser = new XercesDOMParser;
parser->setValidationScheme(XercesDOMParser::Val_Never);
parser->parse("fake_cmf.xml");
// get the DOM representation
DOMDocument *doc = parser->getDocument();
// get the root element
DOMElement* root = doc->getDocumentElement();
// evaluate the xpath
DOMXPathResult* result=doc->evaluate(
XMLString::transcode("/root/ApplicationSettings"), // <-- HERE IS THE XPATH
root,
NULL,
DOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE, //DOMXPathResult::ANY_UNORDERED_NODE_TYPE, //DOMXPathResult::STRING_TYPE,
NULL);
// look into the xpath evaluate result
result->snapshotItem(0);
std::cout<<TranscodeToStr(result->getNodeValue()->getFirstChild()->getNodeValue(),"ascii").str()<<std::endl;
result->release(); // the caller owns the XPath result
delete parser; // must be destroyed before Terminate()
XMLPlatformUtils::Terminate();
return 0;
}
The problem is that sometimes my XML will only have certain fields, and if I remove the ApplicationSettings element from the XML, the program segfaults. How can I properly handle these optional fields? I know that trying to recover from segfaults is risky business.
The segfault occurs on this line:
std::cout<<TranscodeToStr(result->getNodeValue()->getFirstChild()->getNodeValue(),"ascii").str()<<std::endl;
specifically in the getFirstChild() call, because getNodeValue() returns NULL.
This is my quick and dirty solution. It's not really ideal but it works. I would prefer a more sophisticated evaluation and response.
if (result->getNodeValue() == NULL)
{
cout << "There is no result for the provided XPath " << endl;
}
else
{
cout<<TranscodeToStr(result->getNodeValue()->getFirstChild()->getNodeValue(),"ascii").str()<<endl;
}

Read XML node with RapidXML

I'm using RapidXML to parse XML files and read node contents, but I don't want to read the values inside a node: I need to read the content of specific XML nodes "as XML", not as parsed values.
Example :
<node1>
<a_lot_of_xml>
< .... >
</a_lot_of_xml>
</node1>
I need to get the content of node1 as:
<a_lot_of_xml>
< .... >
</a_lot_of_xml>
What I tried:
I tried something, but I don't think it's very good: I put the path of another XML file to read inside node1, like this:
<file1ToRead>MyFile.xml</file1ToRead>
And then my C++ code is:
ifstream file(FileToRead);
stringstream buffer; buffer << file.rdbuf();
But the problem is that users will have a lot of XML files to maintain, and I just want to use one XML file.
I think "a lot of XML files" is actually the better approach: you keep a directory of XML files and read each one only when you need it, which is good for performance.
Back to the problem: you can use the rapidxml::print function to get a node back in XML format.
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"
#include "rapidxml_utils.hpp"
// Returns true and fills xml_data with the <a_lot_of_xml> node printed as XML text.
bool test_analyze_xml(const std::string& xml_path, std::string& xml_data)
{
try
{
rapidxml::file<> f_doc(xml_path.c_str()); // throws if the file cannot be read
rapidxml::xml_document<> xml_doc;
xml_doc.parse<0>(f_doc.data());
rapidxml::xml_node<>* node_1 = xml_doc.first_node("node1");
if(node_1 == NULL)
{
return false;
}
rapidxml::xml_node<>* plain_txt = node_1->first_node("a_lot_of_xml");
if (plain_txt == NULL)
{
return false;
}
xml_data.clear();
rapidxml::print(std::back_inserter(xml_data), *plain_txt, rapidxml::print_no_indenting); // xml_data now holds the node in XML format
}
catch (...)
{
return false;
}
return true;
}
I'm unfamiliar with rapidxml, but I have done this with tinyxml2. The trick is to read out node1 and then create a new XMLDocument (using tinyxml2 terms here) that contains everything inside node1. From there, you can use the XMLPrinter class to convert the new XMLDocument (containing everything in node1) to a string.
tinyxml2 is a free download.

Reading xml file using QDomDocument just get the first line

I generate an XML file using QXmlStreamWriter. The file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<pedestrianinfo>
<pedestrian uuid="2112e2ed-fc9b-41e8-bbcb-b44ad78bde11">
<module>11.1208</module>
<direction>4</direction>
<row>5</row>
<column>71</column>
</pedestrian>
<pedestrian uuid="1aabb9c1-4aa7-4f47-9542-36d2dfaa26e4">
<module>1.48032</module>
<direction>4</direction>
<row>67</row>
<column>31</column>
</pedestrian>
...
</pedestrianinfo>
Then I try to read the content with QDomDocument. My code looks like this:
xmlReader *xp = new xmlReader(QString("D:\\0T.xml"));
if(xp->openFile()) {
if(xp->isGetRootIndex()) {
xp->parseRootIndexElement();
}
else
cout<<"Unable to get root index."<<endl;
}
Here is isGetRootIndex():
bool xmlReader::isGetRootIndex()
{
doc.setContent(&file,false);
root = doc.documentElement();
if(root.tagName() == getRootIndex()) //rootIndex=="pedestrianinfo"
return true;
return false;
}
This is parseRootIndexElement():
void xmlReader::parseRootIndexElement()
{
QDomNode child = root.firstChild();
while(!child.isNull()) {
if(child.toElement().tagName() == getTagNameP()) //"childTagName=="pedestrian"
parseEntryElement(child.toElement());
qDebug()<<"module="<<module<<" direction="<<direction<<" row="<<row<<" column="<<column;
child = child.nextSibling();
}
}
parseEntryElement(const QDomElement &element) is a function that reads the information in each tag and saves it into variables such as module.
However, each time I run my code, only the first child of the XML file is printed by qDebug. It seems that after executing child.nextSibling(), child becomes null. Why does it not get the next pedestrian's info?
Looks correct to me based on what I see in the documentation. Perhaps parseEntryElement is modifying the node tree unexpectedly?

Parsing an XML file using pugixml

Hi
I want to use an XML file as a config file from which I will read parameters for my application. I came across the pugixml library, but I have a problem getting the values of attributes.
My XML file looks like this:
<?xml version="1.0"?>
<settings>
<deltaDistance> </deltaDistance>
<deltaConvergence>0.25 </deltaConvergence>
<deltaMerging>1.0 </deltaMerging>
<m> 2</m>
<multiplicativeFactor>0.7 </multiplicativeFactor>
<rhoGood> 0.7 </rhoGood>
<rhoMin>0.3 </rhoMin>
<rhoSelect>0.6 </rhoSelect>
<stuckProbability>0.2 </stuckProbability>
<zoneOfInfluenceMin>2.25 </zoneOfInfluenceMin>
</settings>
To parse the XML file I use this code:
bool ReadConfig(const char* file)
{
pugi::xml_document doc;
if (!doc.load_file(file)) return false;
pugi::xml_node tools = doc.child("settings");
for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
cout<<it->name() << " " << it->attribute(it->name()).as_double() << endl;
}
return true;
}
I also tried this:
bool ReadConfig(const char* file)
{
pugi::xml_document doc;
if (!doc.load_file(file)) return false;
pugi::xml_node tools = doc.child("settings");
for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
cout<<it->name() << " " << it->value() << endl;
}
return true;
}
The attributes are loaded correctly, but all the values are 0. Could somebody tell me what I am doing wrong?
I think your problem is that you're expecting the value to be stored in the node itself, but it's really in a CHILD text node. A quick scan of the documentation showed that you might need
it->child_value()
instead of
it->value()
Are you trying to get all the attributes for a given node or do you want to get the attributes by name?
For the first case, you should be able to use this code:
for (pugi::xml_attribute attrib : node.attributes())
{
// process here, e.g. attrib.name() and attrib.value()
}
For the second case:
const char* GetAttribute(const pugi::xml_node& node, const char* szAttribName)
{
if (szAttribName == NULL)
return NULL;
pugi::xml_attribute attrib = node.attribute(szAttribName);
if (attrib.empty())
return NULL; // or empty string
return attrib.value();
}
If you want to store plain text data in the nodes, like
<name> My Name</name>
you can write it like this:
rootNode.append_child("name").append_child(pugi::node_pcdata).set_value("My name");
If you want to store typed values, you can put them in an attribute. I think what you want is to be able to read the value directly, right?
When you are writing the node:
rootNode.append_child("version").append_attribute("value").set_value(0.11);
When you want to read it:
rootNode.child("version").attribute("value").as_double();
At least that's my way of doing it!

Runtime error with tinyXML element access

Yesterday was my first attempt. I am trying to read the attribute "time" from the following "new.xml" file:
<?xml version="1.0" standalone=no>
<main>
<ToDo time="1">
<Item priority="1"> Go to the <bold>Toy store!</bold></Item>
<Item priority="2"> Do bills</Item>
</ToDo>
<ToDo time="2">
<Item priority="1"> Go to the Second<bold>Toy store!</bold></Item>
</ToDo>
</main>
Here is my code
TiXmlDocument doc("new.xml");
TiXmlNode * element=doc.FirstChild("main");
element=element->FirstChild("ToDo");
string temp=static_cast<TiXmlElement *>(element)->Attribute("time");
But I am getting runtime errors from the third and fourth lines. Can anybody shed some light on this issue?
It seems to me that you forgot to load the file. Normally I do something along these lines:
TiXmlDocument doc("document.xml");
bool loadOkay = doc.LoadFile(); // Error checking in case file is missing
if(loadOkay)
{
TiXmlElement *pRoot = doc.RootElement();
TiXmlElement *element = pRoot ? pRoot->FirstChildElement() : NULL;
while(element)
{
string value = element->Value(); // In your example xml file this gives you ToDo
const char* attribute = element->Attribute("time"); // Gets you the time variable (NULL if absent)
if (attribute)
{
// use attribute here
}
element = element->NextSiblingElement();
}
}
else
{
//Error conditions
}
Hope this helps
#include "tinyXml/tinyxml.h"
const char MY_XML[] = "<?xml version='1.0' standalone=no><main> <ToDo time='1'> <Item priority='1'> Go to the <bold>Toy store!</bold></Item> <Item priority='2'> Do bills</Item> </ToDo> <ToDo time='2'> <Item priority='1'> Go to the Second<bold>Toy store!</bold></Item> </ToDo></main>";
int main()
{
TiXmlDocument doc;
TiXmlHandle docHandle(&doc);
doc.Parse(MY_XML);
TiXmlElement* xElement = docHandle.FirstChild("main").FirstChild("ToDo").ToElement();
int element_time = -1;
while(xElement)
{
if(xElement->QueryIntAttribute("time", &element_time) != TIXML_SUCCESS)
return 1; // missing or non-integer "time" attribute
xElement = xElement->NextSiblingElement();
}
return 0;
}
That's how it works. Compiled & tested.
As you can see, your attempt to make the code extra-safe cost you an exception at the third line (of the question), and without testing I would bet it's a null-pointer access.
Just load it my way, which is also how TinyXML's docs do it: "docHandle.FirstChild("main").FirstChild("ToDo").ToElement();".
Hope it helps you understand, let me know if it's not clear. I accept visa (:
Is it just me, or does the pugixml version look much better?
#include <iostream>
#include "pugixml.hpp"
using namespace std;
using namespace pugi;
int main()
{
xml_document doc;
if (!doc.load_file("new.xml"))
{
cerr << "Could not load xml";
return 1;
}
xml_node element = doc.child("main");
element = element.child("ToDo");
cout << "Time: " << element.attribute("time").value() << endl;
}
Also, new.xml has an error in its XML declaration. Instead of:
<?xml version="1.0" standalone=no>
it should be
<?xml version="1.0" standalone="no"?>
Compilation was just a matter of cl test.cpp pugixml.cpp