libxml2 - failure to parse valid xml - c++

I have a small C++ program that uses libxml2 to parse XML files. Basically, my code looks like this:
xmlDocPtr doc = xmlParseFile("test.xml");
if (doc == nullptr) {
return;
}
xmlNodePtr node = xmlDocGetRootElement(doc);
if (node == nullptr) {
return;
}
...
I'm hitting an error case where doc != nullptr but node == nullptr. Under which conditions can that happen? I've tested with completely valid, invalid, and empty files, and it happens in every case. If the file does not exist, doc == nullptr (as it should). I suspect the program cannot open the file for some reason, but I've checked the permissions, and no other program is using the file. Also, this only happens in an environment where I cannot use a debugger.
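Since a debugger is not available there, one way to test the suspicion about file access is to log what the process itself sees before handing the path to the parser. The following is only a sketch using the standard library; describeFile is a hypothetical helper name, and the errno text is a best-effort hint because the C++ standard does not guarantee that stream failures set errno.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: reports whether `path` can be opened by this process
// and how many bytes it contains. Intended for log output only.
std::string describeFile(const std::string& path) {
    errno = 0;
    // Open at the end so tellg() yields the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        // errno is usually set by the underlying open call, but that is
        // platform behaviour, not a guarantee of the C++ standard.
        return "cannot open '" + path + "': " + std::strerror(errno);
    }
    const auto size = static_cast<long long>(in.tellg());
    return "opened '" + path + "', " + std::to_string(size) + " bytes";
}
```

Writing describeFile("test.xml") to a log file or stderr right before the xmlParseFile call would show whether the process resolves the relative path to the file you expect and whether it sees any content in it.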

Related

Reading a XML file in C++ with TinyXML2

I'm pretty new to using XML in C++ and I'm trying to parse a list of files to download.
The XML file I'm using is generated via PHP and looks like this:
<?xml version="1.0"?>
<FileList>
<File Name="xxx" Path="xxx" MD5="xxx" SHA1="xxx"/>
</FileList>
The C++ code I'm using is the following, which I came up with using some online tutorials (it's part of a larger function):
tinyxml2::XMLDocument doc;
doc.LoadFile("file_listing.xml");
tinyxml2::XMLNode* pRoot = doc.FirstChild();
tinyxml2::XMLElement* pElement = pRoot->FirstChildElement("FileList");
if (pRoot == nullptr)
{
QString text = QString::fromLocal8Bit("Error text in french");
//other stuff
}
else
{
tinyxml2::XMLElement* pListElement = pElement->FirstChildElement("File");
while (pListElement != nullptr)
{
QString pathAttr = QString::fromStdString(pListElement->Attribute("Path"));
QString md5Attr = QString::fromStdString(pListElement->Attribute("MD5"));
QString sha1Attr = QString::fromStdString(pListElement->Attribute("SHA1"));
QString currentPath = pathAttr.remove("path");
QString currentMd5 = this->fileChecksum(currentPath, QCryptographicHash::Md5);
QString currentSha1 = this->fileChecksum(currentPath, QCryptographicHash::Sha1);
QFile currentFile(currentPath);
if (md5Attr != currentMd5 || sha1Attr != currentSha1 || !currentFile.exists())
{
QString url = "url" + currentPath;
this->downloadFile(url);
}
pListElement = pListElement->NextSiblingElement("File");
}
Problem is, I get an error like "Access violation, this was nullptr" on the following line :
tinyxml2::XMLElement* pListElement = pElement->FirstChildElement("File");
Since I'm far from a pro when it comes to coding and I already searched the internet up and down, I hope that someone here can provide me some pointers.
Have a good day, folks.
I don't know if you have C++17 available, but you can remove a lot of noise by using auto* and if statements with initializers (or by relying on the fact that pointers convert implicitly to boolean values).
The main issue with your code is that you were working with an XMLNode* where you needed an XMLElement*. The function tinyxml2::XMLDocument::RootElement() gets the top-most element for you.
Because you have an XML declaration at the top, FirstChild() returns that declaration, which has no children, so the rest of the code fails.
By using RootElement(), tinyxml2 knows to skip any leading non-element nodes (comments, doctypes, etc.) and gives you <FileList> instead.
tinyxml2::XMLDocument doc;
auto err = doc.LoadFile("file_listing.xml");
if(err != tinyxml2::XML_SUCCESS) {
//Could not load file. Handle appropriately.
} else {
if(auto* pRoot = doc.RootElement(); pRoot == nullptr) {
QString text = QString::fromLocal8Bit("Error text in french");
//other stuff
} else {
for(auto* pListElement = pRoot->FirstChildElement("File");
pListElement != nullptr;
pListElement = pListElement->NextSiblingElement("File"))
{
QString pathAttr = QString::fromStdString(pListElement->Attribute("Path"));
QString md5Attr = QString::fromStdString(pListElement->Attribute("MD5"));
QString sha1Attr = QString::fromStdString(pListElement->Attribute("SHA1"));
QString currentPath = pathAttr.remove("path");
QString currentMd5 = this->fileChecksum(currentPath, QCryptographicHash::Md5);
QString currentSha1 = this->fileChecksum(currentPath, QCryptographicHash::Sha1);
QFile currentFile(currentPath);
if(md5Attr != currentMd5 || sha1Attr != currentSha1 || !currentFile.exists()) {
QString url = "url" + currentPath;
this->downloadFile(url);
}
}
}
}
According to the reference for tinyxml2::XMLNode::FirstChild():
Get the first child node, or null if none exists.
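This line will therefore get the first node of the document, which for this file is the XML declaration rather than the <FileList> element:
tinyxml2::XMLNode* pRoot = doc.FirstChild();
That node has no FileList child, so when you attempt to find a FileList element within it, the lookup returns null.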
To avoid the access violation, check that your pointers are valid before using them. There is an if check for pRoot, but the line immediately before it already calls a function on pRoot. There is no check for pElement at all, which is why you get the access violation. As well as checking pointers, consider adding else blocks with logging that say what went wrong (e.g. "could not find element X"). This will help you in the long run: XML parsing is a pain even with a library like tinyxml2, and there are always teething problems like this, so getting into the habit of checking pointers and logging helpful messages will definitely pay off.
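The same pointer discipline applies to attribute lookups: tinyxml2's XMLElement::Attribute returns a null pointer when the attribute is missing, and constructing a std::string from a null pointer is undefined behaviour. Below is a minimal sketch, assuming C++17 is available. The helper names toOptional and valueOr are hypothetical and not part of tinyxml2; they only wrap a nullable C string.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not part of tinyxml2): wraps a possibly-null C string,
// such as the result of XMLElement::Attribute, so that a missing value
// cannot be dereferenced by accident.
std::optional<std::string> toOptional(const char* raw) {
    if (raw == nullptr) {
        return std::nullopt;  // the attribute was not present
    }
    return std::string(raw);
}

// Hypothetical helper: returns the value, or `fallback` when it is missing.
std::string valueOr(const char* raw, const std::string& fallback) {
    return toOptional(raw).value_or(fallback);
}
```

In the loop above, the result of toOptional(pListElement->Attribute("Path")) could be tested first, with a log message in the empty case, before the string is converted with QString::fromStdString.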

Git tree show untracked files

I am trying to fix this issue:
https://github.com/gitahead/gitahead/issues/380
The problem is that the tree used in the model does not contain any untracked files, so the view has nothing to show. As soon as I stage one file, it is shown.
Is there a way to include the untracked files in the tree as well?
I created a small test application to find the problem. When one file is staged, count is non-zero; otherwise it is always zero.
Test setup
new git repository (TestRepository) with the following untracked files:
testfile.txt
testfolder/testfile2.txt
#include <git2.h>
#include <stdio.h>
#include <stdlib.h> // for exit()
int main() {
git_libgit2_init();
git_repository *repo = NULL;
int error = git_repository_open(&repo, "/TestRepository");
if (error < 0) {
const git_error *e = git_error_last();
printf("Error %d/%d: %s\n", error, e->klass, e->message);
exit(error);
}
git_tree *tree = nullptr;
git_index* idx = nullptr;
git_repository_index(&idx, repo);
git_oid id;
error = git_index_write_tree(&id, idx); // keep the code so the message below reports it
if (error < 0) {
const git_error *e = git_error_last();
printf("Error %d/%d: %s\n", error, e->klass, e->message);
exit(error);
}
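git_tree_lookup(&tree, repo, &id);
size_t count = git_tree_entrycount(tree); // number of top-level entries in the index tree
printf("%zu\n", count);
git_tree_free(tree);
git_index_free(idx);
git_repository_free(repo);
git_libgit2_shutdown();
printf("SUCCESS\n");
return 0;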
}
If I understood correctly, what you're seeing is normal: as the file is untracked/new, the index has no knowledge of it, so if you ask the index, it has no "staged" changes to compare with, hence no diff.
If you want a diff for a yet-to-be tracked file, you'll have to provide it another way, usually by asking git_diff to do the work of comparing the worktree version with /dev/null, the empty blob, etc.
Since you're after a libgit2 solution, the way I'm trying to do that in GitX is via the git_status_list_new API, which gives a somewhat filesystem-independent way of generating both viewable diffs (staged & unstaged) on-the-fly, using git_patch_from_blobs/git_patch_from_blobs_and_buffer. In retrospect, maybe that should live in the library as git_status_entry_generate_patch or something…

How to correctly check FileStreamWriter is in use?

I am using a System::IO::StreamWriter. My intention is to check whether the underlying file is open and, if so, close it.
What is the correct way to check whether the file is in use? I keep getting a System.NullReferenceException with the code below.
if (filewr2->BaseStream != nullptr)
{
filewr2->Close();
filewr2 = nullptr;
}

RapidXML - how can I handle missing nodes/values

I'd like to read from XML into C++ using RapidXML. However, if a node doesn't exist or a value is missing, the program crashes.
for (rapidxml::xml_node<> * xmlasset_node = root_node->first_node("Asset"); xmlasset_node; xmlasset_node = xmlasset_node->next_sibling())
{ mystring += xmlasset_node->first_attribute("name")->value(); }
However, this "name" attribute doesn't exist in all nodes and should be filled with a default value if it's not in the XML. Similarly, some sub-nodes are not present in all nodes. The reason is simply to keep the XML as small and clear as possible for manual adjustments.
How can a check/test be implemented (C++), to prevent the program from crashing and just taking default values if a value/node doesn't exist?
Kind regards,
- Corak
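One possible shape for such a check, as a sketch only: test the node pointer and the attribute pointer before calling value(), and return a default when either is null. The helper name attributeOr is hypothetical and not part of RapidXML. It is written as a template that assumes the node type offers first_attribute(const char*) returning a pointer (null when the attribute is absent) whose target has value(), which is how rapidxml::xml_node<> behaves.

```cpp
#include <string>

// Hypothetical helper (not part of RapidXML). `Node` is assumed to provide
// first_attribute(const char*) returning a pointer that is null when the
// attribute is absent, and the pointed-to attribute is assumed to have value().
template <typename Node>
std::string attributeOr(const Node* node, const char* name,
                        const std::string& fallback) {
    if (node == nullptr) {
        return fallback;  // the node itself is missing
    }
    const auto* attr = node->first_attribute(name);
    if (attr == nullptr) {
        return fallback;  // the attribute is missing
    }
    return std::string(attr->value());
}
```

Inside the loop from the question this could be used as mystring += attributeOr(xmlasset_node, "name", "unnamed"); where "unnamed" is just a placeholder default. Optional sub-nodes can be handled the same way: store the result of first_node("...") in a pointer and test it before descending.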
Here is what I do: compare the name of each node and attribute against what you are looking for, and only use it when it matches:
// basically I am looking for "settings" node then "network" subnode, then "port" attribute
if( boost::iequals(doc.first_node()->next_sibling()->name(), "settings"))
{
for (xml_node<> *node = doc.first_node()->next_sibling()->first_node(); node; node = node->next_sibling())
{
// find network tag
if (boost::iequals(node->name(),"network"))
{
for (xml_attribute<> *attr = node->first_attribute(); attr; attr = attr->next_attribute())
{
if ( boost::iequals(attr->name(), "port"))
{
strcpy(attr->value(), portname);
}
}
}
}
}

Read XML node with RapidXML

I'm using RapidXML to parse XML files and read node contents, but I don't want the parsed values inside a node. I need to read the content of specific XML nodes "as XML", not as parsed values.
Example :
<node1>
<a_lot_of_xml>
< .... >
</a_lot_of_xml>
</node1>
I need to get the content of node1 as :
<a_lot_of_xml>
< .... >
</a_lot_of_xml>
What I tried:
I tried something, but it's not really good in my opinion: I put the path of another XML file to read inside node1, like this:
<file1ToRead>MyFile.xml</file1ToRead>
And then my C++ code is the following:
ifstream file(FileToRead);
stringstream buffer; buffer << file.rdbuf();
But the problem is users will have a lot of XML files to maintain and I just want to use one xml file.
I think "a lot of XML files" is actually the better way: with a directory of XML files, you read each file only when you need it, which is good for performance.
Back to the problem: you can use the rapidxml::print function (declared in rapidxml_print.hpp) to get a node back as XML text.
bool test_analyze_xml(const std::string& xml_path)
{
try
{
rapidxml::file<> f_doc(xml_path.c_str());
rapidxml::xml_document<> xml_doc;
xml_doc.parse<0>(const_cast<char*>(f_doc.data()));
rapidxml::xml_node<>* node_1 = xml_doc.first_node("node1");
if(node_1 == NULL)
{
return false;
}
rapidxml::xml_node<>* plain_txt = node_1->first_node("a_lot_of_xml");
if (plain_txt == NULL)
{
return false;
}
std::string xml_data;
rapidxml::print(std::back_inserter(xml_data), *plain_txt, rapidxml::print_no_indenting); //the xml_data is XML format.
}
catch (...)
{
return false;
}
return true;
}
I'm unfamiliar with rapidxml, but I have done this with tinyxml2. The trick is to read out node1 and then create a new XMLDoc (using tinyxml2 terms here) that contains everything inside of node1. From there, you can use their XMLPrinter class to convert your new XMLDoc (containing everything in node1) to a string.
tinyxml2 is a free download.