What is the alternative to `TiXmlNode::FirstChild(const char *)` in TinyXML-2? - c++

I am updating code that uses the legacy TinyXML library to use the newer TinyXML-2 instead.
While editing, I noticed that the function TiXmlNode::FirstChild(const char *) has no direct replacement in TinyXML-2.
My questions are:
Is there a convenient replacement for the aforementioned function that I missed?
In case there isn't, how should the example code below be updated for TinyXML-2?
// TiXmlElement *element; // assume this was correctly loaded
TiXmlNode *node;
if ((node = element->FirstChild("example")) != nullptr)
{
    for (TiXmlElement *walk = node->FirstChildElement();
         walk != nullptr;
         walk = walk->NextSiblingElement())
    {
        // ...
    }
}

tinyxml2 has
const XMLElement * XMLNode::FirstChildElement (const char *value=0) const
Your code block is much the same:
if (auto example = element->FirstChildElement("example"))
{
    for (auto walk = example->FirstChildElement();
         walk;
         walk = walk->NextSiblingElement())
    {
        // walk the walk
    }
}
Or you might look at my add-on for tinyxml2 with which your snippet would be:
for (auto walk : selection(element, "example/"))
{
    // walk the walk
}


Copy only necessary objects from PDF file

I've got a huge PDF file with more than 100 pages and I want to split it into single-page PDF files. The problem is that PoDoFo does not copy just the page but the whole document, because of the references, so each of the 100 output files has the same size as the original 100-page PDF. A relevant mailing list post can be found, but unfortunately it provides no solution.
The source code of the InsertPages function contains this explanation:
This function works a bit different than one might expect.
Rather than copying one page at a time - we copy the ENTIRE document
and then delete the pages we aren't interested in.
We do this because
1) SIGNIFICANTLY simplifies the process
2) Guarantees that shared objects aren't copied multiple times
3) offers MUCH faster performance for the common cases
HOWEVER: because PoDoFo doesn't currently do any sort of "object
garbage collection" during a Write() - we will end up with larger
documents, since the data from unused pages will also be in there.
I have tried a few methods to copy only the relevant objects:
Copy all pages and remove irrelevant ones
Use XObject wrapping: FillXObjectFromDocumentPage and FillXObjectFromExistingPage
Copy object by object
Use RenumberObjects with bDoGarbageCollection = true
None of them worked out. Does anybody have an idea or a working solution for this problem?
The only solution is to use another PDF library. Or wait for garbage collection to be implemented.
The problem is stated in the quote you mentioned:
> during a Write() - we will end up with larger documents, since the
> data from unused pages will also be in there.
This means PoDoFo always puts the entire PDF content in your file, no matter what. The whole PDF is there; you just don't see parts of it.
Dennis from PoDoFo support sent me a working example of an optimized version of the InsertPages function, which fixes the page references and decreases the document size significantly!
void PdfMemDocument::InsertPages2(const PdfMemDocument & rDoc, std::vector<int> pageNumbers)
{
    std::unordered_set<PdfObject*> totalSet;
    std::vector<pdf_objnum> oldObjNumPages;
    std::unordered_map<pdf_objnum, pdf_objnum> oldObjNumToNewObjNum;
    std::vector<PdfObject*> newPageObjects;
    // Collect all dependencies from all pages that are to be copied
    for (std::size_t i = 0; i < pageNumbers.size(); ++i) {
        PdfPage* page = rDoc.GetPage(pageNumbers[i]);
        if (page) {
            oldObjNumPages.push_back(page->GetObject()->Reference().ObjectNumber());
            std::unordered_set<PdfObject*> *set = page->GetPageDependencies();
            totalSet.insert(set->begin(), set->end());
            delete set;
        }
    }
    // Create a new page object for every copied page from the old document
    // Copy all objects the pages depend on to the new document
    for (auto it = totalSet.begin(); it != totalSet.end(); ++it) {
        unsigned int length = static_cast<unsigned int>(GetObjects().GetSize() + GetObjects().GetFreeObjects().size());
        PdfReference ref(static_cast<unsigned int>(length+1), 0);
        PdfObject* pObj = new PdfObject(ref, *(*it));
        pObj->SetOwner(&(GetObjects()));
        if ((*it)->HasStream()) {
            PdfStream *stream = (*it)->GetStream();
            pdf_long streamLength; // renamed so it no longer shadows the outer `length`
            char* buf;
            stream->GetCopy(&buf, &streamLength);
            PdfMemoryInputStream inputStream(buf, streamLength);
            pObj->GetStream()->SetRawData(&inputStream, streamLength);
            free(buf);
        }
        oldObjNumToNewObjNum.insert(std::pair<pdf_objnum, pdf_objnum>((*it)->Reference().ObjectNumber(), length+1));
        GetObjects().push_back(pObj);
        newPageObjects.push_back(pObj);
    }
    // In all copied objects, fix the object numbers so they are valid in the new document
    for (auto it = newPageObjects.begin(); it != newPageObjects.end(); ++it) {
        FixPageReferences(GetObjects(), *it, oldObjNumToNewObjNum);
    }
    // Insert the copied pages into the pages tree
    for (auto it = oldObjNumPages.begin(); it != oldObjNumPages.end(); ++it) {
        PdfObject* pageObject = GetObjects().GetObject(PdfReference(oldObjNumToNewObjNum[(*it)], 0));
        PdfPage *page = new PdfPage(pageObject, std::deque<PdfObject*>());
        GetPagesTree()->InsertPage(GetPageCount() - 1, page);
    }
}
std::unordered_set<PdfObject *>* PdfPage::GetPageDependencies() const
{
    std::unordered_set<PdfObject *> *set = new std::unordered_set<PdfObject *>();
    const PdfObject* pageObj = GetObject();
    if (pageObj) {
        PdfVecObjects* objects = pageObj->GetOwner();
        if (objects) {
            set->insert((PdfObject*)pageObj);
            objects->GetObjectDependencies2(pageObj, *set);
        }
    }
    return set;
}
// Optimized version of PdfVecObjects::GetObjectDependencies
void PdfVecObjects::GetObjectDependencies2(const PdfObject* pObj, std::unordered_set<PdfObject*> &refMap) const
{
    // Check objects referenced from this object
    if (pObj->IsReference())
    {
        PdfObject* referencedObject = GetObject(pObj->GetReference());
        if (referencedObject != NULL && refMap.count(referencedObject) < 1) {
            (refMap).insert((PdfObject *)referencedObject); // Insert referenced object
            GetObjectDependencies2((const PdfObject*)referencedObject, refMap);
        }
    }
    else {
        // Recursion
        if (pObj->IsArray())
        {
            PdfArray::const_iterator itArray = pObj->GetArray().begin();
            while (itArray != pObj->GetArray().end())
            {
                GetObjectDependencies2(&(*itArray), refMap);
                ++itArray;
            }
        }
        else if (pObj->IsDictionary())
        {
            TCIKeyMap itKeys = pObj->GetDictionary().GetKeys().begin();
            while (itKeys != pObj->GetDictionary().GetKeys().end())
            {
                // Skip /Parent so the walk does not climb back up into the whole page tree
                if ((*itKeys).first != PdfName("Parent")) {
                    GetObjectDependencies2((*itKeys).second, refMap);
                }
                ++itKeys;
            }
        }
    }
}
void FixPageReferences(PdfVecObjects& objects, PdfObject* pObject, std::unordered_map<pdf_objnum, pdf_objnum>& oldNumToNewNum) {
    if( !pObject)
    {
        PODOFO_RAISE_ERROR( ePdfError_InvalidHandle );
    }
    if( pObject->IsDictionary() )
    {
        TKeyMap::iterator it = pObject->GetDictionary().GetKeys().begin();
        while( it != pObject->GetDictionary().GetKeys().end() )
        {
            if ((*it).first != PdfName("Parent")) {
                FixPageReferences(objects, (*it).second, oldNumToNewNum);
            }
            ++it;
        }
    }
    else if( pObject->IsArray() )
    {
        PdfArray::iterator it = pObject->GetArray().begin();
        while( it != pObject->GetArray().end() )
        {
            FixPageReferences(objects, &(*it), oldNumToNewNum);
            ++it;
        }
    }
    else if( pObject->IsReference() )
    {
        //PdfObject* referencedObj = objects.GetObject(pObject->GetReference());
        pdf_objnum oldnum = pObject->GetReference().ObjectNumber();
        pdf_objnum newnum = oldNumToNewNum[oldnum];
        // Throw by value with a standard exception type (requires <stdexcept>)
        if (!newnum) throw std::runtime_error("No new object number for old object number");
        *pObject = PdfReference(newnum, 0);
    }
}
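The idea behind the code above (collect everything reachable from the selected pages, give those objects fresh numbers, then rewrite the references) can be sketched without PoDoFo on a toy object graph. This is only an illustration under stated assumptions: ObjectGraph, collectDependencies, renumber and copySubgraph are hypothetical names, objects are reduced to plain integers, and the /Parent edges that the real code skips are simply left out of the toy graph.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy stand-in for a PDF object table: object number -> object numbers it references.
// (Hypothetical model; real PoDoFo objects are dictionaries, arrays and streams.)
using ObjectGraph = std::map<int, std::vector<int>>;

// Collect every object reachable from `root`, visiting each object only once.
void collectDependencies(const ObjectGraph& graph, int root, std::set<int>& seen)
{
    if (!seen.insert(root).second) {
        return; // already visited; this also terminates reference cycles
    }
    auto it = graph.find(root);
    if (it == graph.end()) {
        return; // dangling reference: nothing further to follow
    }
    for (int child : it->second) {
        collectDependencies(graph, child, seen);
    }
}

// Assign fresh consecutive numbers, starting at `firstFree`, to the kept objects.
std::map<int, int> renumber(const std::set<int>& kept, int firstFree)
{
    std::map<int, int> oldToNew;
    for (int oldNum : kept) {
        oldToNew[oldNum] = firstFree++;
    }
    return oldToNew;
}

// Build the copied graph, rewriting every reference through the old-to-new mapping.
ObjectGraph copySubgraph(const ObjectGraph& graph, const std::map<int, int>& oldToNew)
{
    ObjectGraph result;
    for (const auto& entry : oldToNew) {
        std::vector<int>& refs = result[entry.second];
        auto it = graph.find(entry.first);
        if (it == graph.end()) {
            continue;
        }
        for (int child : it->second) {
            auto mapped = oldToNew.find(child);
            assert(mapped != oldToNew.end()); // the mapping must cover every kept reference
            refs.push_back(mapped->second);
        }
    }
    return result;
}
```

Objects that are not reachable from the chosen root never enter the mapping, so they are not copied. That is the effect that makes the resulting documents smaller.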

GOF Composite Design Pattern CompositeObject::Remove Recursive Implementation in C++

This is part of a question I asked on the Code Review site:
GOF Composite Design Pattern Implementation Using Modern C++
That post has the complete implementation; here I am asking specifically about the following:
How to implement CompositeEquipment::Remove?
As I understand it, Remove should search recursively: first in the composite object on which the client invoked it, and then in all of its child objects, which can themselves be composites. To illustrate with the implementation above: if the client writes cabinet->Remove(bus); the bus object is not removed, because it is a child of the chassis object. This seems incorrect to me. However, I have not been able to implement CompositeEquipment::Remove so that it searches recursively when child objects are themselves composites.
So far I have come up with the following implementation, which searches only the composite object on which the client invoked Remove.
// To find out whether items are in the composite objects
class Name_Equal {
private:
    Equipment::EquipmentSmartPtr val;
public:
    Name_Equal(const Equipment::EquipmentSmartPtr& v) :val(v) { }
    bool operator()(const Equipment::EquipmentSmartPtr& x) const {
        return (x->Name() == val->Name());
    }
};
void CompositeEquipment::Remove(EquipmentSmartPtr entry) {
    find_equipment(_equipment, entry);
}
void CompositeEquipment::find_equipment(std::vector<EquipmentSmartPtr>& vec,
                                        EquipmentSmartPtr& entry){
    Name_Equal eq(entry);
    auto itrpos = std::find_if(std::begin(vec), std::end(vec), eq);
    if (itrpos != std::end(vec)) {
        vec.erase(itrpos);
    }
}
Kindly let me know if any additional information or the complete code needs to be posted here as well.
There are two options:
Provide a virtual function Remove in the base class and give it a no-op implementation. Then add a few more lines to CompositeEquipment::find_equipment.
void CompositeEquipment::find_equipment(std::vector<EquipmentSmartPtr>& vec,
                                        EquipmentSmartPtr& entry){
    Name_Equal eq(entry);
    auto itrpos = std::find_if(std::begin(vec), std::end(vec), eq);
    if (itrpos != std::end(vec)) {
        vec.erase(itrpos);
    } else {
        for ( EquipmentSmartPtr sptr : vec )
        {
            sptr->Remove(entry);
        }
    }
}
Use dynamic_cast to determine whether an item of the composite is a composite also. If so, call Remove on it. I prefer this option.
void CompositeEquipment::find_equipment(std::vector<EquipmentSmartPtr>& vec,
                                        EquipmentSmartPtr& entry){
    Name_Equal eq(entry);
    auto itrpos = std::find_if(std::begin(vec), std::end(vec), eq);
    if (itrpos != std::end(vec)) {
        vec.erase(itrpos);
    } else {
        for ( EquipmentSmartPtr sptr : vec )
        {
            // The cast succeeds only if the child is itself a composite
            CompositeEquipment* ptr = dynamic_cast<CompositeEquipment*>(sptr.get());
            if ( ptr )
            {
                ptr->Remove(entry);
            }
        }
    }
}
A bit about names... find_equipment seems a strange name for the function. I would put the whole thing in Remove.
void CompositeEquipment::Remove(EquipmentSmartPtr& entry){
    std::vector<EquipmentSmartPtr>& vec = _equipment;
    Name_Equal eq(entry);
    auto itrpos = std::find_if(std::begin(vec), std::end(vec), eq);
    if (itrpos != std::end(vec)) {
        vec.erase(itrpos);
    } else {
        for ( EquipmentSmartPtr sptr : vec )
        {
            // The cast succeeds only if the child is itself a composite
            CompositeEquipment* ptr = dynamic_cast<CompositeEquipment*>(sptr.get());
            if ( ptr )
            {
                ptr->Remove(entry);
            }
        }
    }
}
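For comparison, here is a minimal self-contained sketch of the first option (a virtual no-op Remove in the base class). The class layout is an assumption: it is a trimmed-down, hypothetical version of the classes from the question, with Add and Count added only so the example can be exercised. Remove returns bool here, unlike the original void signature, so the search can stop after the first match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down versions of the classes from the question.
class Equipment {
public:
    using Ptr = std::shared_ptr<Equipment>;
    explicit Equipment(std::string name) : name_(std::move(name)) {}
    virtual ~Equipment() = default;
    const std::string& Name() const { return name_; }
    // A leaf has no children, so removal is a no-op that reports "not found".
    virtual bool Remove(const Ptr&) { return false; }
private:
    std::string name_;
};

class CompositeEquipment : public Equipment {
public:
    using Equipment::Equipment;
    void Add(Ptr child) { children_.push_back(std::move(child)); }
    std::size_t Count() const { return children_.size(); }

    // Remove the first equipment with a matching name, searching depth-first.
    bool Remove(const Ptr& entry) override {
        assert(entry != nullptr);
        auto pos = std::find_if(children_.begin(), children_.end(),
            [&](const Ptr& child) { return child->Name() == entry->Name(); });
        if (pos != children_.end()) {
            children_.erase(pos);
            return true;
        }
        // Not a direct child: delegate to each child; leaves simply return false.
        for (const Ptr& child : children_) {
            if (child->Remove(entry)) {
                return true;
            }
        }
        return false;
    }
private:
    std::vector<Ptr> children_;
};
```

With a cabinet that contains a chassis that contains a bus, cabinet->Remove(bus) now reaches into the chassis and removes the bus there, while the cabinet keeps its chassis.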

Attempting to read nodes in Rapid XML resulting in error

I have a class in my program that uses RapidXML to write data to a file. This process works fine. However, when I attempt to read the same data, my program is always halted by internal error-catching code, explaining "next sibling returned NULL but attempted to read value anyways".
if (xmlFile.good())
{
    vector<char> buffer((istreambuf_iterator<char>(xmlFile)), istreambuf_iterator<char>());
    buffer.push_back('\0');
    doc.parse<0>(&buffer[0]);
    root_node = doc.first_node("CityData");
    for(xml_node<> * bound_node = root_node->first_node("Boundaries"); bound_node; bound_node = bound_node->next_sibling())
    {
        if (bound_node->first_attribute("enabled")->value() != NULL)
        {
            int enabled = atoi(bound_node->first_attribute("enabled")->value());
            if (enabled == 1)
                boundaries = true; // Program globals
        }
    }
    if (boundaries)
    {
        for(xml_node<> * dimen_node = root_node->first_node("Dimensions"); dimen_node; dimen_node = dimen_node->next_sibling())
        {
            cityDim.x = atoi(dimen_node->first_attribute("x-val")->value()); // Program globals
            cityDim.y = atoi(dimen_node->first_attribute("y-val")->value());
        }
    }
An example of how the data appears in the XML file:
<CityData version="1.0" type="Example">
<Boundaries enabled="1"/>
<Dimensions x-val="1276" y-val="688"/>
If I add a breakpoint before either loop begins its next iteration and inspect the values, I can see that they are read during the first iteration. However, the end condition of the loop appears to be incorrect, and the error occurs on next_sibling(). I cannot understand the issue, as the loop code was copied from an example unmodified (aside from variable renaming) and appears correct to me; changing the condition to node != NULL does not help.
In the bound_node loop, the variable bound_node first points to <Boundaries enabled="1"/>, and you are able to read the attribute named enabled. After the call to next_sibling(), bound_node points to <Dimensions .../>. The call to first_attribute("enabled") then returns a null pointer, because this XML node has no attribute with that name, and the subsequent call to value() crashes the program.
I do not understand why you are writing a loop over all nodes. If the xml-file looks like this
<CityData version="1.0" type="Example">
<Boundaries enabled="1"/>
<Dimensions x-val="1276" y-val="688"/>
</CityData>
Then you can extract the values like this:
xml_node<> const * bound_node = root_node->first_node("Boundaries");
if (bound_node)
{
    xml_attribute<> const * attr_enabled = bound_node->first_attribute("enabled");
    if (attr_enabled)
    {
        int enabled = atoi(attr_enabled->value());
        if (enabled == 1)
            boundaries = true;
    }
}
if (boundaries)
{
    xml_node<> const * dimen_node = root_node->first_node("Dimensions");
    if (dimen_node)
    {
        xml_attribute<> const * xval = dimen_node->first_attribute("x-val");
        xml_attribute<> const * yval = dimen_node->first_attribute("y-val");
        if (xval && yval)
        {
            cityDim.x = atoi(xval->value());
            cityDim.y = atoi(yval->value());
        }
    }
}
} // closes the enclosing if (xmlFile.good()) block
You can write some else-clauses to signal errors, of course.
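One possible way to centralise those checks is a small helper that reports whether an attribute could be read at all. The sketch below deliberately does not use RapidXML: ToyNode and readIntAttribute are hypothetical stand-ins, and only the null-check-before-use pattern is meant to carry over.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for an XML element: just an attribute name -> value map.
struct ToyNode {
    std::map<std::string, std::string> attributes;
};

// Store the attribute's integer value in `out` and return true.
// Return false, leaving `out` untouched, if the node or the attribute is missing.
bool readIntAttribute(const ToyNode* node, const std::string& name, int& out)
{
    if (node == nullptr) {
        return false; // e.g. the element lookup found nothing
    }
    auto it = node->attributes.find(name);
    if (it == node->attributes.end()) {
        return false; // the element exists but has no such attribute
    }
    out = std::atoi(it->second.c_str());
    return true;
}
```

The caller's else-branch then becomes the single place where a missing element or attribute is reported.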

libxml2 xpath parsing, doesn't work as expected

I decided to use the libxml2 parser for my Qt application and I'm stuck on XPath expressions. I found an example class and methods, and modified them a bit for my needs. The code:
QStringList* LibXml2Reader::XPathParsing(QXmlInputSource input)
{
    xmlInitParser();
    xmlDocPtr doc;
    xmlXPathContextPtr xpathCtx;
    xmlXPathObjectPtr xpathObj;
    QStringList *valList = NULL;
    QByteArray arr = input.data().toUtf8(); // convert input data to UTF-8
    int length = arr.length();
    const char* data = arr.data();
    doc = xmlRecoverMemory(data, length); // build a tree, ignoring the errors
    if(doc == NULL) { return NULL; }
    xpathCtx = xmlXPathNewContext(doc);
    if(xpathCtx == NULL)
    {
        xmlFreeDoc(doc);
        xmlCleanupParser();
        return NULL;
    }
    xpathObj = xmlXPathEvalExpression(BAD_CAST "//[@class='b-domik__nojs']", xpathCtx); // here the parsing fails
    if(xpathObj == NULL)
    {
        xmlXPathFreeContext(xpathCtx);
        xmlFreeDoc(doc);
        xmlCleanupParser();
        return NULL;
    }
    xmlNodeSetPtr nodes = xpathObj->nodesetval;
    int size = (nodes) ? nodes->nodeNr : 0;
    if(size == 0)
    {
        xmlXPathFreeObject(xpathObj); // also release the result object on this early return
        xmlXPathFreeContext(xpathCtx);
        xmlFreeDoc(doc);
        xmlCleanupParser();
        return NULL;
    }
    valList = new QStringList();
    for (int i = 0; i < size; i++)
    {
        xmlNodePtr current = nodes->nodeTab[i];
        const char* str = (const char*)current->content;
        qDebug() << "name: " << QString::fromLocal8Bit((const char*)current->name);
        qDebug() << "content: " << QString::fromLocal8Bit((const char*)current->content) << "\r\n";
        valList->append(QString::fromLocal8Bit(str));
    }
    xmlXPathFreeObject(xpathObj);
    xmlXPathFreeContext(xpathCtx);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return valList;
}
As an example, I'm making a request to http://yandex.ru/ and trying to get the node with class b-domik__nojs, which is basically one div.
xpathObj = xmlXPathEvalExpression(BAD_CAST "//[@class='b-domik__nojs']", xpathCtx); // here the parsing fails
The problem is that the expression //[@class='b-domik__nojs'] doesn't work at all. I checked it in the Firefox XPath extension and in the Opera developer tools XPath extension, and there the expression works perfectly.
I also tried to get other nodes by their attributes, but for some reason XPath for ANY attribute fails. Is there something wrong in my method? Also, when I load a tree using xmlRecover, it gives me a lot of parser errors in the debug output.
OK, I played with my libxml2 function a bit more and used the "//*" expression to get all elements in the document, but it returns only the elements in the first child node of the body tag.
In the yandex.ru DOM tree, it gets ALL the elements in the first div (div class="b-line b-line_bar") but doesn't look for elements in the other child nodes of <body> for some reason.
Why can that happen? Maybe xmlParseMemory doesn't build a full tree for some reason? Is there any way to fix this?
It is really strange that the expression works anywhere, because it is not a valid XPath expression. After the axis specification (//), there should be a node test (an element name or *) before the predicate (the condition in square brackets).
//*[@class='b-domik__nojs']
All right, it works now. My mistake was using the XML functions to turn HTML documents into a tree. I used htmlReadMemory and the tree is fully built now. Some code again:
xmlInitParser();
xmlDocPtr doc;
xmlXPathContextPtr xpathCtx;
xmlXPathObjectPtr xpathObj;
QByteArray arr = input.data().toUtf8();
int length = arr.length();
const char* data = arr.data();
doc = htmlReadMemory(data, length, "", NULL, HTML_PARSE_RECOVER);
if(doc == NULL) { return NULL; }
xpathCtx = xmlXPathNewContext(doc);
if(xpathCtx == NULL)
{
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return NULL;
}
xpathObj = xmlXPathEvalExpression(BAD_CAST "//*[@class='b-domik__nojs']", xpathCtx);
etc.

Need help in solving segmentation fault In Scanner/Lexer code part in C++

The segmentation fault occurs in the scanner code.
The Problem:
A GDB backtrace reveals that the problem is caused by the declaration of the FieldInfo pointer named field_info (where FieldInfo is a struct) in the branch if (tell_me).
Please note that the following code is part of a large file, so if the declaration of something is not shown here, you can assume it is defined elsewhere in the program.
The sample code:
Some_function(some_arguments) {
    // Did something.
    if (flag_1) {
        list<const FieldInfo *> prefix_stack;
        const FieldInfo def_pfx(NON_BOOLEAN, default_prefix);
        {
            const FieldInfo * default_field_info = &def_pfx;
            if (default_prefix.empty()) {
                map<string, FieldInfo>::const_iterator f = field_map.find("");
                if (f != field_map.end()) default_field_info = &(f->second);
            }
            // We always have the current prefix on the top of the stack.
            prefix_stack.push_back(default_field_info);
        }
        // Did something.
        for (<some conditions>) {
            bool tell_me = false;
            // Did something.
            if (tell_me) {
                const FieldInfo pos_prefix(NON_BOOLEAN, pos);
                const FieldInfo * field_info = &pos_prefix;
                Term * term_obj = new Term(&state, term_lowercase, field_info,
                                           term, stem_term, term_pos++);
                Parse(pParser, token, term_obj, &state);
            } else {
                const FieldInfo * field_info = prefix_stack.back();
                Term * term_obj = new Term(&state, term_lowercase, field_info,
                                           term, stem_term, term_pos++);
                Parse(pParser, token, term_obj, &state);
            }
            // Did something.
        }
    }
    // Did something.
}
And the definition of FieldInfo is:
struct FieldInfo {
    /// The type of this field.
    filter_type type;
    /// Field prefix strings.
    list<string> prefixes;
    /// Field processors struct already defined earlier.
    list<FieldProcessor*> procs;
    FieldInfo(filter_type type_, const string & prefix)
        : type(type_)
    {
        prefixes.push_back(prefix);
    }
    FieldInfo(filter_type type_, FieldProcessor *proc)
        : type(type_)
    {
        procs.push_back(proc);
    }
};
Analysis:
Parse is a method that calls the Parser.
GDB reveals that the segmentation fault occurs when the Parser tries to process field_info by iterating over field_info->prefixes.
EDIT:
Here is the code of the function where the segmentation fault occurs (I have added some cout statements for debugging purposes). The problem occurs in the while (++piter != prefixes.end()) part of the code:
Query get_query() const
{
    const list<string> & prefixes = field_info->prefixes;
    if (prefixes.empty()) {
        assert(!field_info->procs.empty());
        return (**field_info->procs.begin())(name);
    }
    list<string>::const_iterator piter = prefixes.begin();
    Query q(make_term(*piter), 1, pos);
    while (++piter != prefixes.end()) {
        string check3 = make_term(*piter);
        Query q2(check3, 1, pos);
        q = Query(Query::OP_OR, q, q2);
    }
    return q;
}
NOTE:
I am working on someone else's working code.
I added the if (flag_1) part of the code; everything else was already there.
This part looks suspicious:
const FieldInfo pos_prefix(NON_BOOLEAN, pos);
const FieldInfo * field_info = &pos_prefix;
Term * term_obj = new Term(&state, term_lowercase, field_info,
                           term, stem_term, term_pos++);
You are using the address of the local variable pos_prefix to initialize term_obj. You have to make absolutely sure that this address is never accessed after pos_prefix has gone out of scope, because by then the address will be invalid.
You have an awful lot of raw pointers in your code. This is not good practice in modern C++. Consider using plain objects, references or smart pointers.
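One possible way to rule out this kind of dangling pointer, assuming Term can be changed to hold a smart pointer, is to let the Term share ownership of its FieldInfo. The sketch below uses hypothetical, simplified versions of FieldInfo and Term (the real constructors take more arguments), and makePositionalTerm is an illustrative name, not a function from the question.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified versions of the types in the question.
struct FieldInfo {
    std::list<std::string> prefixes;
    explicit FieldInfo(const std::string& prefix) { prefixes.push_back(prefix); }
};

struct Term {
    // The Term shares ownership, so the FieldInfo lives at least as long as the Term.
    std::shared_ptr<const FieldInfo> field_info;
    std::string name;
    Term(std::shared_ptr<const FieldInfo> info, std::string n)
        : field_info(std::move(info)), name(std::move(n)) {}
};

// Mirrors the `if (tell_me)` branch: the FieldInfo is created inside a local scope,
// but it is heap-allocated and kept alive by the returned Term, not by the stack frame.
std::unique_ptr<Term> makePositionalTerm(const std::string& pos, const std::string& name)
{
    std::shared_ptr<const FieldInfo> info = std::make_shared<FieldInfo>(pos);
    return std::unique_ptr<Term>(new Term(info, name));
}
```

After the function returns, the local info handle is gone, but iterating over term->field_info->prefixes is still valid because the Term holds the remaining reference.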