Xerces, xpaths, and XML namespaces

Xerces, xpaths, and XML namespaces - c++

I'm trying to use xerces-c in order to parse a rather massive XML document generated from StarUML in order to change some things, but I'm running into issues getting the xpath query to work because it keeps crashing.
To simplify things I split out part of the file into a smaller XML file for testing, which looks like this:
<?xml version="1.0" encoding="utf-8"?>
<XPD:UNIT xmlns:XPD="http://www.staruml.com" version="1">
<XPD:HEADER>
<XPD:SUBUNITS>
</XPD:SUBUNITS>
</XPD:HEADER>
<XPD:BODY>
<XPD:OBJ name="Attributes[3]" type="UMLAttribute" guid="onMjrHQ0rUaSkyFAWtLzKwAA">
<XPD:ATTR name="StereotypeName" type="string">ConditionInteraction</XPD:ATTR>
</XPD:OBJ>
</XPD:BODY>
</XPD:UNIT>
All I'm trying to do for this example is to find all of the XPD:OBJ elements, of which there is only one. The problem seems to stem from trying to query with the namespace. When I pass a very simple xpath query of XPD:OBJ it will crash, but if I pass just OBJ it won't crash but it won't find the XPD:OBJ element.
I assume there's some important property or setting that I'm missing during initialization that I need to set but I have no idea what it might be. I looked up all of the properties of the parser having to do with namespace and enabled the ones I could but it didn't help at all so I'm completely stuck. The initialization code looks something like this, with lots of things removed obviously:
const tXercesXMLCh tXMLManager::kDOMImplementationFeatures[] =
{
static_cast<tXercesXMLCh>('L'),
static_cast<tXercesXMLCh>('S'),
static_cast<tXercesXMLCh>('\0')
};
// Instantiate the DOM parser.
fImplementation = static_cast<tXercesDOMImplementationLS *>(tXercesDOMImplementationRegistry::getDOMImplementation(kDOMImplementationFeatures));
if (fImplementation != nullptr)
{
fParser = fImplementation->createLSParser(tXercesDOMImplementationLS::MODE_SYNCHRONOUS, nullptr);
fConfig = fParser->getDomConfig();
// Let the validation process do its datatype normalization that is defined in the used schema language.
//fConfig->setParameter(tXercesXMLUni::fgDOMDatatypeNormalization, true);
// Ignore comments and whitespace so we don't get extra nodes to process that just waste time.
fConfig->setParameter(tXercesXMLUni::fgDOMComments, false);
fConfig->setParameter(tXercesXMLUni::fgDOMElementContentWhitespace, false);
// Setup some properties that look like they might be required to get namespaces to work but doesn't seem to help at all.
fConfig->setParameter(tXercesXMLUni::fgXercesUseCachedGrammarInParse, true);
fConfig->setParameter(tXercesXMLUni::fgDOMNamespaces, true);
fConfig->setParameter(tXercesXMLUni::fgDOMNamespaceDeclarations, true);
// Install our custom error handler.
fConfig->setParameter(tXercesXMLUni::fgDOMErrorHandler, &fErrorHandler);
}
Then later on I parse the document, find the root node, and then run the xpath query to find the node I want. I'll leave out the bulk of that and just show you where I'm running the xpath query in case there's something obviously wrong there:
tXercesDOMDocument * doc; // Comes from parsing the file.
tXercesDOMNode * contextNode; // This is the root node retrieved from the document.
tXercesDOMXPathResult * xPathResult;
doc->evaluate("XPD:OBJ", contextNode, nullptr, tXercesDOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE), xPathResult);
The call to evaluate() is where it crashes somewhere deep inside xerces that I can't see very clearly, but from what I can see there are a lot of things that look deleted or uninitialized so I'm not sure what's causing the crash exactly.
So is there anything here that looks obviously wrong or missing that is required to make xerces work with XML namespaces?

The solution was right in front of my face the whole time. The problem was that you need to create and pass a resolver to the evaluate() call or else it will not be able to figure out any of the namespaces and will throw an exception. The crash seems to be a bug in xerces since it's crashing on trying to throw the exception when it can't resolve the namespace. I had to debug deep into the xerces code to find it, which gave me the solution.
So to fix the problem I changed the call to evaluate() slightly to create a resolver with the root node and now it works perfectly:
tXercesDOMDocument * doc; // Comes from parsing the file.
tXercesDOMNode * contextNode; // This is the root node retrieved from the document.
tXercesDOMXPathResult * xPathResult;
// Create the resolver with the root node, which contains the namespace definition.
tXercesDOMXPathNSResolver * resolver(doc->createNSResolver(contextNode));
doc->evaluate("XPD:OBJ", contextNode, resolver, tXercesDOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE), xPathResult);
// Make sure to release the resolver since anything created from a `create___()`
// function has to be manually released.
resolver->release();

Related

Get ScriptOrigin from v8::Module

It seems trivial, but I've searched far and wide.
I'm using this resource to make v8 run with ES Modules and I'm trying to implement my own search/load algorithm. Thus far, I've managed to make a simple system which loads a file from a known location, however I'd like to implement external modules. This means that the known location is actually unknown throughout the application. Take the following directory tree as an example:
~/
- index.js
import 'module1_index'; // This is successfully resolved to /libs/module1/module1_index.js
/libs/module1/
- module1_index.js
export * from './lib.js' // This import fails because it is looking for ./lib.js in ~/source
- lib.js
export /* literally anything */
The above example begins by executing the index.js file from ~. When module1_index.js is executed, lib.js is looked for from ~ and consequently fails. In order to address this, the files must be looked for relative to the file being executed at the moment, however I have not found a means to do this.
First Attempt
I'm given the opportunity to look for the file in the callResolve method (main.cpp:280):
v8::MaybeLocal<v8::Module> callResolve(v8::Local<v8::Context> context, v8::Local<v8::String> specifier, v8::Local<v8::Module> referrer)
or in loadModule (main.cpp:197)
v8::MaybeLocal<v8::Module> loadModule(char code[], char name[], v8::Local<v8::Context> cx)
however, as mentioned, I have found no function by which to extract the ScriptOrigin from the module. I should mention, when files are successfully resolved, the ScriptOrigin is initiated with the exact path to the file, and is reliable.
Second Attempt
I set up a stack, which keeps track of the current file being executed. Every import which is made is pushed onto the stack. Once the file has finished executing, it is popped. This also did not work, as there was no way to reliably determine once the file had finished executing.
It seems that the loadModule function does just that: loads. It does not execute, so I cannot pop after the module has loaded, as the imports are not fully resolved. The checkModule/execModule functions are only invoked on dynamic imports, making them useless to determining the completion of a static import.
I'm at a loss. I'm not familiar with v8 enough to know where to look, although I have dug through some NodeJS source code looking for an implementation, to no avail.
Any pointers are greatly appreciated.
Thanks.
Jake.

I don't know much about module resolution, but looking at V8's sources, I can see an example mapping a v8::Module to a std::string absolute_path, which sounds like what you're looking for. I'm not copying the whole code here, because the way it uses custom metadata is a bit involved; the short story is that it keeps a std::unordered_map to keep data about each module's source on the side. (I wonder if it would be possible to use Module::ScriptId() as that map's key, for simplification.)
Code search finds a bunch more example uses of InstantiateModule, mostly in tests. Tests often serve as useful examples/documentation :-)

list package `MoveToFront` not working for me

New to Go and building a simple LRU cache in Go to get used to syntax and Go development.
Having an issue with the MoveToFront list method, it fails on the following check in the MoveToFront body
if e.list != l || l.root.next == e
I want to move the element (e) to the front of the list when I retrieve it from cache , like this
if elem, ok := lc.entries[k]; ok {
lc.list.MoveToFront(elem) // needs fixing
return elem
}
return nil
The Code can be seen here on line 32 the issue occurs
https://github.com/hajjboy95/golrucache/blob/master/lru_cache/lrucache.go#L32

There seem to be two problems, to me. First, this isn't how the List data type is meant to be used: lc.list.PushFront() will create a List.Element and return a pointer to it. That's not fatal, but at the least, it is kind of annoying—the caller has to dig through the returned List.Element when using Get, instead of just getting the value.
Meanwhile, presumably the failure you see is because you remove elements in Put when the LRU-list runs out of space, but you don't remove them from the corresponding map. Hence a later Put of the just-removed key will try to re-use the element in place, even though the element was removed from the list. To fix this, you'll need to hold both key and value. (In my simple experiment I did not see any failures here, but the problem became clear enough.)
I restructured the code somewhat and turned it into a working example on the Go Playground. I make no promises as to suitability, etc.

c++ best way to realise global switches/flags to control program behaviour without tying the classes to a common point

Let me elaborate on the title:
I want to implement a system that would allow me to enable/disable/modify the general behavior of my program. Here are some examples:
I could switch off and on logging
I could change if my graphing program should use floating or pixel coordinates
I could change if my calculations should be based upon some method or some other method
I could enable/disable certain aspects like maybe a extension api
I could enable/disable some basic integrated profiler (if I had one)
These are some made-up examples.
Now I want to know what the most common solution for this sort of thing is.
I could imagine this working with some sort of singelton class that gets instanced globally or in some other globally available object. Another thing that would be possible would be just constexpr or other variables floating around in a namespace, again globally.
However doing something like that, globally, feels like bad practise.
second part of the question
This might sound like I cant decide what I want, but I want a way to modify all these switches/flags or whatever they are actually called in a single location, without tying any of my classes to it. I don't know if this is possible however.
Why don't I want to do that? Well I like to make my classes somewhat reusable and I don't like tying classes together, unless its required by the DRY principle and or inheritance. I basically couldn't get rid of the flags without modifying the possible hundreds of classes that used them.
What I have tried in the past
Having it all as compiler defines. This worked reasonably well, however I didnt like that I couldnt make it so if the flag file was gone there were some sort of default settings that would make the classes themselves still operational and changeable (through these default values)
Having it as a class and instancing it globally (system class). Worked ok, however I didnt like instancing anything globally. Also same problem as above
Instancing the system class locally and passing it to the classes on construction. This was kinda cool, since I could make multiple instruction sets. However at the same time that kinda ruined the point since it would lead to things that needed to have one flag set the same to have them set differently and therefore failing to properly work together. Also passing it on every construction was a pain.
A static class. This one worked ok for the longest time, however there is still the problem when there are missing dependencies.
Summary
Basically I am looking for a way to have a single "place" where I can mess with some values (bools, floats etc.) and that will change the behaviour of all classes using them for whatever, where said values either overwrite default values or get replaced by default values if said "place" isnt defined.

If a Singleton class does not work for you , maybe using a DI container may fit in your third approach? It may help with the construction and make the code more testable.
There are some DI frameworks for c++, like https://github.com/google/fruit/wiki or https://github.com/boost-experimental/di which you can use.

If you decide to use switch/flags, pay attention for "cyclometric complexity".
If you do not change the skeleton of your algorithm but only his behaviour according to the objets in parameter, have a look at "template design pattern". This method allow you to define a generic algorithm and specify particular step for a particular situation.

Here's an approach I found useful; I don't know if it's what you're looking for, but maybe it will give you some ideas.
First, I created a BehaviorFlags.h file that declares the following function:
// Returns true iff the given feature/behavior flag was specified for us to use
bool IsBehaviorFlagEnabled(const char * flagName);
The idea being that any code in any of your classes could call this function to find out if a particular behavior should be enabled or not. For example, you might put this code at the top of your ExtensionsAPI.cpp file:
#include "BehaviorFlags.h"
static const enableExtensionAPI = IsBehaviorFlagEnabled("enable_extensions_api");
[...]
void DoTheExtensionsAPIStuff()
{
if (enableExtensionsAPI == false) return;
[... otherwise do the extensions API stuff ...]
}
Note that the IsBehaviorFlagEnabled() call is only executed once at program startup, for best run-time efficiency; but you also have the option of calling IsBehaviorFlagEnabled() on every call to DoTheExtensionsAPIStuff(), if run-time efficiency is less important that being able to change your program's behavior without having to restart your program.
As far as how the IsBehaviorFlagEnabled() function itself is implemented, it looks something like this (simplified version for demonstration purposes):
bool IsBehaviorFlagEnabled(const char * fileName)
{
// Note: a real implementation would find the user's home directory
// using the proper API and not just rely on ~ to expand to the home-dir path
std::string filePath = "~/MyProgram_Settings/";
filePath += fileName;
FILE * fpIn = fopen(filePath.c_str(), "r"); // i.e. does the file exist?
bool ret = (fpIn != NULL);
fclose(fpIn);
return ret;
}
The idea being that if you want to change your program's behavior, you can do so by creating a file (or folder) in the ~/MyProgram_Settings directory with the appropriate name. E.g. if you want to enable your Extensions API, you could just do a
touch ~/MyProgram_Settings/enable_extensions_api
... and then re-start your program, and now IsBehaviorFlagEnabled("enable_extensions_api") returns true and so your Extensions API is enabled.
The benefits I see of doing it this way (as opposed to parsing a .ini file at startup or something like that) are:
There's no need to modify any "central header file" or "registry file" every time you add a new behavior-flag.
You don't have to put a ParseINIFile() function at the top of main() in order for your flags-functionality to work correctly.
You don't have to use a text editor or memorize a .ini syntax to change the program's behavior
In a pinch (e.g. no shell access) you can create/remove settings simply using the "New Folder" and "Delete" functionality of the desktop's window manager.
The settings are persistent across runs of the program (i.e. no need to specify the same command line arguments every time)
The settings are persistent across reboots of the computer
The flags can be easily modified by a script (via e.g. touch ~/MyProgram_Settings/blah or rm -f ~/MyProgram_Settings/blah) -- much easier than getting a shell script to correctly modify a .ini file
If you have code in multiple different .cpp files that needs to be controlled by the same flag-file, you can just call IsBehaviorFlagEnabled("that_file") from each of them; no need to have every call site refer to the same global boolean variable if you don't want them to.
Extra credit: If you're using a bug-tracker and therefore have bug/feature ticket numbers assigned to various issues, you can creep the elegance a little bit further by also adding a class like this one:
/** This class encapsulates a feature that can be selectively disabled/enabled by putting an
* "enable_behavior_xxxx" or "disable_behavior_xxxx" file into the ~/MyProgram_Settings folder.
*/
class ConditionalBehavior
{
public:
/** Constructor.
* #param bugNumber Bug-Tracker ID number associated with this bug/feature.
* #param defaultState If true, this beheavior will be enabled by default (i.e. if no corresponding
* file exists in ~/MyProgram_Settings). If false, it will be disabled by default.
* #param switchAtVersion If specified, this feature's default-enabled state will be inverted if
* GetMyProgramVersion() returns any version number greater than this.
*/
ConditionalBehavior(int bugNumber, bool defaultState, int switchAtVersion = -1)
{
if ((switchAtVersion >= 0)&&(GetMyProgramVersion() >= switchAtVersion)) _enabled = !_enabled;
std::string fn = defaultState ? "disable" : "enable";
fn += "_behavior_";
fn += to_string(bugNumber);
if ((IsBehaviorFlagEnabled(fn))
||(IsBehaviorFlagEnabled("enable_everything")))
{
_enabled = !_enabled;
printf("Note: %s Behavior #%i\n", _enabled?"Enabling":"Disabling", bugNumber);
}
}
/** Returns true iff this feature should be enabled. */
bool IsEnabled() const {return _enabled;}
private:
bool _enabled;
};
Then, in your ExtensionsAPI.cpp file, you might have something like this:
// Extensions API feature is tracker #4321; disabled by default for now
// but you can try it out via "touch ~/MyProgram_Settings/enable_feature_4321"
static const ConditionalBehavior _feature4321(4321, false);
// Also tracker #4222 is now enabled-by-default, but you can disable
// it manually via "touch ~/MyProgram_Settings/disable_feature_4222"
static const ConditionalBehavior _feature4222(4222, true);
[...]
void DoTheExtensionsAPIStuff()
{
if (_feature4321.IsEnabled() == false) return;
[... otherwise do the extensions API stuff ...]
}
... or if you know that you are planning to make your Extensions API enabled-by-default starting with version 4500 of your program, you can set it so that Extensions API will be enabled-by-default only if GetMyProgramVersion() returns 4500 or greater:
static ConditionalBehavior _feature4321(4321, false, 4500);
[...]
... also, if you wanted to get more elaborate, the API could be extended so that IsBehaviorFlagEnabled() can optionally return a string to the caller containing the contents of the file it found (if any), so that you could do shell commands like:
echo "opengl" > ~/MyProgram_Settings/graphics_renderer
... to tell your program to use OpenGL for its 3D graphics, or etc:
// In Renderer.cpp
std::string rendererType;
if (IsDebugFlagEnabled("graphics_renderer", &rendererType))
{
printf("The user wants me to use [%s] for rendering 3D graphics!\n", rendererType.c_str());
}
else printf("The user didn't specify what renderer to use.\n");

xerces_3_1 is able to create invalid xml at comments & processing instructions

I've encountered a problem using the xerces-dom library:
When you're adding a comments to the xml-tree like:
DOMDocument* doc = impl->createDocument(0, L"root", 0);
DOMElement* root = doc->getDocumentElement();
DOMComment* com1 = doc->createComment(L"SetA -- DataA");
DOMComment* com2 = doc->createComment(L"SetB -- DataB");
doc->insertBefore(com1, root);
doc->insertBefore(com2, root);
That will create the following xml-tree:
<?xml version="1.0" encoding="UTF-8" standalone="false"?>
<!--SetA -- DataA-->
<!--SetB -- DataB-->
<root/>
which is indeed invalid xml.
The same can be done with processing instructions by using ?> as data:
DOMProcessingInstruction procInstr = doc->createProcessingInstruction(L"target", L"?>");
My question:
Is there a way i can configure xerces to not create these kind of comments or do i have to check for these things myself?
And my other question: Why isn't it possible to just always escape characters like <>&'", even in comments and processing instructions, in order to avoid these kind of problems?

A DOMDocument is not an XML document. It is supposed to represent one, but it is conceivable that a valid DOM may not be serializable into a valid XML document (the converse should be less likely). Indeed this appears to be the case here:
Neither the Level 1 or Level2 two specs say anything about this, but the Level 3 DOM specification added this sentence about the DOMComment interface:
No lexical check is done on the content of a comment and it is therefore possible to have the character sequence "--" (double-hyphen) in the content, which is illegal in a comment per section 2.5 of [XML 1.0]. The presence of this character sequence must generate a fatal error during serialization.
So Xerces is operating within the DOM Level 3 specification even if it accepts a comment with '--' in it, as long as it bombs if you go to serialize it.
Not a great situation, but it makes sense because DOM was originally intended to represent XML Documents that have been read in, not to create new ones. So it is liberal in what it can represent. Fine for reading - a DOMComment can represent anything (and more) the XML document can, but a bit annoying that it doesn't catch the invalid string when you createComment().
Checking DOMDocumentImpl.cpp we see:
DOMComment *DOMDocumentImpl::createComment(const XMLCh *data)
{
return new (this, DOMMemoryManager::COMMENT_OBJECT) DOMCommentImpl(this, data);
}
And in DOMCommentImpl.cpp we have just:
DOMCommentImpl::DOMCommentImpl(DOMDocument *ownerDoc, const XMLCh *dat)
: fNode(ownerDoc), fCharacterData(ownerDoc, dat)
{
fNode.setIsLeafNode(true);
}
Finally we see in DOMCharacterDataImpl.cpp that there is no chance of validation up front - it just saves the user provided string without checking it.
DOMCharacterDataImpl::DOMCharacterDataImpl(DOMDocument *doc, const XMLCh *dat)
{
fDoc = (DOMDocumentImpl*)doc;
XMLSize_t len=XMLString::stringLen(dat);
fDataBuf = fDoc->popBuffer(len+1);
if (!fDataBuf)
fDataBuf = new (fDoc) DOMBuffer(fDoc, len+15);
fDataBuf->set(dat, len);
}
Sadly, no Xerces does not have an option or even a nice hook to check this for you. And because the Level 3 spec seems to demand that "No lexical check is done", it probably isn't even legal to add one.
The answer to your second question is simpler to answer: Because that's the way they wanted it defined it. See the XML 1.1 spec for example:
Comments
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
It is similar for PIs.
The grammar simply does not allow for escapes. Seems about right: baroque and broke.
Maybe there is a way to catch the error on serialization or normalization, but I wasn't able to confirm whether Xerces 3.1 can. To be safe I think the best way is to wrap createComment() and check for it before creating the node, or walk the tree and check it yourself.

llvm dumping control flow graph to file inside a pass

I want to build a control flow graph diagram in llvm in one of my passes. I currently use the following to show the CFG
block->getParent()->viewCFG(); //block is a basic block
The problem is that it pops up a windows. I just want to dump the cfg at that particular program point, as a dot file (or jpg if possible), not to show up in a window. How can I do the same? I am using llvm 3.1.
NOTE: I am modifying the cfg in my pass, before that program point. Hence I cannot use the opt -view-cfg.
Update:
Thanks to Mishr, I was able to draw to graph with this
WriteGraph(File, (const llvm::Function*) &fun, true, "test"); //I have also tired with false
The CFG is shown. But the nodes are blank. How can I show the contents of the node

Take a look at this, read the comment before the viewCFG() function.
http://llvm.org/docs/doxygen/html/CFGPrinter_8cpp_source.html
The viewCFG() function is intended for printing the CFG in a new window. To dump the CFG in a file you have to use CFGPrinter pass which can be invoked by the handle dot-cfg.

Let me add something to ssubbotin's answer.
The question is about
DOTGraphTraits<const Function*>
provided by CFGPrinter.
In my case I had to use the call like that:
WriteGraph<const llvm::Function*>(...)
to make it work.
The function template definition is like the following:
template<typename GraphType>
raw_ostream &WriteGraph(raw_ostream &O, const GraphType &G,
bool ShortNames = false,
const Twine &Title = "")
so GraphType gets non-const with implicit template call.

You need to add include:
#include <llvm\Analysis\CFGPrinter.h>
It has getEdgeSourceLabel implementation inside which provides correct node labels.
In the same time, make sure you don't have <llvm\Support\CFG.h> file included since it provides empty labels by default.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Xerces, xpaths, and XML namespaces - c++

Related

Get ScriptOrigin from v8::Module

list package `MoveToFront` not working for me

c++ best way to realise global switches/flags to control program behaviour without tying the classes to a common point

xerces_3_1 is able to create invalid xml at comments & processing instructions

llvm dumping control flow graph to file inside a pass

Categories

Resources