Xerces: How to check the validity of an XML file using ErrorHandler - c++

I am trying to determine whether a given XML file is valid (has proper syntax and structure), and I am using Xerces. I have been able to successfully read proper files, but when I give it files with incorrect syntax, no errors are thrown.
I have been fishing around and found out that I might have to use an error handler and call setErrorHandler to catch the errors instead of the traditional try-throw-catch exception handling.
The problem I am having, though, is that I am confused about how to declare the proper handler, set it on my parser, and then read any errors that show up.
Is there any chance somebody could shed some light on my situation?
// #input_parameter from function: const string & xmlConfigArg
xercesc::DOMDocument* doc = NULL;
string xmlConfig(xmlConfigArg);
Handler handler; // I'm not sure what type of handler to use
_parser->setErrorHandler(&handler);
try{
_parser->parse(xmlConfigArg.c_str());
doc = _parser-> getDocument();
}catch(...){
//Nothing is ever caught here
}

You need to derive a class from ErrorHandler (<xercesc/sax/ErrorHandler.hpp>) and override all the virtual methods there.
After doing so, you can get the error information from the class you created. Parse errors will be reported to the handler rather than thrown, so you can waive the try/catch block (or keep it for a different use).
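As a rough illustration of that answer, here is a minimal sketch of a handler that collects what the parser reports. To keep it compilable without Xerces, it uses two stand-in types (ParseProblem and ErrorHandlerBase, both hypothetical). In real code you would derive from xercesc::ErrorHandler instead, whose four pure virtual methods are warning, error, fatalError (each taking a const SAXParseException&) and resetErrors, and you would read the text and position from the exception via getMessage(), getLineNumber() and getColumnNumber().

```cpp
#include <string>
#include <vector>

// Stand-in for xercesc::SAXParseException (hypothetical, for illustration only).
struct ParseProblem {
    std::string message;
    long line;
    long column;
};

// Stand-in for xercesc::ErrorHandler: same four virtual methods,
// but taking the stand-in problem type above.
class ErrorHandlerBase {
public:
    virtual ~ErrorHandlerBase() {}
    virtual void warning(const ParseProblem& p) = 0;
    virtual void error(const ParseProblem& p) = 0;
    virtual void fatalError(const ParseProblem& p) = 0;
    virtual void resetErrors() = 0;
};

// Records every report instead of throwing, so the caller can inspect
// the results once parsing has returned.
class CollectingHandler : public ErrorHandlerBase {
public:
    void warning(const ParseProblem& p) override { warnings_.push_back(format(p)); }
    void error(const ParseProblem& p) override { errors_.push_back(format(p)); }
    void fatalError(const ParseProblem& p) override { errors_.push_back(format(p)); }
    void resetErrors() override { warnings_.clear(); errors_.clear(); }

    bool ok() const { return errors_.empty(); }
    const std::vector<std::string>& errors() const { return errors_; }
    const std::vector<std::string>& warnings() const { return warnings_; }

private:
    // Formats a report as "line:column: message".
    static std::string format(const ParseProblem& p) {
        return std::to_string(p.line) + ":" + std::to_string(p.column) + ": " + p.message;
    }
    std::vector<std::string> warnings_;
    std::vector<std::string> errors_;
};
```

With the real Xerces base class, the question's code would then declare `CollectingHandler handler;`, pass `&handler` to `_parser->setErrorHandler(...)`, call `parse`, and check `handler.ok()` before using the document.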


register ErrorCollector or intercept parse errors for wire format?

It is possible to define a custom ErrorCollector class for handling google::protobuf parsing errors:
struct ErrorCollector : ::google::protobuf::io::ErrorCollector
{
void AddError(int line, int column, const std::string& message) override
{
// log error
}
void AddWarning(int line, int column, const std::string& message) override
{
// log warning
}
};
When parsing from a text file, you can use the protobuf TextFormat class and register your custom ErrorCollector
::google::protobuf::io::IstreamInputStream input_stream(&file);
::google::protobuf::TextFormat::Parser parser;
ErrorCollector error_collector;
parser.RecordErrorsTo(&error_collector);
if (parser.Parse(&input_stream, &msg))
{
// handle msg
}
For parsing wire format, I currently use Message::ParseFromArray
if (msg.ParseFromArray(data, data_len))
{
// handle msg
}
This doesn't allow me to specify a custom ErrorCollector though.
I've searched through the source code, but as of yet have been unable to find if this is possible.
Is it possible to use an ErrorCollector when parsing wire format?
Is there another way to intercept parse errors and make them available to client code?
There are essentially two ways that parsing the wire format could fail:
1. The bytes are not a valid protobuf (e.g. they are corrupted, or in a totally different format).
2. A required field is missing.
For case 1, protobuf does not give you any more information than "it's invalid". This is partly for code simplicity (and speed), but it is also partly because any attempt to provide more information usually turns out more misleading than helpful. Detailed error reporting is useful for text format because text is often written by humans, but machines make very different kinds of errors. In some languages, protobuf actually reports specific errors like "end-group tag does not match start-group tag". In the vast majority of cases, this error really just means "the bytes are corrupted", but inevitably people think the error is trying to tell them something deeper which they do not understand. They then post questions to stack overflow like "How do I make sure my start-group and end-group tags match?" when they really should be comparing bytes between their source and destination to narrow down where they got corrupted. Even reporting the byte position where the parse error occurred is not very useful: protobuf is a dense encoding, which means that many random corrupt byte sequences will parse successfully, which means the parser may only notice a problem somewhere later down the line rather than at the point where things actually went wrong.
The one case that clearly is useful to distinguish is case 2 (missing required fields) -- at least, if you use required fields (I personally recommend avoiding them). There are a couple of options here:
- Normally, required field checks write errors to the console (on stderr). You can intercept these and record them your own way using SetLogHandler, but this doesn't give you structured information, only text messages.
- To check required fields more programmatically, you can separate required field checking from parsing. Use MessageLite::ParsePartialFromArray() or one of the other Partial parsing methods to parse a message while ignoring the absence of required fields. You can then use MessageLite::IsInitialized() to check whether all required fields are set. If it returns false, use Message::FindInitializationErrors() to get a list of paths of all required fields that are missing.

libxml2 with default sax handler and custom error handler

I would like to use a simple libxml2 parser in a C++ program the following way:
- The default SAX handler is fine (actually I'd like to avoid the effort of writing my own; I understand that I could do what I want with a custom SAX handler).
- The parser should be embedded in a C++ class that can be instantiated arbitrarily (possibly multi-threaded), with the libxml2 parser context as a member variable.
- There are other components that also use libxml2 but are out of my control (I cannot tell exactly what they do and how they use libxml2).
- In the C++ class I want to use a custom error handler that does not just print to stderr; I want to collect the errors and throw an exception.
Example:
class XmlParser
{
public:
XmlDoc * parseText(const char * txt, ...);
private:
xmlParserCtxtPtr ctx;
static void xmlErrorHandler(void * userData, xmlErrorPtr err);
};
Here is what does NOT work (according to my testing and understanding):
- Use xmlSetStructuredErrorFunc() or xmlSetGenericErrorFunc() and set the current C++ instance as user data: these functions just set a global variable (not thread-safe).
- Use xmlNewParserCtxt() and set ctx->sax->serror to a regular C++ method: the error handler must be static.
- Same as the previous, but with a static class method: that actually does work, but at the same time I want to set ctx->user_data (to 'this' of the current C++ instance), and that makes the parser crash. It looks as if, inside libxml2, ctx->user_data is passed through the functions where there should be just ctx. That happens consistently, i.e. it looks like a feature rather than a bug :-)
Does anybody have an idea how to get this to work?
Many thx!!!

Regex for JavaScript source

An XPages application fails with the following stack trace:
com.ibm.jscript.InterpretException: Script interpreter error, line=30, col=43: 'component' is null
at com.ibm.jscript.ASTTree.ASTMember.interpret(ASTMember.java:153)
at com.ibm.jscript.ASTTree.ASTCall.interpret(ASTCall.java:88)
at com.ibm.jscript.ASTTree.ASTBlock.interpret(ASTBlock.java:100)
at com.ibm.jscript.ASTTree.ASTIf.interpret(ASTIf.java:85)
at com.ibm.jscript.ASTTree.ASTBlock.interpret(ASTBlock.java:100)
at com.ibm.jscript.ASTTree.ASTIf.interpret(ASTIf.java:85)
at com.ibm.jscript.ASTTree.ASTBlock.interpret(ASTBlock.java:100)
at com.ibm.jscript.ASTTree.ASTTry.interpret(ASTTry.java:109)
at com.ibm.jscript.ASTTree.ASTIf.interpret(ASTIf.java:85)
at com.ibm.jscript.ASTTree.ASTProgram.interpret(ASTProgram.java:119)
at com.ibm.jscript.ASTTree.ASTProgram.interpretEx(ASTProgram.java:139)
From this I know that there is a problem with the variable "component", nested inside a hierarchy of blocks:
if -> try -> { -> if -> { -> if -> { -> method call with invalid argument.
I don't know what to look for exactly; a search for "component" yields too many results.
What regex should I use to find the right spot based on the code hierarchy?
In this case I see a good chance that you have not put all your SSJS code into try/catch blocks. The bad news: searching for the cause of this error is extremely cumbersome, as nearly any SSJS block may be the root cause.
For that reason I set my own rule (and ignore it every now and then) to put EVERY SSJS block into a try/catch like this:
try {
// ... do fancy stuff here
} catch (e) {
print(e.toString());
}
The toString() call is used for some special cases where the error object appears not to convert automatically into something that can be handled by the print method.
If it is the case that you have not put all SSJS blocks into a try/catch, this is exactly the right time to do so, and to keep that coding pattern for the future. It really helps every now and then ;-)
Instead of printStackTrace and toString(), you could just say print(e), which will output only the error message (should be the same as e.message). If you pass the error object to a Java routine, you can get the error line.
variable "component" nested inside hierarchy of blocks ==> We have made this working without issues.

C++ Error Reporting Interface

I'm designing an interface that can be used to report errors in C++. (I'm working with a legacy system where exceptions are out of question.) In my youthful naivety, I started along these lines while designing my API:
bool DoStuff(int amount, string* error);
Return value signals success/failure, while error is used to report a human readable explanation. So far so good. Subroutine calls passed along the error pointer and everything was hunky-dory.
I ran into the following problems with this design (so far):
Cannot report warnings.
Not thread-safe.
Next, I decided to go with the following interface, instead of plain string:
class Issues {
public:
void Error(const string& message);
void Warning(const string& message);
void Merge(const Issues& issues);
};
So that I can change my API like this:
bool DoStuff(int amount, Issues* issues);
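For concreteness, here is one way the Issues interface above could be filled in. This is only a sketch assuming C++11: the Issue struct, the HasErrors() and items() accessors, and the body of DoStuff are hypothetical additions for illustration, not part of the original design.

```cpp
#include <string>
#include <vector>

// One reported problem: its severity plus the human-readable text.
struct Issue {
    enum Severity { kWarning, kError };
    Severity severity;
    std::string message;
};

class Issues {
public:
    void Error(const std::string& message) { items_.push_back({Issue::kError, message}); }
    void Warning(const std::string& message) { items_.push_back({Issue::kWarning, message}); }

    // Appends everything collected by a sub-step, preserving order.
    // Intended for merging a different Issues object into this one.
    void Merge(const Issues& other) {
        items_.insert(items_.end(), other.items_.begin(), other.items_.end());
    }

    // True if at least one entry is an error (warnings alone do not count).
    bool HasErrors() const {
        for (const Issue& i : items_)
            if (i.severity == Issue::kError) return true;
        return false;
    }
    const std::vector<Issue>& items() const { return items_; }

private:
    std::vector<Issue> items_;
};

// Hypothetical body for DoStuff: a zero amount only warns, a negative one fails.
bool DoStuff(int amount, Issues* issues) {
    if (amount < 0) {
        issues->Error("amount must not be negative");
        return false;
    }
    if (amount == 0) issues->Warning("amount is zero; nothing to do");
    return true;
}
```

Because each call site owns its Issues object, a subroutine can fill a local instance and the caller can Merge it into its own, which keeps the collector out of shared global state.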
I'm wondering, is there a more generic/standard API out there that deals with this problem? If yes, I'd like to take a look.
UPDATE: I'm not looking for a logging library. For those who are curious, imagine you're writing a query engine that includes a compiler. The compiler issues warnings and errors, and those need to be returned to the user, as part of the response. Logging has its place in the design, but this is not it.
I usually use things like boost::signals or .NET delegates to report errors/warnings/logging/whatever. You report errors with no changes to the interface, and the library user plugs whatever she wants into the signal to get the error reports (writing to a file, updating a console window, aborting the program, throwing an exception, ignoring warnings, etc).
Something like this, at e.g. global scope:
boost::signal<void(std::string const&)> logError;
boost::signal<void(std::string const&)> logWarning;
and then
void routineWhichMayFail()
{
...
if (answer != 42)
{
logError("Universal error");
return;
}
}
and you connect something to logError and logWarning at initialization:
void robustErrorHandler(std::string const& msg)
{
std::cerr << "Error: " << msg << "\n";
std::exit(EXIT_FAILURE);
}
void initializeMyProgram()
{
logError.connect(&robustErrorHandler);
}
You can even throw exceptions in the error handler instead of exiting, and use fancier things than bare functions (logging classes, "delegates" -- pointers to methods with a this object bundled, RPC to a distant server). This way, you decouple the error handling from the error reporting, which is good. You can also report to multiple destinations, and you can even have your handlers return a boolean telling whether the action should be, e.g., retried.
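If Boost is not available, the same decoupling can be approximated with the standard library. The sketch below is not boost::signal; MessageSignal is a hypothetical minimal stand-in built on std::function (C++11), and checkAnswer is an illustrative reporting routine. It shows several handlers being connected to one reporting point.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for a signal: a list of callbacks invoked in connection order.
class MessageSignal {
public:
    typedef std::function<void(const std::string&)> Slot;

    void connect(Slot slot) { slots_.push_back(std::move(slot)); }

    // Forwards the message to every connected handler.
    void operator()(const std::string& msg) const {
        for (const Slot& s : slots_) s(msg);
    }

private:
    std::vector<Slot> slots_;
};

// Reporting side: knows nothing about what the handlers do with the message.
bool checkAnswer(int answer, const MessageSignal& logError) {
    if (answer != 42) {
        logError("Universal error");
        return false;
    }
    return true;
}
```

A caller would connect, say, one lambda that appends to a vector returned to the user and another that writes to std::cerr; the reporting code stays unchanged either way.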
From your explanation it sounds like you are trying to implement a logging library for your project. You can look at log4cpp or Boost.Log.

Exception handling aware of execution flow

Edit:
For anyone interested in a cleaner way to implement this, have a look at that answer.
In my job I often need to use third-party APIs to access remote systems.
For instance, to create a request and send it to the remote system:
#include "external_lib.h"
void SendRequest(UserRequest user_request)
{
try
{
external_lib::Request my_request;
my_request.SetPrice(user_request.price);
my_request.SetVolume(user_request.quantity);
my_request.SetVisibleVolume(user_request.quantity);
my_request.SetReference(user_request.instrument);
my_request.SetUserID(user_request.user_name);
my_request.SetUserPassword(user_request.user_name);
// Many other member assignments ...
}
catch(external_lib::out_of_range_error& e)
{
// Price , volume ????
}
catch(external_lib::error_t& e)
{
// Here I need to tell the user what was going wrong
}
}
Each of the lib's setters checks the values that the end user has provided, and may throw an exception when the user does not comply with the remote system's needs. For instance, a specific user may be disallowed from sending too large a volume. That's one example; in practice users' requests often fail to comply: the instrument is no longer valid, the price is out of the limit, etc.
Consequently, our end user needs an explicit error message telling him what to modify in his request to get a second chance to compose a valid one. I have to provide him with such hints.
However, the external lib's exceptions (mostly) never specify which field caused the request to be aborted.
What is the best way, in your opinion, to handle those exceptions?
My first try at handling those exceptions was to "wrap" the Request class with my own. Each setter is then wrapped in a method which does only one thing: a try/catch block. The catch block then throws a new exception of mine: my_out_of_range_volume or my_out_of_range_price, depending on the setter. For instance, SetVolume() will be wrapped this way:
void My_Request::SetVolume(const int volume)
{
try
{
m_Request.SetVolume(volume);
}
catch(external_lib::out_of_range_error& e)
{
throw my_out_of_range_volume(volume, e);
}
}
What do you think of it? What do you think about the exception handling overhead it implies? ... :/
Well, the question is open; I need new ideas to work around that lib's constraints!
If there really are a lot of methods you need to call, you could cut down on the code using a reflection library, by creating just one method to do the calling and exception handling, and passing in the name of the method/property to call/set as an argument. You'd still have the same amount of try/catch calls, but the code would be simpler and you'd already know the name of the method that failed.
Alternatively, depending on the type of exception object that they throw back, it may contain stack information or you could use another library to walk the stack trace to get the name of the last method that it failed on. This depends on the platform you're using.
I always prefer a wrapper whenever I'm using third party library.
It allows me to define my own exception handling mechanism, so that users of my class need not know about the external library.
Also, if the third party later changes its error handling from exceptions to return codes, my users need not be affected.
But rather than throwing the exception back to my users, I would implement error codes. Something like this:
class MyRequest
{
enum RequestErrorCode
{
PRICE_OUT_OF_LIMIT,
VOLUME_OUT_OF_LIMIT,
...
...
...
};
bool SetPrice(const int price , RequestErrorCode& ErrorCode_out);
...
private:
external_lib::Request mRequest;
};
bool MyRequest::SetPrice(const int price , RequestErrorCode& ErrorCode_out)
{
bool bReturn = true;
try
{
bReturn = mRequest.SetPrice(price);
}
catch(external_lib::out_of_range_error& e)
{
ErrorCode_out = PRICE_OUT_OF_LIMIT;
bReturn = false;
}
return bReturn;
}
bool SendRequest(UserRequest user_request)
{
MyRequest my_request;
MyRequest::RequestErrorCode anErrorCode;
bool bReturn = my_request.SetPrice(user_request.price, anErrorCode);
if( false == bReturn)
{
//Get the error code and process
//ex:PRICE_OUT_OF_LIMIT
}
return bReturn;
}
I think in this case I might dare a macro. Something like (not tested):
#define SET( ins, setfun, value, msg ) \
do { \
try { \
ins.setfun( value ); \
} \
catch( external::error & ) { \
throw my_explanation( msg, value ); \
} \
} while (0)
and in use:
Instrument i;
SET( i, SetExpiry, "01-01-2010", "Invalid expiry date" );
SET( i, SetPeriod, 6, "Period out of range" );
You get the idea.
Although this is not really the answer you are looking for, I think that your external lib, or your usage of it, somehow abuses exceptions. An exception should not be used to alter the general process flow. If it is the general case that the input does not match the specification, then it is up to your app to validate the parameters before passing them to the external lib. Exceptions should only be thrown if an "exceptional" case occurs, and I think whenever it comes to doing something with user input, you usually have to deal with everything and not rely on 'the user has to provide the correct data, otherwise we handle it with exceptions'.
Nevertheless, an alternative to Neil's suggestions could be using boost::lambda, if you want to avoid macros.
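With a C++11 compiler, built-in lambdas can play the role suggested for boost::lambda here. The following is a sketch under stated assumptions: the external_lib namespace below is a hypothetical stand-in for the real library (its limits are made up), and field_error, set_field and BuildRequest are illustrative names. One helper runs each setter and tags any library exception with the field name.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the third-party library, so the sketch compiles alone.
namespace external_lib {
struct out_of_range_error : std::runtime_error {
    explicit out_of_range_error(const std::string& w) : std::runtime_error(w) {}
};
class Request {
public:
    void SetPrice(int price) {
        if (price <= 0) throw out_of_range_error("value out of range");
        price_ = price;
    }
    void SetVolume(int volume) {
        if (volume > 1000) throw out_of_range_error("value out of range");
        volume_ = volume;
    }
private:
    int price_ = 0;
    int volume_ = 0;
};
}  // namespace external_lib

// Our own exception: carries the name of the field that was rejected.
struct field_error : std::runtime_error {
    field_error(const std::string& f, const std::string& why)
        : std::runtime_error(f + ": " + why), field(f) {}
    std::string field;
};

// Runs one setter call and re-throws library errors tagged with the field name.
template <typename Call>
void set_field(const char* field, Call call) {
    try {
        call();
    } catch (const external_lib::out_of_range_error& e) {
        throw field_error(field, e.what());
    }
}

void BuildRequest(int price, int volume) {
    external_lib::Request r;
    set_field("price", [&] { r.SetPrice(price); });
    set_field("volume", [&] { r.SetVolume(volume); });
}
```

Compared with the macro, the try/catch lives in exactly one function template, and the field name travels with the exception rather than being encoded in a separate exception type per setter.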
In your first version, you could report the number of operations that succeeded, provided the SetXXX functions return some value. You could also keep a counter (incremented after every SetXXX call in that try block) to note which calls succeeded and, based on that counter value, return an appropriate error message.
The major problem with validating each and every step is that, in a real-time system, you are probably introducing too much latency.
Otherwise, your second option looks like the only way. Now, if you have to write a wrapper for every library function anyway, why not add the validation logic there, if you can, instead of making the actual call to the said library? This, IMO, is more efficient.
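The counter idea from the first paragraph of this answer might look like the sketch below. The free functions SetPrice and SetVolume are hypothetical stand-ins for the library's setters (with made-up limits), and FirstRejectedField is an illustrative name; the point is only that a single try block can still identify the failing call.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical setters standing in for the library calls.
void SetPrice(int price)   { if (price <= 0)    throw std::out_of_range("rejected"); }
void SetVolume(int volume) { if (volume > 1000) throw std::out_of_range("rejected"); }

// Returns an empty string on success, otherwise the name of the field
// whose setter threw.
std::string FirstRejectedField(int price, int volume) {
    // Field names in the same order as the setter calls below.
    static const char* const kFields[] = {"price", "volume"};
    int done = 0;  // number of setters that have completed so far
    try {
        SetPrice(price);   ++done;
        SetVolume(volume); ++done;
    } catch (const std::out_of_range&) {
        return kFields[done];  // the call at index 'done' is the one that failed
    }
    return std::string();
}
```

The table of names has to be kept in step with the order of the calls by hand, which is the main weakness of this approach compared with wrapping each setter.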