Disable debug output in libxml2 and xmlsec - c++

In my software, I use libxml2 and xmlsec to manipulate (obviously) XML data structures. I mainly use XSD schema validation and so far, it works well.
When the data structure input by the client doesn't match the XSD schema, libxml2 (or xmlsec) outputs some debug strings to the console.
Here is an example:
Entity: line 1: parser error : Start tag expected, '<' not found
DUMMY<?xml
^
While those strings are useful for debugging, I don't want them to appear and pollute the console output in the released software. So far, I couldn't find an official way of doing this.
Do you know how to suppress the debug output or (even better) redirect it to a custom function?
Many thanks.

I would investigate the xmlSetGenericErrorFunc() and xmlThrDefSetGenericErrorFunc() functions; they seem right. The documentation is... sparse, though.
Here is some Cython code that seems to use these functions to disable error messages; the relevant lines look like this:
# dummy function: no debug output at all
cdef void _nullGenericErrorFunc(void* ctxt, char* msg, ...) nogil:
    pass

# setup for global log:
cdef void _initThreadLogging():
    # disable generic error lines from libxml2
    xmlerror.xmlThrDefSetGenericErrorFunc(NULL, _nullGenericErrorFunc)
    xmlerror.xmlSetGenericErrorFunc(NULL, _nullGenericErrorFunc)

I was looking for the same solution for some C code:
static void __xmlGenericErrorFunc(void *ctx, const char *msg, ...) { }

xmlSetGenericErrorFunc(NULL, __xmlGenericErrorFunc);
xmlThrDefSetGenericErrorFunc(NULL, __xmlGenericErrorFunc);
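If you would rather capture the messages than drop them, the same hooks can forward everything to a sink of your own. A minimal sketch, assuming a hypothetical logSink() function on your side (note that libxml2 may invoke the handler several times per error, with partial lines):

#include <libxml/xmlerror.h>
#include <stdarg.h>
#include <stdio.h>

static void customErrorFunc(void *ctx, const char *msg, ...)
{
    char buf[1024];
    va_list args;
    va_start(args, msg);
    vsnprintf(buf, sizeof(buf), msg, args);  /* format the printf-style fragment */
    va_end(args);
    /* logSink(buf); */  /* hypothetical: forward to your own logger */
}

/* Call once at startup, before any parsing: */
xmlSetGenericErrorFunc(NULL, customErrorFunc);
xmlThrDefSetGenericErrorFunc(NULL, customErrorFunc);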

Related

Dump apache thrift messages for debugging purposes?

Apache Thrift is one of the more popular choices for an open-source RPC framework (gRPC is another that has gotten a lot of traction since its release into the open).
In my C++ setup I'm using a TMultiplexedProcessor. I guess this could be any TProcessor for that matter, since I'm just interested in printing whatever is sent.
This has a method called:
bool process(std::shared_ptr<protocol::TProtocol> in,
             std::shared_ptr<protocol::TProtocol> out,
             void* connectionContext) override {
My idea was to override this again, so that I could print the in argument - but how can I write a TProtocol to output? (At a glance, it does not seem straightforward to serialize into a string.)
I get a feeling there may be some other, easier method. So my question is: how can I dump all messages received via Thrift (for debugging purposes)?
There's TProtocolTap and TDebugProtocol.
A usage example can be found in thrift_dump.cpp:
shared_ptr<TProtocol> iprot(new TBinaryProtocol(itrans));
shared_ptr<TProtocol> oprot(
    new TDebugProtocol(
        shared_ptr<TTransport>(new TBufferedTransport(
            shared_ptr<TTransport>(new TFDTransport(STDOUT_FILENO))))));
TProtocolTap tap(iprot, oprot);
std::string name;
TMessageType messageType;
int32_t seqid;
for (;;) {
    tap.readMessageBegin(name, messageType, seqid);
    tap.skip(T_STRUCT);
    tap.readMessageEnd();
}
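To get the same dump from inside a running server instead of a standalone tool, one option is to wrap the real processor and tap the input protocol before delegating. The sketch below is an assumption, not code from Thrift itself: DumpingProcessor is a made-up name, and depending on your Thrift version the shared_ptr in these signatures may be boost::shared_ptr rather than std::shared_ptr:

#include <thrift/TProcessor.h>
#include <thrift/protocol/TDebugProtocol.h>
#include <thrift/protocol/TProtocolTap.h>
#include <thrift/transport/TBufferTransports.h>
#include <thrift/transport/TFDTransport.h>
#include <unistd.h>

using namespace apache::thrift;

class DumpingProcessor : public TProcessor {
public:
    explicit DumpingProcessor(std::shared_ptr<TProcessor> inner)
        : inner_(std::move(inner)) {}

    bool process(std::shared_ptr<protocol::TProtocol> in,
                 std::shared_ptr<protocol::TProtocol> out,
                 void* connectionContext) override {
        // Everything the tap reads is echoed to a TDebugProtocol on stdout.
        auto sink = std::make_shared<protocol::TDebugProtocol>(
            std::make_shared<transport::TBufferedTransport>(
                std::make_shared<transport::TFDTransport>(STDOUT_FILENO)));
        auto tap = std::make_shared<protocol::TProtocolTap>(in, sink);
        // The inner processor (e.g. your TMultiplexedProcessor) reads through
        // the tap, so requests are printed as a side effect of normal handling.
        return inner_->process(tap, out, connectionContext);
    }

private:
    std::shared_ptr<TProcessor> inner_;
};

Note that this only prints what is read from in; responses written to out pass through untouched.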

Custom submodules in pytorch / libtorch C++

Full disclosure, I asked this same question on the PyTorch forums a few days ago and got no reply, so this is technically a repost, but I believe it's still a good question, because I've been unable to find an answer anywhere online. Here goes:
Can you show an example of using register_module with a custom module?
The only examples I’ve found online are registering linear layers or convolutional layers as the submodules.
I tried to write my own module and register it with another module and I couldn’t get it to work.
My IDE is telling me no instance of overloaded function "MyModel::register_module" matches the argument list -- argument types are: (const char [14], TreeEmbedding)
(TreeEmbedding is the name of another struct I made which extends torch::nn::Module.)
Am I missing something? An example of this would be very helpful.
Edit: Additional context follows below.
I have a header file "model.h" which contains the following:
struct TreeEmbedding : torch::nn::Module {
    TreeEmbedding();
    torch::Tensor forward(Graph tree);
};

struct MyModel : torch::nn::Module {
    size_t embeddingSize;
    TreeEmbedding treeEmbedding;
    MyModel(size_t embeddingSize = 10);
    torch::Tensor forward(std::vector<Graph> clauses, std::vector<Graph> contexts);
};
I also have a cpp file "model.cpp" which contains the following:
MyModel::MyModel(size_t embeddingSize) :
    embeddingSize(embeddingSize)
{
    treeEmbedding = register_module("treeEmbedding", TreeEmbedding{});
}
This setup still has the same error as above. The code in the documentation does work (using built-in components like linear layers), but using a custom module does not. After tracking down torch::nn::Linear, it looks as though that is a ModuleHolder (Whatever that is...)
Thanks,
Jack
I will accept a better answer if anyone can provide more details, but just in case anyone's wondering, I thought I would put up the little information I was able to find:
register_module takes a string as its first argument, and its second argument can either be a ModuleHolder (I don't know what this is...) or a shared_ptr to your module. So here's my example:
treeEmbedding = register_module<TreeEmbedding>("treeEmbedding", make_shared<TreeEmbedding>());
This seemed to work for me so far.
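For a complete picture, here is a minimal self-contained sketch of that approach. Note the member must become a std::shared_ptr, since that is what register_module returns for a plain module (the built-in layers like torch::nn::Linear instead go through ModuleHolder, which is what the TORCH_MODULE macro generates):

#include <torch/torch.h>

struct TreeEmbedding : torch::nn::Module {
    // trivial forward, just for illustration
    torch::Tensor forward(torch::Tensor x) { return x; }
};

struct MyModel : torch::nn::Module {
    std::shared_ptr<TreeEmbedding> treeEmbedding;

    MyModel() {
        // register_module returns the shared_ptr it was given,
        // so the submodule is both registered and stored
        treeEmbedding = register_module("treeEmbedding",
                                        std::make_shared<TreeEmbedding>());
    }

    torch::Tensor forward(torch::Tensor x) {
        return treeEmbedding->forward(x);
    }
};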

Serializing a FlatBuffer object to JSON without its schema file

I've been working with FlatBuffers as a solution for various things in my project, one of them specifically being JSON support. However, while FB natively supports JSON generation, the documentation for flatbuffers is poor, and the process is somewhat cumbersome. Right now, I am working in the Object->JSON direction. The issue I am having doesn't really arise the other way around (I think).
I currently have JSON generation working per an example I found here (line 630, JsonEnumsTest()) - by parsing a .fbs file into a flatbuffers::Parser, building and packaging my FlatBuffer object, then running GenerateText() to generate a JSON string. The code I have is simpler than the example in test.cpp, and looks vaguely like this:
bool MyFBSchemaWrapper::asJson(std::string& jsonOutput)
{
    //**This is the section I don't like having to do
    std::string schemaFile;
    if (flatbuffers::LoadFile((std::string(getenv("FBS_FILE_PATH")) + "MyFBSchema.fbs").c_str(), false, &schemaFile))
    {
        flatbuffers::Parser parser;
        const char *includePaths[] = { getenv("FBS_FILE_PATH"), nullptr };
        parser.Parse(schemaFile.c_str(), includePaths);
        //**End bad section

        parser.opts.strict_json = true;

        flatbuffers::FlatBufferBuilder fbBuilder;
        auto testItem1 = fbBuilder.CreateString("test1");
        auto testItem2 = fbBuilder.CreateString("test2");

        MyFBSchemaBuilder myBuilder(fbBuilder);
        myBuilder.add_item1(testItem1);
        myBuilder.add_item2(testItem2);
        FinishMyFBSchemaBuffer(fbBuilder, myBuilder.Finish());

        auto result = GenerateText(parser, fbBuilder.GetBufferPointer(), &jsonOutput);
        return result;
    }
    return false;
}
Here's my issue: I'd like to avoid having to include the .fbs files to set up my Parser. I don't want to clutter an already large monolithic program by adding even more random folders, directories, environment variables, etc. I'd like to be able to generate JSON from the compiled FlatBuffer schemas, and not have to search for a file to do so.
Is there a way for me to avoid having to read back in my .fbs schemas into the parser? My intuition is pointing to no, but the lack of documentation and community support on the topic of FlatBuffers & JSON is telling me there might be a way. I'm hoping that there's a way to use the already generated MyFBSchema_generated.h to create a JSON string.
Yes, see Mini Reflection in the documentation: http://google.github.io/flatbuffers/flatbuffers_guide_use_cpp.html
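For illustration, a minimal sketch of the mini-reflection route, assuming the schema is compiled with flatc's --reflect-names option so the generated header exposes a MyFBSchemaTypeTable() function (names reused from the question; note the output is a JSON-like text dump, so check it meets your strictness requirements):

#include "MyFBSchema_generated.h"  // compiled with: flatc --cpp --reflect-names MyFBSchema.fbs
#include "flatbuffers/minireflect.h"

std::string asJsonViaMiniReflection(const flatbuffers::FlatBufferBuilder &fbBuilder)
{
    // No .fbs file or Parser needed at runtime: the type table is
    // compiled into the generated header.
    return flatbuffers::FlatBufferToString(fbBuilder.GetBufferPointer(),
                                           MyFBSchemaTypeTable());
}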

register ErrorCollector or intercept parse errors for wire format?

It is possible to define a custom ErrorCollector class for handling google::protobuf parsing errors:
struct ErrorCollector : ::google::protobuf::io::ErrorCollector
{
    void AddError(int line, int column, const std::string& message) override
    {
        // log error
    }

    void AddWarning(int line, int column, const std::string& message) override
    {
        // log warning
    }
};
When parsing from a text file, you can use the protobuf TextFormat class and register your custom ErrorCollector:
::google::protobuf::io::IstreamInputStream input_stream(&file);
::google::protobuf::TextFormat::Parser parser;
ErrorCollector error_collector;
parser.RecordErrorsTo(&error_collector);
if (parser.Parse(&input_stream, &msg))
{
    // handle msg
}
For parsing wire format, I currently use Message::ParseFromArray
if (msg.ParseFromArray(data, data_len))
{
    // handle msg
}
This doesn't allow me to specify a custom ErrorCollector though.
I've searched through the source code, but as of yet have been unable to find if this is possible.
Is it possible to use an ErrorCollector when parsing wire format?
Is there another way to intercept parse errors and make them available to client code?
There are essentially two ways that parsing the wire format could fail:
1. The bytes are not a valid protobuf (e.g. they are corrupted, or in a totally different format).
2. A required field is missing.
For case 1, protobuf does not give you any more information than "it's invalid". This is partly for code simplicity (and speed), but it is also partly because any attempt to provide more information usually turns out more misleading than helpful. Detailed error reporting is useful for text format because text is often written by humans, but machines make very different kinds of errors.

In some languages, protobuf actually reports specific errors like "end-group tag does not match start-group tag". In the vast majority of cases, this error really just means "the bytes are corrupted", but inevitably people think the error is trying to tell them something deeper which they do not understand. They then post questions to Stack Overflow like "How do I make sure my start-group and end-group tags match?" when they really should be comparing bytes between their source and destination to narrow down where they got corrupted.

Even reporting the byte position where the parse error occurred is not very useful: protobuf is a dense encoding, which means that many random corrupt byte sequences will parse successfully, which means the parser may only notice a problem somewhere later down the line rather than at the point where things actually went wrong.
The one case that clearly is useful to distinguish is case 2 (missing required fields) -- at least, if you use required fields (I personally recommend avoiding them). There are a couple options here:
Normally, required field checks write errors to the console (on stderr). You can intercept these and record them your own way using SetLogHandler, but this doesn't give you structured information, only text messages.
To check required fields more programmatically, you can separate required field checking from parsing. Use MessageLite::ParsePartialFromArray() or one of the other Partial parsing methods to parse a message while ignoring the absence of required fields. You can then use MessageLite::IsInitialized() to check whether all required fields are set; if it returns false, use Message::FindInitializationErrors() to get a list of paths of all required fields that are missing.
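A minimal sketch combining both options, with MyMessage standing in for your generated message type (SetLogHandler lives in google/protobuf/stubs/common.h in older releases and has moved or been removed in newer ones, so check your version):

#include <google/protobuf/stubs/common.h>  // SetLogHandler; location varies by protobuf version
#include "my_message.pb.h"                 // hypothetical generated header for MyMessage
#include <string>
#include <vector>

void MyLogHandler(google::protobuf::LogLevel level, const char* filename,
                  int line, const std::string& message)
{
    // forward protobuf's own diagnostics to your logger instead of stderr
}

bool ParseWireFormat(const void* data, int data_len, MyMessage& msg,
                     std::vector<std::string>& missing_fields)
{
    google::protobuf::SetLogHandler(&MyLogHandler);  // install once, e.g. at startup
    if (!msg.ParsePartialFromArray(data, data_len))
        return false;  // case 1: the bytes are not a valid protobuf
    if (!msg.IsInitialized())
    {
        // case 2: collect the paths of all missing required fields
        msg.FindInitializationErrors(&missing_fields);
        return false;
    }
    return true;
}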

Sphinx: Correct way to document an enum?

Looking through the C and C++ domains of Sphinx, it doesn't appear to have native support for documenting enums (much less anonymous enums). As of now, I use cpp:type:: for the enum type, followed by a list of all possible values and their descriptions, but that doesn't seem like an ideal way to deal with it, especially since it makes referencing certain values a pain (either I reference just the type, or add an extra marker in front of the value).
Is there a better way to do this? And how would I go about handling anonymous enums?
A project on Github, spdylay, seems to have an approach. One of the header files at
https://github.com/tatsuhiro-t/spdylay/blob/master/lib/includes/spdylay/spdylay.h
has code like this:
/**
 * @enum
 * Error codes used in the Spdylay library.
 */
typedef enum {
  /**
   * Invalid argument passed.
   */
  SPDYLAY_ERR_INVALID_ARGUMENT = -501,
  /**
   * Zlib error.
   */
  SPDYLAY_ERR_ZLIB = -502,
} spdylay_error;
There is some description of how they're doing it at https://github.com/tatsuhiro-t/spdylay/tree/master/doc, which includes using an API generator called mkapiref.py, available at
https://github.com/tatsuhiro-t/spdylay/blob/master/doc/mkapiref.py
The RST it generates for this example is
.. type:: spdylay_error
Error codes used in the Spdylay library.
.. macro:: SPDYLAY_ERR_INVALID_ARGUMENT
(``-501``)
Invalid argument passed.
.. macro:: SPDYLAY_ERR_ZLIB
(``-502``)
Zlib error.
You could take a look and see if it's useful for you.
Sphinx now has support for enums in the C++ domain (the cpp:enum, cpp:enum-class, and cpp:enumerator directives).
Here is an example with enum values:
.. enum-class:: partition_affinity_domain

   .. enumerator:: \
      not_applicable
      numa
      L4_cache
      L3_cache
      L2_cache
      L1_cache
      next_partitionable
Hi. Maybe you should consider using Doxygen for documentation, as it has a lot more native support for C/C++. If you want to retain the Sphinx output of your documentation, you can output from Doxygen as XML, and then, using Breathe, it will take the XML and give you the same Sphinx output you are used to having.
Here is an example of documenting an enum in Doxygen format, from the Breathe website.
//! Our toolset
/*! The various tools we can opt to use to crack this particular nut */
enum Tool
{
    kHammer = 0,          //!< What? It does the job
    kNutCrackers,         //!< Boring
    kNinjaThrowingStars   //!< Stealthy
};
Hope this helps.