Cereal - multiple de-serialization - c++

I am very new to Cereal, and I have a (possibly simple) question:
Is there a way to deserialize multiple objects when I don't know the number of objects inside the (XML) archive?
I tried something like:
std::ifstream is("c:\\data.xml");
cereal::XMLInputArchive archive(is);
while (is.good() && !is.eof())
{
try{
ObjectIn oIn;
archive(oIn);
objectList.push_back(oIn);
}
catch (exception e){
}
}
Let's say I have 3 objects in the XML file, and the XML that I receive does not include the number of objects it contains. In my code, the first 3 iterations are OK, but the 4th generates
"Unhandled exception at 0x0035395E in CerealTest.exe: 0xC0000005: Access violation reading location 0x00000018."
Do you have any suggestion?

Let me ask you a question before trying to answer your question: if you are serializing an unknown number of items, why not place those items in some container designed to hold a variable number of items? You could use an std::vector to store your ObjectIn and easily handle any number of them. Your code would look something like:
std::vector<ObjectIn> vec;
{
std::ifstream is("filename"); // cereal archives wrap a stream, not a file name
cereal::XMLInputArchive ar(is);
ar( vec );
} // get in the habit of using cereal archives in an RAII fashion
The above works with any number of objects serialized, assuming that cereal generated the XML to begin with. You can even add or remove elements from the vector in the XML code and it will work properly.
If you insist on reading some unknown number of objects without placing them in a container designed to hold a variable number of elements, you can do something like this (but be warned that this is not a good idea: you should really try to change your serialization strategy instead):
{
std::ifstream is("filename");
cereal::XMLInputArchive ar(is);
try
{
while( true )
{
ObjectIn ob;
ar( ob );
objectList.push_back(ob);
}
}
catch( ... )
{ } // reading past the last object throws; that ends the loop
}
Again let me stress that this is fundamentally a problem with your serialization strategy and you should be serializing a container instead of items a-la-carte if you don't know how many there will be. The above code can't handle reading in anything else, it just tries to blindly read things in until it encounters an exception. If your objects followed some naming pattern, you could use name-value-pairs (cereal::make_nvp) to retrieve them by name.

Related

Include pre-encoded protocol buffer message within outer message

Is there a way to create a protocol buffer message in C++ that contains a pre-encoded inner message, without parsing and then re-serializing the inner message?
To clarify, consider the following message definitions:
message Inner {
required int32 i = 1;
// ... more fields ...
}
message Outer {
repeated Inner inners = 1;
// ... more fields ...
}
Suppose you have a collection of 10 byte arrays, each of which contains an encoded version of an Inner. You'd like to create an Outer that contains the 10 Inners. You don't want to hand-encode because Outer has other fields and may itself be included in other messages. Is there a way to get protocol buffers to directly copy the pre-encoded Inner?
There is no clean way, but there are a few hacky ways. One is to define a second message like this:
message RawOuter {
repeated bytes inners = 1;
// ... same fields as Outer ...
}
RawOuter is identical to Outer except that the inners repeated field has been changed from type Inner to type bytes. If you populate inners with the encoded instances of Inner, then serialize the RawOuter, you get exactly the same result as if you had built an Outer with the parsed versions. That is to say, the wire format for a nested message is identical to the wire format for a bytes field containing the serialization of that nested message. This is one of those funny exploitable quirks of the protobuf encoding.
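The equivalence can be checked by hand, without the protobuf library. The sketch below hand-encodes a length-delimited field (wire type 2) as the protobuf encoding describes it: a varint key `(field_number << 3) | 2`, a varint length, then the payload. All helper names (`append_varint`, `append_length_delimited`, `encode_inner`, `encode_outer`) are made up for this illustration and are not part of the protobuf API; it only covers field 1 and ignores the other fields of Outer.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned integer in protobuf base-128 varint form.
void append_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Append one length-delimited field: key, payload length, payload bytes.
// The wire format uses this same layout for bytes, strings and embedded messages.
void append_length_delimited(std::string& out, std::uint32_t field_number,
                             const std::string& payload) {
    append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);
    append_varint(out, payload.size());
    out.append(payload);
}

// Hypothetical helper: encode an Inner whose only set field is i (field 1, varint).
std::string encode_inner(std::uint64_t i) {
    std::string out;
    append_varint(out, (1u << 3) | 0);  // key: field 1, wire type 0 (varint)
    append_varint(out, i);
    return out;
}

// Encode field 1 of the outer message from already-encoded Inner payloads.
// Nothing here depends on whether field 1 was declared as Inner or as bytes.
std::string encode_outer(const std::vector<std::string>& encoded_inners) {
    std::string out;
    for (const std::string& inner : encoded_inners) {
        append_length_delimited(out, 1, inner);
    }
    return out;
}
```

Because `encode_outer` never looks inside the payloads, the bytes it produces are the same whether the reader treats field 1 as a nested message or as raw bytes.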
This hack has some problems, though. In particular, it doesn't work well if you're trying to build an Outer instance that is embedded in some other proto, since you probably don't want to maintain two copies of every containing message, one using Outer and one using RawOuter.
Another, even hackier option is to inject the encoded messages into the Outer instance's UnknownFieldSet.
Outer outer;
for (auto& inner: inners) {
outer.mutable_unknown_fields()
->AddLengthDelimited(1, inner);
}
The UnknownFieldSet is intended to store fields seen while parsing that do not match any known field number defined in the .proto file. The idea is that this allows you to write a proxy server that simply receives messages and forwards them to another server without having to re-compile the proxy every time you add a new field to the protocol. Here, we're abusing it by sticking a value into it that actually corresponds to a known field, but the implementation will not notice, and so it will write out these fields just fine.
The main problem with this approach is that if anyone else inspects your Outer instance in the meantime, it will appear to them as if the inners list is empty, since the values are actually hidden somewhere else. This is a pretty ugly hack that will probably come back to haunt you later. I would only recommend it if you have measured the performance difference and found it to be large.
Also note that the serialization code always writes unknown fields last, whereas known fields are written in order by field number. Parsers are supposed to accept any order, but occasionally you'll find someone who is using the unparsed data as a hash map key or something and that totally breaks if the fields are re-ordered.
By the way, you can improve performance of both of these approaches by swapping the strings into place rather than copying, i.e.
raw_outer->add_inners()->swap(inner);
or
outer->mutable_unknown_fields()->AddLengthDelimited(1)->swap(inner);

Xerces: How to check the validity of an XML file using ErrorHandler

I am trying to determine if a given XML file is valid (has proper syntax and structure), and I am using Xerces. I have been able to successfully read proper files, but when I give it files with incorrect syntax, no errors are thrown.
I have been fishing around and found out that I might have to use an error handler and use setErrorHandler to catch the errors instead of the traditional try-throw-catch exception handling.
The problem that I am having though is that I am very confused how to declare the proper handler, set it to my parser and then read the errors if there are any that show up.
Is there any chance somebody could shed some light on my situation?
// #input_parameter from function: const string & xmlConfigArg
xercesc::DOMDocument* doc = NULL;
string xmlConfig(xmlConfigArg);
Handler handler; // I'm not sure what type of handler to use
_parser->setErrorHandler(&handler);
try{
_parser->parse(xmlConfigArg.c_str());
doc = _parser-> getDocument();
}catch(...){
//Nothing is ever caught here
}
You need to derive a class from ErrorHandler (<xercesc/sax/ErrorHandler.hpp>)
and override all of its virtual methods.
Once that is done, you can read the error details from the class you created. The parser reports errors to the handler instead of throwing, so you can drop the try/catch block (or keep it for a different use).

COM Error 0x80004003 (Invalid Pointer) access MS Outlook contacts

I have some ATL code that uses smart COM pointers to iterate through MS Outlook contacts, and on some PCs I am getting a COM error 0x80004003 ('Invalid Pointer') for each contact. The same code works fine on other PCs. The code looks like this:
_ApplicationPtr ptr;
ptr.CreateInstance(CLSID_Application);
_NameSpacePtr ns = ptr->GetNamespace(_T("MAPI"));
MAPIFolderPtr folder = ns->GetDefaultFolder(olFolderContacts);
_ItemsPtr items = folder->Items;
const long count = items->GetCount();
for (long i = 1; i <= count; i++)
{
try
{
_ContactItemPtr contactitem = items->Item(i);
// The following line throws a 0x80004003 exception on some machines
ATLTRACE(_T("\tContact name: %s\n"), static_cast<LPCTSTR>(contactitem->FullName));
}
catch (const _com_error& e)
{
ATLTRACE(_T("%s\n"), e.ErrorMessage());
}
}
I wonder if any other applications/add-ins could be causing this? Any help would be welcome.
FullName is a property, and you are doing a GET operation (in IDL it probably looks something like get_FullName([out,retval] BSTR *o_sResult)). Such an operation works fine with null values.
My assumption is that the contactitem smart pointer points to a valid COM object. In that case the formatting done by ATLTRACE can cause the problem: internally it probably behaves like the standard sprintf("", args...) function, for which a NULL string argument is not valid.
To avoid such problems just do something like below:
ATLTRACE(_T("\tContact name: %s\n"),
_bstr_t(contactitem->FullName)?static_cast<LPCTSTR>(contactitem->FullName):_T("(Empty)"));
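The same guard can be shown in portable C++, since passing a null pointer for a `%s` argument is undefined behavior in the printf family. This is only a minimal sketch with plain `char` strings; `or_placeholder` and `format_contact` are hypothetical names, and real ATL code would use LPCTSTR and _T() as in the line above.

```cpp
#include <cstdio>
#include <string>

// Return the text itself, or a placeholder when the pointer is null.
const char* or_placeholder(const char* text, const char* placeholder = "(Empty)") {
    return text != nullptr ? text : placeholder;
}

// Format a contact line without ever handing a null pointer to %s.
std::string format_contact(const char* full_name) {
    char buffer[256];
    std::snprintf(buffer, sizeof buffer, "Contact name: %s", or_placeholder(full_name));
    return buffer;
}
```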
Just a guess:
Maybe the "FullName" field in the address book is empty and that's why the pointer is invalid?
It is hard to tell, because your code doesn't indicate which COM interfaces you're using.
Does this make any difference?
ATLTRACE(_T("\tContact name: %s\n"), static_cast<LPCTSTR>(contactitem->GetFullName()));
In my example you format NULL value to a proper text value.
If the question is about the difference between FullName (as a property) and GetFullName() (as a method), then the answer is no. The property and the method should give the same result. Sometimes a property can be mapped to methods other than setXXX and getXXX. This is achieved with specific syntax in the IDL (and, in reality, in the TLB after the IDL is compiled). If the FullName property is not mapped to the GetFullName method, then you will get a different result.
So please examine file *.tlh after importing some type library to your project...

Exception handling aware of execution flow

Edit:
For anyone interested in a cleaner way to implement this, have a look at that answer.
In my job I often need to use third-party APIs to access remote systems.
For instance to create a request and send it to the remote system:
#include "external_lib.h"
void SendRequest(UserRequest user_request)
{
try
{
external_lib::Request my_request;
my_request.SetPrice(user_request.price);
my_request.SetVolume(user_request.quantity);
my_request.SetVisibleVolume(user_request.quantity);
my_request.SetReference(user_request.instrument);
my_request.SetUserID(user_request.user_name);
my_request.SetUserPassword(user_request.user_name);
// Many other member assignments ...
}
catch(external_lib::out_of_range_error& e)
{
// Price , volume ????
}
catch(external_lib::error_t& e)
{
// Here I need to tell the user what was going wrong
}
}
Each of the lib's setters checks the values the end user has provided, and may throw an exception when the user does not comply with the remote system's requirements. For instance, a specific user may not be allowed to send too large a volume. That is just one example; in practice users often fail to comply: the instrument is no longer valid, the price is out of limits, etc.
Consequently, our end users need an explicit error message telling them what to modify in their request to get a second chance at composing a valid one. I have to provide them with such hints.
However, the external lib's exceptions (mostly) never specify which field caused
the request to be aborted.
What is the best way, in your opinion, to handle those exceptions?
My first try at handling those exceptions was to "wrap" the Request class with my own. Each setter is then wrapped in a method which does only one thing: a try/catch block. The catch block then throws a new exception of mine, my_out_of_range_volume or my_out_of_range_price, depending on the setter. For instance, SetVolume() will be wrapped this way:
void My_Request::SetVolume(const int volume)
{
try
{
m_Request.SetVolume(volume);
}
catch(external_lib::out_of_range_error& e)
{
throw my_out_of_range_volume(volume, e);
}
}
What do you think of it? What do you think about the exception handling overhead it implies? ... :/
Well, the question is open; I need new ideas to get around that lib's constraints!
If there really are a lot of methods you need to call, you could cut down on the code using a reflection library, by creating just one method to do the calling and exception handling, and passing in the name of the method/property to call/set as an argument. You'd still have the same amount of try/catch calls, but the code would be simpler and you'd already know the name of the method that failed.
Alternatively, depending on the type of exception object that they throw back, it may contain stack information or you could use another library to walk the stack trace to get the name of the last method that it failed on. This depends on the platform you're using.
I always prefer a wrapper whenever I'm using a third-party library.
It allows me to define my own exception handling mechanism, so that users of my class need not know about the external library.
Also, if the third party later changes its exception handling to return codes, my users need not be affected.
But rather than throwing the exception back to my users, I would implement error codes. Something like this:
class MyRequest
{
public:
enum RequestErrorCode
{
PRICE_OUT_OF_LIMIT,
VOLUME_OUT_OF_LIMIT,
...
...
...
};
bool SetPrice(const int price , RequestErrorCode& ErrorCode_out);
...
private:
external_lib::Request mRequest;
};
bool MyRequest::SetPrice(const int price , RequestErrorCode& ErrorCode_out)
{
bool bReturn = true;
try
{
bReturn = mRequest.SetPrice(price);
}
catch(external_lib::out_of_range_error& e)
{
ErrorCode_out = PRICE_OUT_OF_LIMIT;
bReturn = false;
}
return bReturn;
}
bool SendRequest(UserRequest user_request)
{
MyRequest my_request;
MyRequest::RequestErrorCode anErrorCode;
bool bReturn = my_request.SetPrice(user_request.price, anErrorCode);
if( false == bReturn)
{
//Get the error code and process
//ex:PRICE_OUT_OF_LIMIT
}
return bReturn;
}
I think in this case I might dare a macro. Something like (not tested):
#define SET( ins, setfun, value, msg ) \
try { \
ins.setfun( value ); \
} \
catch( external::error & ) { \
throw my_explanation( msg, value ); \
}
and in use:
Instrument i;
SET( i, SetExpiry, "01-01-2010", "Invalid expiry date" );
SET( i, SetPeriod, 6, "Period out of range" );
You get the idea.
Although this is not really the answer you are looking for, I think that your external lib, or your usage of it, somehow abuses exceptions. An exception should not be used to alter the general process flow. If input that does not match the specification is the general case, then it is up to your app to validate the parameters before passing them to the external lib. Exceptions should only be thrown if an "exceptional" case occurs, and I think that whenever you do something with user input, you usually have to deal with everything yourself and not rely on 'the user has to provide the correct data, otherwise we handle it with exceptions'.
Nevertheless, an alternative to Neil's suggestion could be using boost::lambda, if you want to avoid macros.
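With C++11 lambdas the same idea needs neither a macro nor boost::lambda. The following is a minimal sketch, not the real library: the `external_lib` namespace here is a stand-in invented for the example (its limits are arbitrary), and `field_error`, `set_field` and `try_build_request` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for the third-party library, made up for this sketch.
namespace external_lib {
struct out_of_range_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};
class Request {
public:
    void SetPrice(int price) {
        if (price <= 0) throw out_of_range_error("out of range");
        price_ = price;
    }
    void SetVolume(int volume) {
        if (volume > 1000) throw out_of_range_error("out of range");
        volume_ = volume;
    }
private:
    int price_ = 0;
    int volume_ = 0;
};
}  // namespace external_lib

// Our own exception type: it records which field was being set.
struct field_error : std::runtime_error {
    field_error(const std::string& name, const std::string& why)
        : std::runtime_error(name + ": " + why), field(name) {}
    std::string field;
};

// Run one setter call and translate a library error into a field_error.
template <typename Call>
void set_field(const std::string& name, Call&& call) {
    try {
        std::forward<Call>(call)();
    } catch (const external_lib::out_of_range_error& e) {
        throw field_error(name, e.what());
    }
}

// Returns an empty string on success, or the name of the offending field.
std::string try_build_request(int price, int volume) {
    external_lib::Request request;
    try {
        set_field("price", [&] { request.SetPrice(price); });
        set_field("volume", [&] { request.SetVolume(volume); });
    } catch (const field_error& e) {
        return e.field;
    }
    return "";
}
```

The try/catch is written once in `set_field`, and each call site names the field it is setting, so the error message can point the user at the value to correct.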
In your first version, you could report the number of operations that succeeded provided the SetXXX functions return some value. You could also keep a counter (which increases after every SetXXX call in that try block) to note what all calls succeeded and based on that counter value, return an appropriate error message.
The major problem with validating each and every step is that, in a real-time system, you are probably introducing too much latency.
Otherwise, your second option looks like the only way. Now, if you have to write a wrapper for every library function anyway, why not add the validation logic there, if you can, instead of making the actual call to the library? This, IMO, is more efficient.

Unit Tests for comparing text files in NUnit

I have a class that processes two XML files and produces a text file.
I would like to write a bunch of unit / integration tests that can individually pass or fail for this class that do the following:
For input A and B, generate the output.
Compare the contents of the generated file to the contents of the expected output.
When the actual contents differ from the expected contents, fail and display some useful information about the differences.
Below is the prototype for the class along with my first stab at unit tests.
Is there a pattern I should be using for this sort of testing, or do people tend to write zillions of TestX() functions?
Is there a better way to coax text-file differences from NUnit? Should I embed a textfile diff algorithm?
class ReportGenerator
{
string Generate(string inputPathA, string inputPathB)
{
//do stuff
}
}
[TestFixture]
public class ReportGeneratorTests
{
static void Diff(string pathToExpectedResult, string pathToActualResult)
{
using (StreamReader rs1 = File.OpenText(pathToExpectedResult))
{
using (StreamReader rs2 = File.OpenText(pathToActualResult))
{
string actualContents = rs2.ReadToEnd();
string expectedContents = rs1.ReadToEnd();
//this works, but the output could be a LOT more useful.
Assert.AreEqual(expectedContents, actualContents);
}
}
}
static void TestGenerate(string pathToInputA, string pathToInputB, string pathToExpectedResult)
{
ReportGenerator obj = new ReportGenerator();
string pathToResult = obj.Generate(pathToInputA, pathToInputB);
Diff(pathToExpectedResult, pathToResult);
}
[Test]
public void TestX()
{
TestGenerate("x1.xml", "x2.xml", "x-expected.txt");
}
[Test]
public void TestY()
{
TestGenerate("y1.xml", "y2.xml", "y-expected.txt");
}
//etc...
}
Update
I'm not interested in testing the diff functionality. I just want to use it to produce more readable failures.
As for the multiple tests with different data, use the NUnit RowTest extension:
using NUnit.Framework.Extensions;
[RowTest]
[Row("x1.xml", "x2.xml", "x-expected.xml")]
[Row("y1.xml", "y2.xml", "y-expected.xml")]
public void TestGenerate(string pathToInputA, string pathToInputB, string pathToExpectedResult)
{
ReportGenerator obj = new ReportGenerator();
string pathToResult = obj.Generate(pathToInputA, pathToInputB);
Diff(pathToExpectedResult, pathToResult);
}
You are probably asking about testing against "gold" data. I don't know if there is a specific, widely accepted term for this kind of testing, but this is how we do it.
Create a base fixture class. It basically has "void DoTest(string fileName)", which reads the specified file into memory, executes the abstract transformation method "string Transform(string text)", then reads fileName.gold from the same place and compares the transformed text with what was expected. If the content is different, it throws an exception. The exception contains the line number of the first difference as well as the text of the expected and actual lines. As the text is stable, this is usually enough information to spot the problem right away. Be sure to mark the lines with "Expected:" and "Actual:", or you will be guessing forever which is which when looking at test results.
Then you will have specific test fixtures, where you implement the Transform method that does the actual job, and tests that look like this:
[Test] public void TestX() { DoTest("X"); }
[Test] public void TestY() { DoTest("Y"); }
The name of the failed test will instantly tell you what is broken. Of course, you can use row testing to group similar tests. Having separate tests also helps in a number of situations, like ignoring tests, communicating tests to colleagues and so on. It is not a big deal to create a snippet that will create a test for you in a second; you will spend much more time preparing data.
You will also need some test data and a way for your base fixture to find it, so be sure to set up rules about that for the project. If a test fails, dump the actual output to a file next to the gold file, and erase it if the test passes. This way you can use a diff tool when needed. When no gold data is found, the test fails with an appropriate message, but the actual output is written anyway, so you can check that it is correct and copy it to become the "gold".
I would probably write a single unit test that contains a loop. Inside the loop, I'd read 2 xml files and a diff file, and then diff the xml files (without writing it to disk) and compare it to the diff file read from disk. The files would be numbered, e.g. a1.xml, b1.xml, diff1.txt ; a2.xml, b2.xml, diff2.txt ; a3.xml, b3.xml, diff3.txt, etc., and the loop stops when it doesn't find the next number.
Then, you can write new tests just by adding new text files.
Rather than call .AreEqual you could parse the two input streams yourself, keep a count of line and column and compare the contents. As soon as you find a difference, you can generate a message like...
Line 32 Column 12 - Found 'x' when 'y' was expected
You could optionally enhance that by displaying multiple lines of output
Difference at Line 32 Column 12, first difference shown
A = this is a txst
B = this is a tests
Note, as a rule, I'd generally only generate through my code one of the two streams you have. The other I'd grab from a test/text file, having verified by eye or other method that the data contained is correct!
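The line and column bookkeeping described above is language-neutral. The question's code is C#, but here is a minimal sketch in C++ to match the rest of this page; `Difference`, `first_difference` and `describe` are made-up names, and lines and columns are counted from 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Position of the first mismatch between two texts (1-based line and column).
struct Difference {
    bool found;
    std::size_t line;
    std::size_t column;
};

Difference first_difference(const std::string& expected, const std::string& actual) {
    std::size_t line = 1, column = 1;
    const std::size_t common = std::min(expected.size(), actual.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (expected[i] != actual[i]) return {true, line, column};
        if (expected[i] == '\n') {
            ++line;
            column = 1;
        } else {
            ++column;
        }
    }
    // One text is a prefix of the other: they differ where the shorter one ends.
    if (expected.size() != actual.size()) return {true, line, column};
    return {false, 0, 0};
}

// Build a short message suitable for a test failure.
std::string describe(const Difference& d) {
    if (!d.found) return "No difference";
    return "Line " + std::to_string(d.line) + " Column " + std::to_string(d.column);
}
```

For the two sample lines above, "this is a txst" and "this is a tests", the first mismatch is at column 12, where 'x' meets 'e'.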
I would probably use XmlReader to iterate through the files and compare them. When I hit a difference I would display an XPath to the location where the files are different.
PS: But in reality it was always enough for me to just do a simple read of the whole file to a string and compare the two strings. For the reporting it is enough to see that the test failed. Then when I do the debugging I usually diff the files using Araxis Merge to see where exactly I have issues.