Mongo Driver Validation - c++

Is there a standard set of integration tests for Mongo drivers (Connection Library)?
I have a Mongo library in C++ that I want to validate (and if possible performance test). Is there a standard set of tests (preferable with data) that I can use to validate that the library is sending and receiving the correct data to Mongo?
Is it a library or driver you are talking about?
"Driver Library"?
It provides the type to link to Mongo Server.
The types then provides methods to send/receive data etc.
And a few higher level concepts that I am experimenting with:
Handling all the on-wire protocol so that you don't need to.
You link against a library (So headers/library).
If you are developing a new C++ driver I hope you have some good reasons to not use the officially provided and maintained driver from MongoDB.
No; I don't have a good reason (other than I want to try).
But I do have reasons:
Experimenting with C++17 features that I have not used much.
I have a library that automatically serializes C++ objects to BSON with no extra code.
This seems like a perfect test project to validate it against.
I have a library that allows non-blocking use of streams that (with co-routines) allows me to write/read from multiple connections efficiently with a single thread.
This seems like a perfect test project to validate it against.
the officially provided and maintained driver from MongoDB
Sure the official one is going to have a lot more support and probably be much higher quality than my one-man band version. But you make that statement as if people should only use the official version of a library.
I disagree with that premise entirely as it locks us into the same way of thinking with no ability to have radical shifts in how we interact with the data. I want to experiment and see if I can have a more efficient interaction with the data and maybe (just maybe this will lead to the official version saying hey that's a nice idea let us implement that in the official drivers).
Competition is a way to spur new ideas (though we are probably a long way from competing).
But to get to that point I need a way to show that my driver behaves at least to a certain standard. Which means that it would be nice to have some way of validating. Here are a thousand JSON objects stream those to mongo and validate you get "XX" behavior or result.

The MongoDB C++ driver is developed open source and so are the tests to be found on GitHub:
https://github.com/mongodb/mongo-cxx-driver/tree/master/src/mongocxx/test
Your question could need some clarification:
Is it a library or driver you are talking about? If you are using the official MongoDB C++ Driver, why would you think it would send wrong results? Do you think your Library on top of the driver changes your values? Shouldn't be a test of the library independent of the driver be enough?
If you are developing a new C++ driver I hope you have some good reasons to not use the officially provided and maintained driver from MongoDB.

Related

How to write c++ spout/bolt on Storm and Thrift usage in Storm

From here:
Storm was designed from the very beginning to be compatible with multiple languages. Nimbus is a Thrift service and topologies are defined as Thrift structures. The usage of Thrift allows Storm to be used from any language.
I see that a topology created in java get deployed by serializing the topology (spouts, bolts , ComponentCommon) as a Thrift datatypes and then gets deployed on Nimbus. In Java it is easy to serialize the object with its methods and data. So on the other side Nimbus just needs to create objects and invoke them. (i might be missing detail here but I hope I got the point correctly)
But I wonder how to write the topology in C++ and deploy it the same way. Does thrift help to serialize the c++ based topology and Nimbus deploys/executes the topology in the same way as for Java?
I have seen the links link1 link2 in this regard and the only solution seems to be using a Shelbolt. which invokes the process and communicates with it over standard i/o.
In order to use the Thrift way, do we need to rewrite the storm core also in C++? Also why use Thrift when it supports only JVM languages? Thrift doesn't seem to be used at all for languages like python/c++.
I'm not sure if I'm understand your question correctly -- in my understanding you're asking Is it possible [without the Shebolt hack] to use Storm [with Thrift as comm protocol] with C++-written bolts and with C++ as the language that creates the topology.
Because of the lack of other answers to this question and based on my own research I assume there is no finished, usable implementation for your problem.
Therefore if you really have to use Storm (its common usecase is the JVM, so even if it could theoretically work with any language, it doesn't mean there is an ecosystem for other languages) and C++, you have no option but to use the Shebolt hack or modify Thrift yourself.
As you know, thrift itself has also been ported to C++. Therefore it is possible to re-build the API calls in C++. Basically, you'd need to port the Java TopologyBuilder. On the C++ side, you could start with the Thrift C++ tutorial.
This is also some kind of a hack, as you basically just rebuild half of the stack (in this case ontop of Thrift), but in general you have very few other options with a system design like Storm.
For example, the MySQL binary protocol has been rebuilt from-scr
Unless anyone has done the work for you (which I would have completely missed in my research) I see no option than to do it yourself (maybe even storm is not the best tool for your usecase!?)
If another hack (which might be even more complex and maybe even slower) besides ShellBolt is good enough for you, you could try starting a JVM from inside C++, e.g. see this SO post. I would not recommend this.
If you need an alternative distributed task queue, I have had good experience with Celery in Python environments, however I have no experience in using it in C++ directly (I usually control Python with ZeroMQ, or write my own ZeroMQ-based queues where necessary, but this is no universal solution).

c/c++ XML library question

I know that a lot of c/c++ XML library questions have been asked already (I tried to read through all of them before getting to this).
Here are the things I'm going to need in my own project:
Excellent performance
SAX2
Validation
Open source
Cross platform
I was going to use Xerces-C, but I see that a simple SAX2 setup with nothing going on in the filter is taking 5 seconds to run. (Perhaps I'm doing something wrong here?)
I would like to use libxml++, but as I tried to get it set up on my MacBook, there were some crazy dependencies that took me all the way back to gtk-doc, at which point I sort of tabled the idea.
So now I'm at libxml2. Is this the way to go? Have I missed an important option, bearing in mind the five requirements above? I don't mind using a (good) c-library like libxml2, but a c++ interface would be nice. (I don't like Xerces-C's API very much.)
I am willing to bend on the SAX2 requirement if comparable functionality is available.
Having spent a goodly amount of time on this same problem, it was my conclusion that libxml2 is the best option available under your guidelines. The C interface is not too difficult to use and it's very fast.
There are some other good options for commercial libraries, but most of the other comparable open-source options are either painfully slow or are mired in a deep, annoying vat of dependency soup.
You say you need these things in your project, but don't give any idea of the pipeline. For example, we had a whole load of static XML files which needed to be loaded quickly, but only validated rarely. So validated using a separate process in batch (using RelaxNG as it was human writable markup ) and loaded the XML using expat. The system also used XMPP, so checked streaming input, but that didn't require validating against a schema (partly because it was streamed, and mostly because most of the possible errors were not expressible in a schema).
If you need a whole host of other facilities, you can consider Qt, which has good XML support. Be warned though, it's WAY more than an XML processing library; it's a full blown application framework with support for GUIs, networking and a whole host of other things.
Qt
You can also try Poco. It's another application framework, but not as huge as Qt (i.e. no GUI-related things etc.)
Poco
Lastly, if you don't mind a C library, you can use Expat. It's not SAX per se, but writing code using Expat is somewhat like SAX. It has C++ wrappers, but they're not officially part of the project IIRC, and may not be as well-maintained or designed. I'm not too sure though.
Expat
Hope this helps!
EDIT: I misread your original post: not too sure about the validation features of these libraries, I've never used them before.

What Linux Full Text Indexing Tool Has A Good C++ API?

I'm looking to add full text indexing to a Linux desktop application written in C++. I am thinking that the easiest way to do this would be to call an existing library or utility. This article reviews various open source utilities available for the Gnome and KDE desktops; metatracker, recoll and stigi are all written in C++ so they each seem reasonable. But I cannot find any notable documentation on how to use them as libraries or through an API. I could, instead, use something like Clucene or Xapian, which are generic full text indexing libraries. They seem more straightforward but if I used them, I'd have to implement my own indexing daemon, an unappealing prospect.
Also, Xesam seems to be the latest thing, does anyone have any evidence that it works?
So, does anyone have experience using any of the applications or libraries? How did you use it and what documentation was useful?
I used CLucene, which you mentioned (and also Lucene.NET), and found it to be pretty good.
There's also Strigi which AFAIK works with Xesam and is the default used in KDE.
After further looking around, I found and worked with Recol. It believe that it has the best C++ interface to a full text search engine, in this case Xapian.
It is important to realize that clucene and Xapian are both highly complex libraries designed primarily for multi-user server applications. Cutting them down to a level appropriate for a client-system is not easy. If I remember correctly, Strigi has a complex, pure C interface which isn't adapted.
Clucene also doesn't seem to be that actively maintained currently and Xapian seems to be maintained. But the thing is the existence of recol, which allows you to index particular files without the massive, massive setup that raw Xapian or clucene requires - creating your own "stemming" set is not normally desirable, etc.

good combination of a c++ toolkit/library, cross platform db (not necessarily sql)

what do you suggest as a cross platform "almost all encompassing" abstraction toolkit/library, not necessarily gui oriented?
the project should at some point include an extremely minimal web server and a "db" of some sort (basically to have indexes/btrees, maybe relations, so a rdbms is desiderable but avoidable if necessarily, sql might be overkill)
i was thinking about qt, boost, tokyo cabinet and/or sqlite; what else? what is "best suited"?
i would like to keep platform customization and overall execution footprint at minimum...
thank you in advance
For a minimal webserver, I think you're fine using Boost.Asio and sqlite -- it's quite portable, and should have everything you need. Remember that the C/C++ runtimes also provide portable abstractions for many things, so be sure to check those first (especially if a minimum overhead is required -- it might be simply easier to use C runtime functions than Boost.Filesystem).
You can also look at Firebird as a cross platform database
You should definitely take a look at Poco.
For my own similar purposes I use mongoose for web serving and sqlite for the database. Both are very high-quality products, but unfortunately are written in C. However, they are very simple to embed in C++ applications, and I have written simple C++ wrappers for both of them.

What is the best way to communicate with a MySQL server?

I am going to be using C/C++, and would like to know the best way to talk to a MySQL server. Should I use the library that comes with the server installation? Are they any good libraries I should consider other than the official one?
MySQL++
That depends a bit on what you want to do.
First, check out libraries that provide connectivity to more than on DBMS platform. For example, Qt makes it very easy to connect to MySQL, MS SQL Server and a bunch of others, and change the database driver (connection type) at runtime - with just a few lines of code.
MySQL-specific libraries are fine, but bear in mind that you are locking yourself down to one DB implementation - if you ever need to change in the future it's gonna be a whole lot of work - even if you design your code such that the DB-specific stuff is behind a facade. Why not use a library that provides connectivity to multiple platforms, and save yourself the trouble?
OTL is a solid cross-DBMS solution for C++ that my project has been using for years. We use it to talk to SQL Server (via ODBC) and Oracle (via OCI). It's fairly easy to drive, and comes with a large number of examples across all the supported databases.
There is nothing wrong with MySQL's own client-libraries. If you are willing to settle for reduced functionality, you can buy yourself some extra portability by using ODBC, UDBC, apr_dbd, or some other database-abstraction library (such as the OTL offered already).
This will make switching a back-end easier, but, as I mentioned, at the expense of offering less functionality compared to that of the native client. Because DB-vendors differ, the abstraction-libraries can only really offer the functions common to all (or most) of the back-ends. Whether you prefer to optimize for a particular DB or would rather make it easier to switch back-ends, is up to you (and, perhaps, your manager).