Can you change a bsoncxx object (document/value/element)?

I'm using the mongocxx driver and I am considering keeping the query results given in BSON as a data holder in a couple of objects instead of parsing the BSON to retrieve the values and then discard it.
This would make some sense "if" I can edit the BSON on the fly. I couldn't find anything in the bsoncxx driver documentation besides the builder that would allow me to manipulate a bsoncxx document/value/view/element after it's been constructed.
As an example, imagine that I have something like this
fruit["orange"];
where fruit is a bsoncxx::document::element
I can get the value by using one of the .get_xxx operators.
What I can't find is something like
fruit["orange"] = "ripe";
Is there a way of doing this, or is the idea behind the builder "just" to create a query to give to the database?

There was a question on the same theme, see here.
So bsoncxx objects seem to be immutable, and we have to re-create them if we need to edit them. :(
I've written a really bad solution which re-creates the document from scratch.
But it is a solution, I guess.
std::string bsoncxx_string_viewToString(bsoncxx::stdx::string_view gotStringView) {
    // Copy the viewed characters into an owning std::string.
    return std::string(gotStringView.data(), gotStringView.size());
}
std::string b_utf8ToString(bsoncxx::types::b_utf8 gotB_utf8) {
    return bsoncxx_string_viewToString(bsoncxx::stdx::string_view(gotB_utf8));
}
// Rebuilds the document, replacing the value stored under keyToEdit with newValue.
template <typename T>
bsoncxx::document::value editBsoncxx(bsoncxx::document::view documentToEdit, std::string keyToEdit, T newValue, bool appendValueIfKeyNotExist = true) {
    auto doc = bsoncxx::builder::stream::document{};
    std::string currentKey;
    for (auto i : documentToEdit) {
        currentKey = bsoncxx_string_viewToString(i.key());
        if (currentKey == keyToEdit) {
            doc << keyToEdit << newValue;
            appendValueIfKeyNotExist = false; // key found, nothing left to append
        } else {
            doc << currentKey << i.get_value();
        }
    }
    // Append the key if it was never seen. documentToEdit.find(keyToEdit) == documentToEdit.end()
    // would also detect a missing key, but BSON elements are stored sequentially, so that
    // lookup is a linear scan as well and the overall cost stays O(n).
    if (appendValueIfKeyNotExist) {
        doc << keyToEdit << newValue;
    }
    return doc.extract();
}
Usage:
auto doc = document{} << "foo0" << "bar0" << "foo1" << 1 << "foo2" << 314 << finalize;
std::cout << bsoncxx::to_json(doc) << std::endl << std::endl;
doc = editBsoncxx<std::string>(doc.view(), "foo1", "edited"); // replace the value of "foo1" with the string "edited"
doc = editBsoncxx<int>(doc.view(), "baz_noappend", 123, false); // do nothing if the key "baz_noappend" is not found (a key-existence check up front could skip the rebuild here)
doc = editBsoncxx<int>(doc.view(), "baz_append", 123, true); // the key is not found, so it is appended, which is the default behaviour
std::cout << bsoncxx::to_json(doc) << std::endl;
Result:
{
"foo0" : "bar0",
"foo1" : 1,
"foo2" : 314
}
{
"foo0" : "bar0",
"foo1" : "edited",
"foo2" : 314,
"baz_append" : 123
}
So, in your case you can use
fruit = editBsoncxx<std::string>(fruit.view(), "orange", "ripe");
But, again, see the related question mentioned above. You're right when you say that
the idea behind the builder is "just" to create a query to give to the database.
I think the solution is "do not edit documents".
You can also write a type converter from bsoncxx to another JSON storage format (for example, rapidjson).
Beware of values that themselves look like JSON, such as {value:"valid_json"}: bsoncxx::to_json does not add backslashes to quotation marks in values, so injection is possible.
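If the to_json output in your driver version really is unescaped, as described above, one option is to escape string values yourself before embedding them in JSON text. The following is only a minimal sketch using the standard library; escapeJsonString is a hypothetical helper name, not part of bsoncxx, and it only covers the quote, backslash and control-character cases.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of bsoncxx): escapes a string so that it can
// be placed between double quotes in JSON text.
std::string escapeJsonString(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;  // quote would end the JSON string
            case '\\': out += "\\\\"; break;  // backslash starts an escape
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}
```

Calling escapeJsonString on each string value before writing it out keeps a value such as a"b from closing the surrounding JSON string early.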

Related

Simultaneously Matching Multiple Regular Expressions with Google RE2

I'm attempting to match many (500+) regular expressions quickly using Google's RE2 Library, as I'd like to get similar results to this whitepaper. I'd like to use RE2-m on page 13.
From what I've seen online, the Set interface is the way to go, though I'm unsure where to get started -- I haven't been able to find Google RE2 tutorials using the set interface online. Could someone please point me in the right direction?
Just implemented this today for something I'm working on, here is a snippet for the use of future readers.
The right class to handle this using RE2 is RE2::Set, you can find the code here.
Here is an example:
std::vector<std::string> kRegexExpressions = {
    R"(My name is [\w]+)",
    R"(His number is [\d]+)",
};
RE2::Set regex_set(RE2::DefaultOptions, RE2::UNANCHORED);
std::string err; // receives the reason if a pattern cannot be added
for (const auto &exp : kRegexExpressions) {
    int index = regex_set.Add(exp, &err);
    if (index < 0) {
        // <report-error>
        return;
    }
}
if (!regex_set.Compile()) {
    // <report-error>
    return;
}
// `line` is the text being searched
std::vector<int> matching_rules;
if (!regex_set.Match(line, &matching_rules)) {
    // <no-match>
    return;
}
for (auto rule_index : matching_rules) {
    std::cout << "MATCH: Rule #" << rule_index << ": " << kRegexExpressions[rule_index] << std::endl;
}
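For readers without RE2 at hand, the shape of the result (a list of indices of the rules that matched) can be sketched with std::regex. This is not the RE2::Set technique: it simply tries each pattern in turn rather than matching against a combined set, so it will not give the same performance. The function name matchingRules is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the indices of all patterns that match somewhere in `line`.
// Each pattern is compiled and tried separately (a plain loop, not RE2::Set).
std::vector<int> matchingRules(const std::vector<std::string>& patterns,
                               const std::string& line) {
    std::vector<int> hits;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        const std::regex re(patterns[i]);
        if (std::regex_search(line, re)) {  // unanchored search
            hits.push_back(static_cast<int>(i));
        }
    }
    return hits;
}
```

With the two patterns from the snippet above, matchingRules returns index 0 for a line containing "My name is Bob" and an empty vector for a line that matches neither rule.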

findAndGetString() in DCMTK returns null for the tag

I am developing a quick DICOM viewer using the DCMTK library, following the example provided in this link.
The buffer returned by the API is always null for any tag ID, e.g. DCM_PatientName.
The findAndGetOFString() API works, but returns only the first character of the tag in ASCII. Is this how this API should work?
Can someone let me know why the buffer is empty with the former API?
The DicomImage API has the same issue.
Snippet 1:
DcmFileFormat fileformat;
OFCondition status = fileformat.loadFile(test_data_file_path.toStdString().c_str());
if (status.good())
{
OFString patientName;
char* name;
if (fileformat.getDataset()->findAndGetOFString(DCM_PatientName, patientName).good())
{
name = new char[patientName.length()];
strcpy(name, patientName.c_str());
}
else
{
qDebug() << "Error: cannot access Patient's Name!";
}
}
else
{
qDebug() << "Error: cannot read DICOM file (" << status.text() << ")";
}
In the above snippet name has the ASCII value "50" and the actual name is "PATIENT".
Snippet 2:
DcmFileFormat file_format;
OFCondition status = file_format.loadFile(test_data_file_path.toStdString().c_str());
std::shared_ptr<DcmDataset> dataset(file_format.getDataset());
qDebug() << "\nInformation extracted from DICOM file: \n";
const char* buffer = nullptr;
DcmTagKey key = DCM_PatientName;
dataset->findAndGetString(key,buffer);
std::string tag_value = buffer;
qDebug() << "Patient name: " << tag_value.c_str();
In the above snippet, the buffer is null. It doesn't read the name.
NOTE:
This is only a sample. I am just playing around with the APIs for learning purposes.
The following sample method reads the patient name from a DcmDataset object:
std::string getPatientName(DcmDataset& dataset)
{
// Get the tag's value in ofstring
OFString ofstring;
OFCondition condition = dataset.findAndGetOFString(DCM_PatientName, ofstring);
if(condition.good())
{
// Tag found. Put it in a std::string and return it
return std::string(ofstring.c_str());
}
// Tag not found
return ""; // or throw if you need the tag
}
I have tried your code with your datasets. I just replaced the Qt console output classes with std::cout. It works for me, i.e. it prints the correct patient name (e.g. "PATIENT2" for scan2.dcm). Everything seems correct, except for the fact that you apparently want to transfer the ownership of the dataset to a smart pointer.
To obtain the ownership of the DcmDataset from the DcmFileFormat, you must call getAndRemoveDataset() instead of getDataset(). However, I do not think that your issue is related to that. You may want to try my modified snippet:
DcmFileFormat file_format;
OFCondition status = file_format.loadFile("d:\\temp\\StackOverflow\\scan2.dcm");
std::shared_ptr<DcmDataset> dataset(file_format.getAndRemoveDataset());
std::cout << "\nInformation extracted from DICOM file: \n";
const char* buffer = nullptr;
DcmTagKey key = DCM_PatientName;
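if (dataset->findAndGetString(key, buffer).good() && buffer != nullptr)
{
    // Constructing a std::string from a null pointer is undefined behavior, so check first
    std::string tag_value = buffer;
    std::cout << "Patient name: " << tag_value.c_str();
}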
It probably helps you to know that your code and the dcmtk methods you use are correct, but that does not solve your problem. Another thing I would recommend is to verify the result returned by file_format.loadFile(). Maybe there is a surprise in there.
Not sure if I can help you more, but my next step would be to verify your build environment, e.g. the options that you use for building dcmtk. Are you using CMake to build dcmtk?

Pretty printing XML in wxWidgets

I'm writing a class derived from wxStyledTextCtrl and I want it to prettify given XML without adding anything other than whitespace. I cannot find a simple working solution. I can only use wxStyledTextCtrl, wxXmlDocument and libxml2.
The result I'm aiming for is that after calling SetText with a wxString containing the following text
<!-- comment1 --> <!-- comment2 --> <node><emptynode/> <othernode>value</othernode></node>
the control should show
<!-- comment1 -->
<!-- comment2 -->
<node>
<emptynode/>
<othernode>value</othernode>
</node>
Using libxml2 I managed to almost achieve this, but it also prints the XML declaration (e.g. <?xml version="1.0" encoding="UTF-8"?>) and I don't want this.
To pre-empt the obvious suggestion: I'm looking for a simple and clean solution, and I don't want to manually remove the first line of the formatted XML.
Is there any simple solution to this using given tools? I feel like I'm missing something.
Is there a simple solution? No. But if you want to write your own pretty print function, you basically need to make a depth-first iteration over the XML document tree, printing it as you go. There's a slight complication in that you also need some way of knowing when to close a tag.
Here's an incomplete example of one way to do this using only the wxWidgets XML classes. Currently, it doesn't handle attributes, self-closing elements (such as '<emptynode/>' in your sample text), or any other special element types. A complete pretty printer would need to add those things.
#include <stack>
#include <set>
#include <wx/xml/xml.h>
#include <wx/sstream.h>
wxString PrettyPrint(const wxString& in)
{
    wxStringInputStream string_stream(in);
    wxXmlDocument doc(string_stream);
    wxString pretty_print;
    if (doc.IsOk())
    {
        std::stack<wxXmlNode*> nodes_in_progress;
        std::set<wxXmlNode*> visited_nodes;
        nodes_in_progress.push(doc.GetDocumentNode());
        while (!nodes_in_progress.empty())
        {
            wxXmlNode* cur_node = nodes_in_progress.top();
            nodes_in_progress.pop();
            int depth = cur_node->GetDepth();
            for (int i=1;i<depth;++i)
            {
                pretty_print << "\t";
            }
            if (visited_nodes.find(cur_node)!=visited_nodes.end())
            {
                pretty_print << "</" << cur_node->GetName() << ">\n";
            }
            else if ( !cur_node->GetNodeContent().IsEmpty() )
            {
                //If the node has content, just print it now
                pretty_print << "<" << cur_node->GetName() << ">";
                pretty_print << cur_node->GetNodeContent() ;
                pretty_print << "</" << cur_node->GetName() << ">\n";
            }
            else if (cur_node==doc.GetDocumentNode())
            {
                std::stack<wxXmlNode *> nodes_to_add;
                wxXmlNode *child = cur_node->GetChildren();
                while (child)
                {
                    nodes_to_add.push(child);
                    child = child->GetNext();
                }
                while (!nodes_to_add.empty())
                {
                    nodes_in_progress.push(nodes_to_add.top());
                    nodes_to_add.pop();
                }
            }
            else if (cur_node->GetType()==wxXML_COMMENT_NODE)
            {
                pretty_print << "<!-- " << cur_node->GetContent() << " -->\n";
            }
            //insert checks for other types of nodes with special
            //printing requirements here
            else
            {
                //otherwise, mark the node as visited and then put it back
                visited_nodes.insert(cur_node);
                nodes_in_progress.push(cur_node);
                //If we push the children in order, they'll be popped
                //in reverse order.
                std::stack<wxXmlNode *> nodes_to_add;
                wxXmlNode *child = cur_node->GetChildren();
                while (child)
                {
                    nodes_to_add.push(child);
                    child = child->GetNext();
                }
                while (!nodes_to_add.empty())
                {
                    nodes_in_progress.push(nodes_to_add.top());
                    nodes_to_add.pop();
                }
                pretty_print <<"<" << cur_node->GetName() << ">\n";
            }
        }
    }
    return pretty_print;
}
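The same depth-first idea can also be written recursively, which makes the "when do I close the tag" question easier to see: the closing tag is emitted after the loop over the children returns. The sketch below deliberately avoids wxXmlNode and uses a hypothetical Node struct, so it only illustrates the traversal. It handles self-closing elements but ignores attributes, comments, and text on elements that also have children.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal tree node used only for this illustration.
struct Node {
    std::string name;            // element name
    std::string text;            // text content, used only for leaf elements
    std::vector<Node> children;  // child elements in document order
};

// Appends `node` and its subtree to `out`, one tab per nesting level.
void prettyPrint(const Node& node, int depth, std::string& out) {
    assert(depth >= 0);
    const std::string indent(static_cast<std::size_t>(depth), '\t');
    if (node.children.empty() && node.text.empty()) {
        // No content at all: self-closing element.
        out += indent + "<" + node.name + "/>\n";
    } else if (node.children.empty()) {
        // Leaf with text: open tag, text and close tag on one line.
        out += indent + "<" + node.name + ">" + node.text + "</" + node.name + ">\n";
    } else {
        out += indent + "<" + node.name + ">\n";
        for (const Node& child : node.children) {
            prettyPrint(child, depth + 1, out);
        }
        // All children are done, so the tag can be closed now.
        out += indent + "</" + node.name + ">\n";
    }
}
```

Building the node/emptynode/othernode tree from the question and calling prettyPrint(root, 0, out) produces the nested layout the question asks for, minus the comments.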

Creating json string using json lib

I am using jsonc-libjson to create a json string like below.
{ "author-details": {
"name" : "Joys of Programming",
"Number of Posts" : 10
}
}
My code looks like below
json_object *jobj = json_object_new_object();
json_object *jStr1 = json_object_new_string("Joys of Programming");
json_object *jstr2 = json_object_new_int(10);
json_object_object_add(jobj,"name", jStr1 );
json_object_object_add(jobj,"Number of Posts", jstr2 );
This gives me the JSON string
{
"name" : "Joys of Programming",
"Number of Posts" : 10
}
How do I add the top part associated with author details?
To paraphrase an old advertisement, "libjson users would rather fight than switch."
At least I assume you must like fighting with the library. Using nlohmann's JSON library, you could use code like this:
nlohmann::json j {
{ "author-details", {
{ "name", "Joys of Programming" },
{ "Number of Posts", 10 }
}
}
};
At least to me, this seems somewhat simpler and more readable.
Parsing is about equally straightforward. For example, let's assume we had a file named somefile.json that contained the JSON data shown above. To read and parse it, we could do something like this:
nlohmann::json j;
std::ifstream in("somefile.json");
in >> j; // Read the file and parse it into a json object
// Let's start by retrieving and printing the name.
std::cout << j["author-details"]["name"];
Or, let's assume we found a post, so we want to increment the count of posts. This is one place that things get...less tasteful--we can't increment the value as directly as we'd like; we have to obtain the value, add one, then assign the result (like we would in lesser languages that lack ++):
j["author-details"]["Number of Posts"] = j["author-details"]["Number of Posts"] + 1;
Then we want to write out the result. If we want it "dense" (e.g., we're going to transmit it over a network for some other machine to read it) we can just use <<:
somestream << j;
On the other hand, we might want to pretty-print it so a person can read it more easily. The library respects the width we set with setw, so to have it print out indented with 4-column tab stops, we can do:
somestream << std::setw(4) << j;
Create a new JSON object and add the one you already created as a child.
Just insert code like this after what you've already written:
json_object* root = json_object_new_object();
json_object_object_add(root, "author-details", jobj); // This is the same "jobj" as original code snippet.
Based on the comment from Dominic, I was able to figure out the correct answer.
json_object *jobj = json_object_new_object();
json_object* root = json_object_new_object();
json_object_object_add(jobj, "author-details", root);
json_object *jStr1 = json_object_new_string("Joys of Programming");
json_object *jstr2 = json_object_new_int(10);
json_object_object_add(root,"name", jStr1 );
json_object_object_add(root,"Number of Posts", jstr2 );

Why Isn't find_one working in MongoDB C++?

I have a MongoDB 3.0.7 database, created with the mongo shell. The following works fine:
% mongo test
> vs = db.myCollection.findOne({"somefield.subfield": "somevalue"})
but when I do this in C++:
mongocxx::instance inst{};
mongocxx::client conn{};
auto db = conn["test"];
bsoncxx::stdx::optional< bsoncxx::document::value> docObj;
try {
docObj =
db["myCollection"]
.find_one(document{} <<
"somefield.subfield" << "someValue" <<
bsoncxx::builder::stream::finalize);
} catch (mongocxx::exception::operation e) {
std::cerr << "Retrieval failed (and exception thrown)";
}
if (docObj == bsoncxx::stdx::nullopt)
std::cerr << "Failed to find object";
I get "Failed to find object". What am I missing here?
Update: 11/23/2015, 10:00
I've installed the latest cxx driver (0.3.0), and made the following changes:
mongocxx::instance inst{};
mongocxx::client *connPtr;
bsoncxx::stdx::string_view connectionString("mongodb://localhost");
connPtr = new mongocxx::client(mongocxx::uri(connectionString));
auto db = connPtr->database("test");
bsoncxx::stdx::optional< bsoncxx::document::value> docObj;
try {
docObj =
db["myCollection"]
.find_one(document{} <<
"somefield.subfield" << "someValue" <<
bsoncxx::builder::stream::finalize);
} catch (mongocxx::exception::operation e) {
std::cerr << "Retrieval failed (and exception thrown)";
}
if (docObj == bsoncxx::stdx::nullopt)
std::cerr << "Failed to find object";
I'm back to exactly the same thing. Calling db.list_collections(document{}) retrieves no results.
The bsoncxx library has two document types, views and values. A document::value contains the actual document data, and a document::view is just a reference to some underlying value. Values must outlive the views that use them.
There's a bug in the new C++11 driver in how document::values are passed around. This code produces a document::value:
document{} << "someField" << "someValue" << finalize;
The collection.find_one() method takes a document::view, and document::values convert implicitly to document::views. Unfortunately, this means if you dynamically build a document in your call to find_one(), as above, you can shoot yourself in the foot:
collection.find_one(document{} << "someField" << "someValue" << finalize);
finalize makes a temporary document::value, then find_one converts that to a document::view. The temporary value is dropped on the floor, leaving your view value-less, like a dangling pointer.
A workaround is to make your value in a separate call, and keep it around:
document::value doc = document{} << "someField" << "someValue" << finalize;
collection.find_one(doc.view());
I suspect this is what's causing your queries to fail; if not, it's something to make your code resilient to nonetheless!
You can track this ticket for the real fix for this problem.
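The value/view ownership rule described above has a close analogue in the standard library: std::string owns its data, and std::string_view only borrows it. The sketch below uses those types as stand-ins (it assumes C++17 and does not use bsoncxx at all); makeFilter, countQuotes and safeUse are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in for building a document: returns an owning value.
std::string makeFilter() {
    return std::string("{\"someField\": \"someValue\"}");
}

// Stand-in for an API that only receives a non-owning view.
std::size_t countQuotes(std::string_view filter) {
    std::size_t n = 0;
    for (char c : filter) {
        if (c == '"') ++n;
    }
    return n;
}

std::size_t safeUse() {
    // Risky: std::string_view view = makeFilter();
    // The temporary string dies at the end of that statement and the view dangles.
    const std::string owner = makeFilter();  // keep the owning value alive
    const std::string_view view = owner;     // the view borrows from `owner`
    return countQuotes(view);                // `owner` outlives every use of `view`
}
```

The pattern matches the workaround above: store the owning object in a named variable first, then hand out views of it only while that variable is still in scope.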