Simultaneously Matching Multiple Regular Expressions with Google RE2 - c++

I'm attempting to match many (500+) regular expressions quickly using Google's RE2 Library, as I'd like to get similar results to this whitepaper. I'd like to use RE2-m on page 13.
From what I've seen online, the Set interface is the way to go, though I'm unsure where to get started -- I haven't been able to find Google RE2 tutorials using the set interface online. Could someone please point me in the right direction?

I just implemented this today for something I'm working on. Here is a snippet for future readers.
The right class for this in RE2 is RE2::Set; you can find the code here.
Here is an example:
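#include <iostream>
#include <string>
#include <vector>
#include <re2/re2.h>
#include <re2/set.h>

// Raw string literals need the R"( ... )" delimiters.
std::vector<std::string> kRegexExpressions = {
    R"(My name is [\w]+)",
    R"(His number is [\d]+)",
};

// The statements below belong inside a function; `line` is the std::string to test.
RE2::Set regex_set(RE2::DefaultOptions, RE2::UNANCHORED);
std::string err;  // receives the parse error when Add() fails
for (const auto &exp : kRegexExpressions) {
    int index = regex_set.Add(exp, &err);
    if (index < 0) {
        <report-error>
        return;
    }
}
// Compile() must be called once, after all patterns are added and before Match().
if (!regex_set.Compile()) {
    <report-error>
    return;
}
std::vector<int> matching_rules;
if (!regex_set.Match(line, &matching_rules)) {
    <no-match>
    return;
}
// Each entry is the index Add() returned for a pattern that matched.
for (auto rule_index : matching_rules) {
    std::cout << "MATCH: Rule #" << rule_index << ": " << kRegexExpressions[rule_index] << std::endl;
}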
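If you want to check the shape of the result before wiring up RE2, here is a minimal stand-in built on std::regex from the standard library. It is not RE2, and it does not match all patterns in a single pass the way RE2::Set does; it only mimics what Match() hands back, namely the indices of the patterns that hit. MatchingRules and ExampleRules are names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of RE2): returns the indices of all
// patterns that match anywhere in `line`, like the vector RE2::Set::Match
// fills in. Patterns are tried one by one, so cost grows with their count.
std::vector<int> MatchingRules(const std::vector<std::regex>& rules,
                               const std::string& line) {
    std::vector<int> hits;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (std::regex_search(line, rules[i])) {  // unanchored search
            hits.push_back(static_cast<int>(i));
        }
    }
    return hits;
}

// Builds the two example rules used in the snippet above.
std::vector<std::regex> ExampleRules() {
    return {std::regex(R"(My name is [\w]+)"),
            std::regex(R"(His number is [\d]+)")};
}
```

Calling MatchingRules(ExampleRules(), line) gives you the same kind of index list to loop over. With 500+ patterns the one-by-one loop is exactly the cost RE2::Set is meant to avoid, so treat this only as a way to test the surrounding logic.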

Related

Is there a way to list more than 1000 buckets using aws sdk C++?

The AWS SDK page shows this example:
Aws::S3::S3Client s3_client;
Aws::S3::Model::ListBucketsOutcome outcome = s3_client.ListBuckets();
This, however, allows returning up to 1000 buckets ONLY!
In our organization we have more than 1k buckets.
boto3 and the Java interface using ECS allow me to paginate.
I can find NOTHING, however, for C++, and I have already been digging in the dark corners of the Internet.
Does anyone have any idea how to do that pagination in C++, since ListBuckets() does not take a request as an argument?
NOTE: I am not looking for workarounds like executing a boto script or JNI from within my C++ to solve the list buckets issue. I am interested in a proper way to do this with the SDK, which, for reasons unknown to me, does not seem to exist.
I was having the same problem and came across a solution for Java here (https://stackoverflow.com/a/15352712/1856251). There are a number of solutions there, but the one in the link pointed me in the direction I needed.
It seems that for listing objects there is the concept of a Marker. The marker is the key to start after when listing. When it is empty, listing starts with the first key in the bucket. If you set it to the last key returned by the previous call to ListObjects, the next call continues from there. To do that you can call this member function:
void Aws::S3::Model::ListObjectsRequest::SetMarker( Aws::String && value)
More info can be found:
https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_s3_1_1_model_1_1_list_objects_request.html#a72bef4f7da7f91661a7642da7dc3aa36
Here is the code.
#include <iostream>
#include <aws/core/Aws.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/ListObjectsRequest.h>
#include <aws/s3/model/Object.h>

void myListObjects(const Aws::String& bucketName,
                   const Aws::String& region)
{
    Aws::Client::ClientConfiguration config;
    if (!region.empty())
    {
        config.region = region;
    }
    Aws::S3::S3Client s3_client(config);
    Aws::S3::Model::ListObjectsRequest request;
    request.WithBucket(bucketName);
    Aws::String prelastMarker, lastMarker;
    std::cout << "Objects in bucket '" << bucketName << "': "
              << std::endl << std::endl;
    do {
        auto outcome = s3_client.ListObjects(request);
        prelastMarker = lastMarker;
        if (outcome.IsSuccess())
        {
            Aws::Vector<Aws::S3::Model::Object> keyList =
                outcome.GetResult().GetContents();
            if (keyList.empty())
            {
                break;  // nothing left after the marker, so the listing is done
            }
            for (Aws::S3::Model::Object& object : keyList)
            {
                std::cout << object.GetKey() << std::endl;
            }
            // Resume the next request after the last key of this page.
            lastMarker = keyList.back().GetKey();
            request.SetMarker(lastMarker);
        }
        else
        {
            // On failure the markers stay equal, so the loop ends below.
            std::cerr << outcome.GetError().GetMessage() << std::endl;
        }
    } while (prelastMarker != lastMarker);
}
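The marker technique can be tried without AWS credentials. The sketch below replaces S3 with an in-memory sorted key list; Page, FakeListObjects, and ListAllKeys are invented names for illustration and are not SDK calls. It assumes, as S3 does for object listings, that keys come back in lexicographic order and that a marker means "start strictly after this key".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One page of a listing: the keys plus a flag saying whether more remain.
struct Page {
    std::vector<std::string> keys;
    bool truncated = false;
};

// Stand-in for a ListObjects call (hypothetical, not the SDK). `sortedKeys`
// must be sorted; returns up to `maxKeys` keys strictly after `marker`.
Page FakeListObjects(const std::vector<std::string>& sortedKeys,
                     const std::string& marker, std::size_t maxKeys) {
    Page page;
    auto it = marker.empty()
                  ? sortedKeys.begin()
                  : std::upper_bound(sortedKeys.begin(), sortedKeys.end(), marker);
    while (it != sortedKeys.end() && page.keys.size() < maxKeys) {
        page.keys.push_back(*it++);
    }
    page.truncated = (it != sortedKeys.end());
    return page;
}

// Collects every key by feeding the last key of each page back as the marker.
// `calls`, if given, counts how many page requests were made.
std::vector<std::string> ListAllKeys(const std::vector<std::string>& sortedKeys,
                                     std::size_t maxKeys,
                                     int* calls = nullptr) {
    std::vector<std::string> all;
    std::string marker;  // empty marker: start from the first key
    Page page;
    do {
        page = FakeListObjects(sortedKeys, marker, maxKeys);
        if (calls) ++*calls;
        if (page.keys.empty()) break;  // never read back() of an empty page
        all.insert(all.end(), page.keys.begin(), page.keys.end());
        marker = page.keys.back();
    } while (page.truncated);
    return all;
}
```

Looping on a "truncated" flag instead of comparing markers saves the final empty request. As far as I know the real ListObjects result exposes the same information through GetIsTruncated(), so the SDK loop could be written the same way.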

Simplification of groovy substring call with null argument

I'm using some archaic version control, and am trying to parse the comments using Groovy for Jenkins usage.
I managed to get the Groovy code working, but was wondering if there might be a way to simplify the code.
The test string which I use is as follows:
text = "IL!21234 12/3/18 3:46 PM user_d Some comments with new\nlines interspersed and pointers to commits\n(IL!1234)\nIL!1234 1/2/17 2:46 AM user_x Some other commit\n"
The expected result is:
tasks = ["IL!21234 12/3/18 3:46 PM user_d Some comments with new lines interspersed and pointers to commits (IL!1234)", "IL!1234 1/2/17 2:46 AM user_x Some other commit"]
Here follows my code:
m = (text =~ /IL!\d+ \d{1,2}\/\d{1,2}\/\d{1,2} \d{1,2}:\d{1,2} [AP]M [a-z_]* .*/)
matchStarts = []
tasks = []
while(m.find()) {
    matchStarts << m.start()
}
matchStarts.eachWithIndex { matchStart, index ->
    if (matchStarts[index + 1]) {
        tasks << text.substring(matchStart, matchStarts[index + 1]).replace("\n", " ").trim()
    } else {
        tasks << text.substring(matchStart).replace("\n", " ").trim()
    }
}
This works, but I was wondering if there would be a nicer way to deal with the if/else.
Thanks,

Creating json string using json lib

I am using jsonc-libjson to create a json string like below.
{ "author-details": {
"name" : "Joys of Programming",
"Number of Posts" : 10
}
}
My code looks like this:
json_object *jobj = json_object_new_object();
json_object *jStr1 = json_object_new_string("Joys of Programming");
json_object *jstr2 = json_object_new_int(10);
json_object_object_add(jobj,"name", jStr1 );
json_object_object_add(jobj,"Number of Posts", jstr2 );
This gives me the JSON string
{
"name" : "Joys of Programming",
"Number of Posts" : 10
}
How do I add the top part associated with author details?
To paraphrase an old advertisement, "libjson users would rather fight than switch."
At least I assume you must like fighting with the library. Using nlohmann's JSON library, you could use code like this:
nlohmann::json j {
{ "author-details", {
{ "name", "Joys of Programming" },
{ "Number of Posts", 10 }
}
}
};
At least to me, this seems somewhat simpler and more readable.
Parsing is about equally straightforward. For example, let's assume we had a file named somefile.json that contained the JSON data shown above. To read and parse it, we could do something like this:
nlohmann::json j;
std::ifstream in("somefile.json");
in >> j; // Read the file and parse it into a json object
// Let's start by retrieving and printing the name.
std::cout << j["author-details"]["name"];
Or, let's assume we found a post, so we want to increment the count of posts. This is one place that things get...less tasteful--we can't increment the value as directly as we'd like; we have to obtain the value, add one, then assign the result (like we would in lesser languages that lack ++):
j["author-details"]["Number of Posts"] = j["author-details"]["Number of Posts"].get<int>() + 1;
Then we want to write out the result. If we want it "dense" (e.g., we're going to transmit it over a network for some other machine to read it) we can just use <<:
somestream << j;
On the other hand, we might want to pretty-print it so a person can read it more easily. The library respects the width we set with setw, so to have it print out indented with 4-column tab stops, we can do:
somestream << std::setw(4) << j;
Create a new JSON object and add the one you already created as a child.
Just insert code like this after what you've already written:
json_object* root = json_object_new_object();
json_object_object_add(root, "author-details", jobj); // This is the same "jobj" as original code snippet.
Based on the comment from Dominic, I was able to figure out the correct answer.
json_object *jobj = json_object_new_object();
json_object* root = json_object_new_object();
json_object_object_add(jobj, "author-details", root);
json_object *jStr1 = json_object_new_string("Joys of Programming");
json_object *jstr2 = json_object_new_int(10);
json_object_object_add(root,"name", jStr1 );
json_object_object_add(root,"Number of Posts", jstr2 );

Can you change a bsoncxx object (document/value/element)?

I'm using the mongocxx driver and I am considering keeping the query results given in BSON as a data holder in a couple of objects instead of parsing the BSON to retrieve the values and then discard it.
This would make some sense "if" I can edit the BSON on the fly. I couldn't find anything in the bsoncxx driver documentation besides the builder that would allow me to manipulate a bsoncxx document/value/view/element after it's been constructed.
As an example, imagine that I have something like this
fruit["orange"];
where fruit is a bsoncxx::document::element
I can get the value by using one of the .get_xxx operators.
What I can't find is something like
fruit["orange"] = "ripe";
Is there a way of doing this, or the idea behind the builder is "just" to create a query to give to the database?
There was a question on the same theme, see here.
So, bsoncxx objects seem to be immutable, and we have to re-create them if we need to edit them. :(
I've written a really bad solution which re-creates the document from scratch.
But it is a solution, I guess.
std::string bsoncxx_string_viewToString(core::v1::string_view gotStringView) {
std::stringstream convertingStream;
convertingStream << gotStringView;
return std::move(convertingStream.str());
}
std::string b_utf8ToString(bsoncxx::types::b_utf8 gotB_utf8) {
return std::move(bsoncxx_string_viewToString(core::v1::string_view(gotB_utf8)));
}
template <typename T>
bsoncxx::document::value editBsoncxx(bsoncxx::document::view documentToEdit, std::string keyToEdit, T newValue, bool appendValueIfKeyNotExist = true) {
auto doc = bsoncxx::builder::stream::document{};
std::string currentKey;
for (auto i : documentToEdit) {
currentKey = bsoncxx_string_viewToString(i.key());
if (currentKey == keyToEdit) {
doc << keyToEdit << newValue;
appendValueIfKeyNotExist = false;
} else {
doc << currentKey << i.get_value();
}
}
if (appendValueIfKeyNotExist) // Maybe this would be better with documentToEdit.find(key), but I don't know how to check if iterator is past-the-end
//If there is a way to check if bsoncxx contains key, we can achieve ~o(log(n)) [depending on 'find key' implementation] which is better than o(n)
doc << keyToEdit << newValue;
return doc.extract();
}
Usage:
auto doc = document{} << "foo0" << "bar0" << "foo1" << 1 << "foo2" << 314 << finalize;
std::cout << bsoncxx::to_json(doc) << std::endl << std::endl;
doc = editBsoncxx<std::string> (doc.view(), "foo1", "edited"); //replace "foo1" with string "edited"
doc = editBsoncxx<int>(doc.view(), "baz_noappend", 123, false); //do nothing if key "baz_noappend" is not found. <- if a key-existence algorithm were applied, we'd spend about o(log(n)) here, not o(n)
doc = editBsoncxx<int>(doc.view(), "baz_append", 123, true); //key will not be found => it'll be appended which is default behaviour
std::cout << bsoncxx::to_json(doc) << std::endl;
Result:
{
"foo0" : "bar0",
"foo1" : 1,
"foo2" : 314
}
{
"foo0" : "bar0",
"foo1" : "edited",
"foo2" : 314,
"baz_append" : 123
}
So, in your case you can use
fruit = editBsoncxx<std::string>(fruit.view(), "orange", "ripe");
But, again, see the related question mentioned above. You're right when saying that
the idea behind the builder is "just" to create a query to give to the database?
I think the solution is "do not edit documents".
You can also write a type converter from bsoncxx to another JSON storage format (for example, rapidjson).
Beware of {value:"valid_json"}: bsoncxx::to_json does not add backslashes to quote signs in values, so injection is possible.

Caffe C++ save network caffemodel file

I have successfully built and trained an audioCaffe demo, but the demo doesn't save the network.
I have found documentation for saving the network in Python and MATLAB, but I can't find any documentation for C++.
I would think there would be a similar function like net.save("file.caffemodel") but I tried that and it didn't work.
In the train function in caffe.cpp I tried this:
if (FLAGS_snapshot.size()) {
LOG(INFO) << "Resuming from " << FLAGS_snapshot;
solver.Solve(FLAGS_snapshot);
} else if (FLAGS_weights.size()) {
LOG(INFO) << "Finetuning from " << FLAGS_weights;
solver.net()->CopyTrainedLayersFrom(FLAGS_weights);
solver.Solve();
} else {
solver.Solve();
}
solver.save("file.caffemodel")
But I got a no method exists error
Any ideas?
Please try this...
caffe::NetParameter net_param;
net_->ToProto(&net_param);
caffe::WriteProtoToBinaryFile(net_param, caffe_model_path);
You should look at Snapshot() and SnapshotToBinaryProto() - src/caffe/solver.cpp.
Caller code is in Solver::Step:
// Save a snapshot if needed.
if ((param_.snapshot()
&& iter_ % param_.snapshot() == 0
&& Caffe::root_solver()) ||
(request == SolverAction::SNAPSHOT)) {
Snapshot();
}