MongoDB upsert doesn't update if locked - C++

I have an app written in C++ with 16 threads that reads the output of Wireshark/tshark. Wireshark/tshark dissects pcap files containing GSM MAP signalling captures.
MongoDB is 2.6.7.
The structure I need for my documents is like this:
Note: "packet" is an array; it will become apparent why later.
For those who don't know TCAP: the TCAP layer is transaction-oriented, which means every packet includes:
Transaction State: begin/continue/end
Origin transaction ID (otid)
Destination transaction ID (dtid)
So, for instance, you might see a transaction comprising 3 packets, which, looking at the TCAP layer, would be roughly like the following.
The example below shows two packets: one "begin" and one "end".
{
"_id" : ObjectId("54ccd186b8ea19c89ee8f231"),
"deleted" : "0",
"packet" : {
"datetime" : ISODate("2015-01-31T12:58:11.939Z"),
"signallingType" : "M2PA",
"opc" : "326",
"dpc" : "6406",
"transState" : "begin",
"otid" : "M2PA0400435B",
"dtid" : "",
"sccpCalling" : "523332075100",
"sccpCalled" : "523331466304",
"operation" : "mo-forwardSM (46)",
...
}
}
{
"_id" : ObjectId("54ccd1a1b8ea19c89ee8f7c5"),
"deleted" : "0",
"packet" : {
"datetime" : ISODate("2015-01-31T12:58:16.788Z"),
"signallingType" : "M2PA",
"opc" : "6407",
"dpc" : "326",
"transState" : "end",
"otid" : "",
"dtid" : "M2PA0400435B",
"sccpCalling" : "523331466304",
"sccpCalled" : "523332075100",
"operation" : "Not Found",
...
}
}
Because of the network architecture, we're tracing at two (2) points, and the traffic is balanced between them. This means we sometimes see "continue"s or "end"s BEFORE a "begin", or a "continue" before a "begin" or "end". In short, transactions are not ordered.
Moreover, multiple endpoints are "talking" amongst themselves, and transaction IDs can get duplicated: two endpoints could be using the same tid while another two endpoints use it at the same time. This doesn't happen all the time, but it does happen.
Because of the latter, I also need to use the SCCP layer's "calling" and "called" Global Titles (which look like phone numbers).
Bear in mind that I don't know which way a given packet is going, so this is what I'm doing:
Whenever I get a new packet I must find out whether the transaction already exists in MongoDB; I'm using an upsert to do this.
I do this by searching for the current packet's otid or dtid in either the otid or dtid of existing packets.
If it does exist: push the new packet into the existing document.
If it doesn't: create a new document with the packet.
As an example, this is an upsert for an "end" which should find a "begin":
db.runCommand(
{
update: "packets",
updates:
[
{ q:
{ $and:
[
{
$or: [
{ "packet.otid":
{ $in: [ "M2PA042e3918" ] }
},
{ "packet.dtid":
{ $in: [ "M2PA042e3918" ] }
}
]
},
{
$or: [
{ "packet.sccpCalling":
{ $in: [ "523332075151", "523331466305" ] }
},
{ "packet.sccpCalled":
{ $in: [ "523332075151", "523331466305" ] }
}
]
}
]
},
{
$setOnInsert: {
"unique-id": "422984b6-6688-4782-9ba1-852a9fc6db3b", deleted: "0"
},
$push: {
packet: {
datetime: new Date(1422371239182),
opc: "327", dpc: "6407",
transState: "end",
otid: "", dtid: "M2PA042e3918", sccpCalling: "523332075151", ... }
}
},
upsert: true
}
],
writeConcern: { j: "1" }
}
)
Now, all of this works, until I put it into production.
It seems packets are coming in way too fast, and I see lots of these warnings:
"ClientCursor::staticYield can't unlock b/c of recursive lock"
I read that this warning can be ignored, but I've found that my upserts DO NOT update the documents! It looks like there's a lock and MongoDB forgets about the update. If I change the upsert to a simple insert, no packets are lost.
I also read this is related to no index being used, but I have the following index:
"3" : {
"v" : 1,
"key" : {
"packet.otid" : 1,
"packet.dtid" : 1,
"packet.sccpCalling" : 1,
"packet.sccpCalled" : 1
},
"name" : "packet.otid_1_packet.dtid_1_packet.sccpCalling_1_packet.sccpCalled_1",
"ns" : "tracer.packets"
So, in conclusion:
1.- If this index is not correct, can someone please help me create the correct index?
2.- Is it normal that Mongo would NOT update a document when it finds a lock?
Thanks and regards!
David

Why are you storing all of the packets in an array? Normally in this kind of situation it's better to make each packet a document on its own; it's hard to say more without more information about your use case (or, perhaps, more knowledge of all these acronyms you're using :D). Your updates would become inserts and you would not need to do the update query. Instead, some other metadata on a packet would join related packets together so you could reconstruct a transaction or whatever you need to do.
More directly addressing your question, I would use an array field tids to store [otid, dtid] and an array field sccps to store [sccpCalling, sccpCalled], which would make your update query look like
{ "tids" : { "$in" : ["M2PA042e3918"] }, "sccps" : { "$in" : [ "523332075151", "523331466305" ] } }
and amenable to the index { "tids" : 1, "sccps" : 1 }.
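Since the capture app is already C++, a rough sketch of what the per-packet insert and the suggested index could look like with the legacy MongoDB C++ driver (DBClientConnection plus the BSON/BSON_ARRAY macros) is below. The "tracer.packets" namespace is taken from the index you posted; the helper function and all field values are purely illustrative, not code from the question:
#include "mongo/client/dbclient.h"

// Sketch only: one document per packet, carrying the tids/sccps lookup arrays
// suggested above plus the original packet sub-document.
void storePacketSketch(mongo::DBClientConnection& conn) {
    // Multikey compound index over the two lookup arrays.
    conn.createIndex("tracer.packets", BSON("tids" << 1 << "sccps" << 1));

    mongo::BSONObj doc = BSON(
        "tids" << BSON_ARRAY("M2PA042e3918" << "")                  // [otid, dtid]
        << "sccps" << BSON_ARRAY("523332075151" << "523331466305")  // [sccpCalling, sccpCalled]
        << "packet" << BSON(
               "transState" << "end"
            << "otid" << "" << "dtid" << "M2PA042e3918"
            << "sccpCalling" << "523332075151"
            << "sccpCalled" << "523331466305"));

    // Plain insert: no upsert query is needed any more.
    conn.insert("tracer.packets", doc);
}
Reconstructing a transaction then becomes a simple find on tids and sccps that can use that index, instead of an upsert competing for locks under load.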

Related

String array as argument in endpoint results in error code 10 in Mandos tests

The problem
Here's my function:
#[endpoint(registerItem)]
fn register_item(&self, items_id: &[String])
{
// nothing for the moment
}
In my Mandos tests, everything is good (setState, scDeploy, etc.) until I test the call of this endpoint like so:
{
"step": "scCall",
"tx": {
"from": "address:owner",
"to": "sc:equip",
"function": "registerItem",
"arguments": [
"0x70757461696e|0x70757461696e"
],
"gasLimit": "5,000,000",
"gasPrice": "0"
},
"expect": {
"status": "0",
"gas": "*",
"refund": "*"
}
}
When I run it, I get error code 10, aka "execution failed".
This is the entire log:
Output: Scenario: init.scen.json ... FAIL: result code mismatch. Tx . Want: 0. Have: 10 (execution failed). Message: execution failed
Done. Passed: 0. Failed: 1. Skipped: 0.
ERROR: some tests failed
Things I have tried
I replaced the string array with an int array and didn't get this problem. I also tried [str] but I got this error:
15 | #[elrond_wasm::derive::contract]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
If it's an endpoint, I think you have to use one of the SDK's special types for that, like ManagedVec, so that the node knows how to serialize/deserialize it.
So maybe try this:
#[endpoint(registerItem)]
fn register_item(&self, items_id: ManagedVec<ManagedBuffer>)
{
// nothing for the moment
}

I'm having some problems with Boost JSON in C++

I see that there are a lot of questions very similar to mine, but I don't see any solutions that fit my problem.
I'm trying to create a JSON file with the Boost library with the structure below:
{
"event_date": "2018-06-11T09:35:48.867Z",
"event_log": "2018-06-11 09:35:43,253 - recycler [TRACE]: Running recycler::WITHDRAW",
"cassettes": [
{
"value" : "0",
"currency": "BRL",
"CDMType" : "WFS_CDM_TYPEREJECTCASSETTE",
"lppPhysical" : [
{
"positionName" : "BIN1A",
"count" : "3"
}
]
},
{.....},{.....}
]
}
Below is the code that I have now:
boost::property_tree::ptree children, children2, child, child1, child2, child3, child4, child5, cassettes;
child1.put("value", "cash_unit->ulValues");
child2.put("currency", "std::string(cash_unit->cCurrencyID).substr(0, 3)");
child3.put("CDMType", "cash_unit->usType");
child4.put("lppPhysical.positionName", "ph_unit->lpPhysicalPositionName");
child5.put("lppPhysical.count", "cash_unit->ulCount");
cassettes.put("event_date", "2018-06-11T09:35:48.867Z");
cassettes.put("event_log", "2018-06-11 09:35:43,253 - recycler [TRACE]: Running recycler::WITHDRAW");
children.push_back(std::make_pair("", child1));
children.push_back(std::make_pair("", child2));
children.push_back(std::make_pair("", child3));
children2.push_back(std::make_pair("", child4));
children2.push_back(std::make_pair("", child5));
cassettes.add_child("cassettes", children);
write_json("C:\\Temp\\test.json", cassettes);`
Summarizing, I'm having difficulties putting an array of objects inside an array of objects.
I finally found a solution for my case. It's pretty simple, but it was hard to find since I'm not too familiar with this library.
// ptree declarations needed for this snippet to compile
boost::property_tree::ptree arquivo, cassettes, cassettesInfo, lppPhysical, lppPhysicalInfo;
//LppPhysical insertion
lppPhysicalInfo.put("positionName", ph_unit->lpPhysicalPositionName);
lppPhysicalInfo.put("count", cash_unit->ulCount);
lppPhysical.push_back(std::make_pair("", lppPhysicalInfo));
//Cassettes insertions
cassettesInfo.put("value", cash_unit->ulValues);
cassettesInfo.put("currency", std::string(cash_unit->cCurrencyID).substr(0, 3));
cassettesInfo.put("CDMType", cash_unit->usType);
cassettesInfo.add_child("lppPhysical", lppPhysical.get_child(""));
cassettes.push_back(std::make_pair("", cassettesInfo));
//External information insertion
arquivo.put("event_date", "DateValue");
arquivo.put("event_log", "LogValue");
arquivo.add_child("cassettes", cassettes);
//Json creator
write_json("C:\\Temp\\assassino.json", arquivo);
For lppPhysical I just make a pair with all its content, and in the cassettes insertion I add lppPhysical as a child of the cassette object, and that's it. Now lppPhysical is an array of objects inside cassettes, which is also an array of objects.

Using Elastic Search to retrieve tag contents and hyphenated words

We have elastic search configured with a whitespace analyzer in our application. The words are tokenized on whitespace, so a name like <fantastic> project is indexed as
["<fantastic>", "project"]
and ABC-123-def project is indexed as
["ABC-123-def", "project"]
When we then search for ABC-* the expected project turns up. But, if we specifically search for <fantastic> it won't show up at all. It's as though Lucene/Elastic Search ignores any search term that includes angle brackets. However, we can search for fantastic, or <*fantastic* or *fantastic* and it finds it fine, even though the word is not indexed separately from the angle brackets.
The standard analyzer tokenizes on any non-alphanumeric character. <fantastic> project is indexed as
["fantastic", "project"]
and ABC-123-def project is indexed as
["ABC", "123", "def", "project"]
This breaks the ability to search successfully using ABC-123-*. However, what we get with the standard analyzer is that someone can then specifically search for <fantastic> and it returns the desired results.
If instead of a standard analyzer we add a char_filter to the whitespace analyzer that filters out the angle brackets on tags, (replace <(.*)> with $1) it will be indexed thus:
<fantastic> project is indexed as
["fantastic", "project"]
(no angle brackets). And ABC-123-def project is indexed as
["ABC-123-def", "project"]
It looks promising, but we end up with the same results as for the plain whitespace analyzer: When we search specifically for <fantastic>, we get nothing, but *fantastic* works fine.
Can anyone out on Stack Overflow explain this weirdness?
You could create a custom analyzer that keeps the special characters; see the following example:
{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
},
"analysis" : {
"filter" : {
"custom_filter" : {
"type" : "word_delimiter",
"type_table": ["> => ALPHA", "< => ALPHA"]
}
},
"analyzer" : {
"custom_analyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["lowercase", "custom_filter"]
}
}
}
},
"mappings" : {
"my_type" : {
"properties" : {
"msg" : {
"type" : "string",
"analyzer" : "custom_analyzer"
}
}
}
}
}
Mapping < and > as ALPHA characters causes the underlying word_delimiter filter to treat them as alphabetic characters.

How to set TTL in mongodb with C++ driver

I want to set a TTL index from a C++ process on Linux.
But I've found that ensureIndex has been removed (https://github.com/mongodb/mongo-cxx-driver/pull/106).
createIndex seems to accept only a BSONObj as its argument.
I've tried:
mongo::DBClientConnection mConnection;
mConnection.connect("localhost");
mongo::BSONObj bObj = BSON( "mongo_date"<< 1 << "expireAfterSeconds" << 10);
mConnection.createIndex("Test.Data", bObj);
but the result is:
db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "Test.Data" }
{ "v" : 1, "key" : { "mongo_date" : 1, "expireAfterSeconds" : 10 }, "name" : "mongo_date_1_expireAfterSeconds_10", "ns" : "Test.Data" }
Is there something wrong, or is there another way to set the TTL?
Thanks.
Because I still can't find a way to do it from C++, I'm using a stupid method temporarily:
I use a shell script to create and run a JavaScript file.
In C code:
int expire = 321;
char expir_char[20];
sprintf(expir_char, "%d",expire);
char temp_char[30] = "./runTtlJs.sh ";
strcat(temp_char,expir_char);
system(temp_char);
In runTtlJs.sh:
echo "db.Data.dropIndex({"mongo_date":1})" > ttl.js
echo "db.Data.ensureIndex({"mongo_date":1}, { expireAfterSeconds: $1 })" >> ttl.js
mongo Test ttl.js
I know it's really not a good answer.
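For what it's worth, there may be a way to stay entirely inside the C++ process: MongoDB 2.6+ understands the createIndexes database command, and the legacy driver can send arbitrary commands through runCommand, which lets expireAfterSeconds be passed as an index option instead of as part of the key (which is what went wrong above). This is only a sketch under those assumptions, reusing mConnection from the question, and I haven't verified it:
// Sketch: build the createIndexes command so that expireAfterSeconds is an
// index option rather than a field of the key document.
mongo::BSONObj result;
mongo::BSONObj cmd = BSON(
    "createIndexes" << "Data"                      // collection name
    << "indexes" << BSON_ARRAY(
           BSON("key" << BSON("mongo_date" << 1)
                << "name" << "mongo_date_ttl"
                << "expireAfterSeconds" << 10)));
mConnection.runCommand("Test", cmd, result);       // database name "Test"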

Mongodb - regex match of keys for subdocuments

I have some documents saved in a collection (called urls) that look like this:
{
payload:{
url_google.com:{
url:'google.com',
text:'search'
}
}
},
{
payload:{
url_t.co:{
url:'t.co',
text:'url shortener'
}
}
},
{
payload:{
url_facebook.com:{
url:'facebook.com',
text:'social network'
}
}
}
Using the mongo CLI, is it possible to look for subdocuments of payload that match /^url_/? And, if that's possible, would it also be possible to query on the match's subdocuments (for example, make sure text exists)?
I was thinking something like this:
db.urls.find({"payload":{"$regex":/^url_/}}).count();
But that's returning 0 results.
Any help or suggestions would be great.
Thanks,
Matt
It's not possible to query against document keys in this way. You can search for exact matches using $exists, but you cannot find key names that match a pattern.
I assume (perhaps incorrectly) that you're trying to find documents which have a URL sub-document, and that not all documents will have this? Why not push that type information down a level, something like:
{
payload: {
type: "url",
url: "Facebook.com",
...
}
}
Then you could query like:
db.foo.find({"payload.type": "url", ...})
I would also be remiss if I did not note that you shouldn't use dots (.) in key names in MongoDB. In some cases it's possible to create documents like this, but it will cause great confusion as you attempt to query into embedded documents (where Mongo uses the dot as a "path separator", so to speak).
You can do it, but you need to use aggregation. An aggregation is a pipeline where each stage is applied to each document, and you have a wide range of stages to perform various tasks.
I wrote an aggregation pipeline for this specific problem. If you don't need the count but the documents themselves, you might want to have a look at the $replaceRoot stage.
EDIT: This works only from Mongo v3.4.4 onwards (thanks for the hint #hwase0ng).
db.getCollection('urls').aggregate([
{
// creating a nested array with keys and values
// of the payload subdocument.
// all other fields of the original document
// are removed and only the field arrayofkeyvalue persists
"$project": {
"arrayofkeyvalue": {
"$objectToArray": "$$ROOT.payload"
}
}
},
{
"$project": {
// extract only the keys of the array
"urlKeys": "$arrayofkeyvalue.k"
}
},
{
// merge all documents
"$group": {
// _id is mandatory and can be set
// in our case to any value
"_id": 1,
// create one big (unfortunately double
// nested) array with the keys
"urls": {
"$push": "$urlKeys"
}
}
},
{
// "explode" the array and create
// one document for each entry
"$unwind": "$urls"
},
{
// "explode" again as the arry
// is nested twice ...
"$unwind": "$urls"
},
{
// now "query" the documents
// with your regex
"$match": {
"urls": {
"$regex": /url_/
}
}
},
{
// finally count the number of
// matched documents
"$count": "count"
}
])