Adding a constant String to regex

The string is: 2013-3-03 14:27:33 [main] INFO Main - Start
The corresponding regex is:
/^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
which produces this result:
"time" : "2013-3-03 14:27:33",
"thread" : "main",
"level" : "INFO",
"message" : " Main - Start"
Editing the actual log file is out of my control, so I want to change the regex to add a constant field. The output I want is:
"time" : "2013-3-03 14:27:33",
"thread" : "main",
"level" : "INFO",
"message" : " Main - Start",
"app" : "abc"
What I have tried is something like the below, but is there a better way to achieve this?
/^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)(?<app_abc>)/
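Worth noting: a regex group can only capture text that is actually present in the input, so an empty group like (?<app_abc>) yields an empty capture rather than the value "abc". The usual approach is to attach the constant after the match, in whatever code consumes the captures. A minimal sketch in Python (illustrative only; Python writes named groups as (?P<name>...)):

import re

LINE = "2013-3-03 14:27:33 [main] INFO Main - Start"

# Same pattern as above, in Python's named-group spelling.
PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})"
    r" \[(?P<thread>.*)\] (?P<level>[^\s]+)(?P<message>.*)"
)

match = PATTERN.match(LINE)
if match:
    fields = match.groupdict()
    fields["app"] = "abc"  # the constant is attached in code, not in the regex
    print(fields)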

Related

Using Elastic Search to retrieve tag contents and hyphenated words

We have Elasticsearch configured with a whitespace analyzer in our application. Words are tokenized on whitespace, so a name like <fantastic> project is indexed as
["<fantastic>", "project"]
and ABC-123-def project is indexed as
["ABC-123-def", "project"]
When we then search for ABC-* the expected project turns up. But if we specifically search for <fantastic>, it won't show up at all. It's as though Lucene/Elasticsearch ignores any search term that includes angle brackets. However, we can search for fantastic, or <*fantastic* or *fantastic*, and it finds it fine, even though the word is not indexed separately from the angle brackets.
The standard analyzer tokenizes on any non-alphanumeric character. <fantastic> project is indexed as
["fantastic", "project"]
and ABC-123-def project is indexed as
["ABC", "123", "def", "project"]
This breaks the ability to search successfully using ABC-123-*. However, what we get with the standard analyzer is that someone can then specifically search for <fantastic> and it returns the desired results.
If instead of the standard analyzer we add a char_filter to the whitespace analyzer that strips the angle brackets from tags (replace <(.*)> with $1), it will be indexed thus:
<fantastic> project is indexed as
["fantastic", "project"]
(no angle brackets). And ABC-123-def project is indexed as
["ABC-123-def", "project"]
It looks promising, but we end up with the same results as for the plain whitespace analyzer: When we search specifically for <fantastic>, we get nothing, but *fantastic* works fine.
Can anyone out on Stack Overflow explain this weirdness?
You could customize how special characters are tokenized via a word_delimiter token filter; see the following example:
{
    "settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 1
        },
        "analysis" : {
            "filter" : {
                "custom_filter" : {
                    "type" : "word_delimiter",
                    "type_table" : ["> => ALPHA", "< => ALPHA"]
                }
            },
            "analyzer" : {
                "custom_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["lowercase", "custom_filter"]
                }
            }
        }
    },
    "mappings" : {
        "my_type" : {
            "properties" : {
                "msg" : {
                    "type" : "string",
                    "analyzer" : "custom_analyzer"
                }
            }
        }
    }
}
Declaring < and > as ALPHA characters causes the underlying word_delimiter to treat them as alphabetic, so a token like <fantastic> is kept intact.
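To verify what the analyzer emits, you can feed a sample string to the _analyze API; a minimal sketch, assuming an Elasticsearch 1.x-style endpoint on localhost:9200 and an index named my_index created with the settings above (host and index name are placeholders):

import json
import urllib.request

# POST a sample string to the index's _analyze endpoint using the custom analyzer.
url = "http://localhost:9200/my_index/_analyze?analyzer=custom_analyzer"
req = urllib.request.Request(url, data=b"<fantastic> project")
body = json.loads(urllib.request.urlopen(req).read())
print([t["token"] for t in body["tokens"]])
# Expected: ['<fantastic>', 'project'] -- the brackets survive tokenization.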

How to set TTL in mongodb with C++ driver

I want to set a TTL index from a C++ process on Linux.
But I've found that ensureIndex has been removed. (https://github.com/mongodb/mongo-cxx-driver/pull/106)
createIndex seems to accept only a BSONObj as its argument.
I've tried:
mongo::DBClientConnection mConnection;
mConnection.connect("localhost");
mongo::BSONObj bObj = BSON("mongo_date" << 1 << "expireAfterSeconds" << 10);
mConnection.createIndex("Test.Data", bObj);
but the result is:
db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "Test.Data" }
{ "v" : 1, "key" : { "mongo_date" : 1, "expireAfterSeconds" : 10 }, "name" : "mongo_date_1_expireAfterSeconds_10", "ns" : "Test.Data" }
Is there something wrong, or is there another way to set the TTL?
Thanks.
Because I still can't find a way to do this from the C++ driver, I'm using a crude method temporarily:
I use a shell script to create and run a JavaScript file.
In the C code:
int expire = 321;
char expir_char[20];
sprintf(expir_char, "%d", expire);
/* Build and run "./runTtlJs.sh <expire>" */
char temp_char[30] = "./runTtlJs.sh ";
strcat(temp_char, expir_char);
system(temp_char);
In runTtlJs.sh:
echo "db.Data.dropIndex({"mongo_date":1})" > ttl.js
echo "db.Data.ensureIndex({"mongo_date":1}, { expireAfterSeconds: $1 })" >> ttl.js
mongo Test ttl.js
I know it's really not a good answer.
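For reference, the system.indexes output above shows why the TTL never kicks in: expireAfterSeconds ended up inside the key pattern, producing a compound index on { mongo_date: 1, expireAfterSeconds: 10 } rather than a TTL index. The TTL value must be passed as an index option alongside the key, not as part of it. A minimal sketch of the intended shape, shown in Python with pymongo since I can't vouch for the exact C++ API of that driver version:

from pymongo import ASCENDING, MongoClient

collection = MongoClient("localhost")["Test"]["Data"]

# The key pattern holds only the indexed field; the TTL is an index *option*.
collection.create_index([("mongo_date", ASCENDING)], expireAfterSeconds=10)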

mongodb upsert doesn't update if locked

I have an app written in C++ with 16 threads that reads the output of wireshark/tshark. Wireshark/tshark dissects pcap files, which are gsm_map signalling captures.
MongoDB is 2.6.7.
The structure I need for my documents is like this.
Note that "packet" is an array; it will become apparent why later.
For all who don't know TCAP, the TCAP layer is transaction-oriented, this means, all packets include:
Transaction State: begin/continue/end
Origin transaction ID (otid)
Destination transaction ID (dtid)
So for instance, you might see a transaction comprising 3 packets, which looking at the TCAP layer would be roughly this (here, two packets: one "begin", one "end"):
{
    "_id" : ObjectId("54ccd186b8ea19c89ee8f231"),
    "deleted" : "0",
    "packet" : {
        "datetime" : ISODate("2015-01-31T12:58:11.939Z"),
        "signallingType" : "M2PA",
        "opc" : "326",
        "dpc" : "6406",
        "transState" : "begin",
        "otid" : "M2PA0400435B",
        "dtid" : "",
        "sccpCalling" : "523332075100",
        "sccpCalled" : "523331466304",
        "operation" : "mo-forwardSM (46)",
        ...
    }
}
/* 1 */
{
    "_id" : ObjectId("54ccd1a1b8ea19c89ee8f7c5"),
    "deleted" : "0",
    "packet" : {
        "datetime" : ISODate("2015-01-31T12:58:16.788Z"),
        "signallingType" : "M2PA",
        "opc" : "6407",
        "dpc" : "326",
        "transState" : "end",
        "otid" : "",
        "dtid" : "M2PA0400435B",
        "sccpCalling" : "523331466304",
        "sccpCalled" : "523332075100",
        "operation" : "Not Found",
        ...
    }
}
Because of the network architecture, we're tracing at two (2) points, and the traffic is balanced between these two points. This means we sometimes see "continue"s or "end"s BEFORE a "begin", and conversely a "continue" before a "begin" or "end". In short, transactions are not ordered.
Moreover, multiple end-points are "talking" amongst themselves, and transaction IDs might get duplicated: 2 endpoints could be using the same tid as 2 other endpoints at the same time. Though this doesn't happen all the time, it does happen.
Because of the latter, I also need to use the SCCP layer's "calling" and "called" Global Titles (like phone numbers).
Bear in mind that I don't know which way a given packet is going, so this is what I'm doing:
Whenever I get a new packet I must find out whether the transaction already exists in MongoDB; I'm using upsert to do this.
I do this by searching for the current packet's otid or dtid in either the otid or dtid of existing packets.
If it does exist: push the new packet into the existing document.
If it doesn't: create a new document with the packet.
As an example, this is an upsert for an "end" which should find a "begin":
db.runCommand(
{
    update: "packets",
    updates: [
        {
            q: {
                $and: [
                    {
                        $or: [
                            { "packet.otid": { $in: [ "M2PA042e3918" ] } },
                            { "packet.dtid": { $in: [ "M2PA042e3918" ] } }
                        ]
                    },
                    {
                        $or: [
                            { "packet.sccpCalling": { $in: [ "523332075151", "523331466305" ] } },
                            { "packet.sccpCalled": { $in: [ "523332075151", "523331466305" ] } }
                        ]
                    }
                ]
            },
            u: {
                $setOnInsert: {
                    "unique-id": "422984b6-6688-4782-9ba1-852a9fc6db3b", deleted: "0"
                },
                $push: {
                    packet: {
                        datetime: new Date(1422371239182),
                        opc: "327", dpc: "6407",
                        transState: "end",
                        otid: "", dtid: "M2PA042e3918", sccpCalling: "523332075151", ... }
                }
            },
            upsert: true
        }
    ],
    writeConcern: { j: "1" }
}
)
Now, all of this works, until I put it in production.
It seems packets are coming way too fast and I see lots of:
"ClientCursor::staticYield can't unlock b/c of recursive lock" warnings.
I read that we can ignore this warning, but I've found that my upserts DO NOT update the documents! It looks like there's a lock and MongoDB forgets about the update. If I change the upsert to a simple insert, no packets are lost.
I also read this is related to no indexes being used; I have the following index:
"3" : {
"v" : 1,
"key" : {
"packet.otid" : 1,
"packet.dtid" : 1,
"packet.sccpCalling" : 1,
"packet.sccpCalled" : 1
},
"name" : "packet.otid_1_packet.dtid_1_packet.sccpCalling_1_packet.sccpCalled_1",
"ns" : "tracer.packets"
So in conclusion:
1. If this index is not correct, can someone please help me create the correct index?
2. Is it normal that Mongo would NOT update a document if it finds a lock?
Thanks and regards!
David
Why are you storing all of the packets in an array? Normally in this kind of situation it's better to make each packet a document on its own; it's hard to say more without more information about your use case (or, perhaps, more knowledge of all these acronyms you're using :D). Your updates would become inserts and you would not need to do the update query. Instead, some other metadata on a packet would join related packets together so you could reconstruct a transaction or whatever you need to do.
More directly addressing your question, I would use an array field tids to store [otid, dtid] and an array field sccps to store [sccpCalling, sccpCalled], which would make your update query look like
{ "tids" : { "$in" : ["M2PA042e3918"] }, "sccps" : { "$in" : [ "523332075151", "523331466305" ] } }
and amenable to the index { "tids" : 1, "sccps" : 1 }.
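A minimal sketch of that suggestion in Python with pymongo (one document per packet, with the join keys in arrays; collection and field names mirror the question, values are illustrative):

from pymongo import ASCENDING, MongoClient

db = MongoClient("localhost")["tracer"]

# Index supporting the suggested query shape.
db.packets.create_index([("tids", ASCENDING), ("sccps", ASCENDING)])

# Store each packet as its own document instead of pushing into one document.
db.packets.insert_one({
    "transState": "end",
    "tids": ["", "M2PA042e3918"],               # [otid, dtid]
    "sccps": ["523332075151", "523331466305"],  # [sccpCalling, sccpCalled]
})

# Reconstruct a transaction by fetching all related packets.
related = db.packets.find({
    "tids": {"$in": ["M2PA042e3918"]},
    "sccps": {"$in": ["523332075151", "523331466305"]},
})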

How to redirect output from "dbgs" to a special filename?

I have written an LLVM pass to assign every basic block a stable ID, and to output each block's name and parent's name for later analysis. I use dbgs() to implement that. When I run this pass with opt, my command line is as follows:
$ opt -load ../../build/Release+Debug+Asserts/lib/libdgutility.so \
-mergereturn -bbnum -dump-bbid -remove-bbnum -stats -debug \
unit1300.bc -o /dev/null
It outputs the information to the terminal; here is part of my output stream:
FuncName : setcharset 16041 : if.end189
FuncName : setcharset 16042 : if.end190
FuncName : setcharset 16043 : if.end191
FuncName : setcharset 16044 : if.then196
FuncName : setcharset 16045 : land.lhs.true
FuncName : setcharset 16046 : lor.lhs.false209
.
.
.
FuncName : setcharset 16053 : while.end
FuncName : setcharset 16054 : if.else248
But I want to store this information in a file named *.log. I have tried redirection, but it does not work.
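One likely explanation: dbgs() (and -debug output generally) goes to standard error, not standard output, so plain > redirection captures nothing, while redirecting stream 2 (opt ... 2> bbid.log) should work. A small wrapper sketch in Python (the bbid.log name is illustrative):

import subprocess

# dbgs() prints to stderr, so route stderr (stream 2) into the log file.
with open("bbid.log", "w") as log:
    subprocess.run(
        ["opt",
         "-load", "../../build/Release+Debug+Asserts/lib/libdgutility.so",
         "-mergereturn", "-bbnum", "-dump-bbid", "-remove-bbnum",
         "-stats", "-debug",
         "unit1300.bc", "-o", "/dev/null"],
        stderr=log, check=True)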

How to work with a JSON string returned by a remote URL (with Django)?

I have to build a small app in order to show some data from the Google Financial API.
I know that I could study it inside out, but I don't have much time.
The url http://www.google.com/finance/info?q=MSFT returns this JSON string:
// [ { "id": "358464" ,"t" : "MSFT" ,"e" : "NASDAQ" ,"l" : "24.38" ,"l_cur" : "24.38" ,"ltt":"4:00PM EDT" ,"lt" : "Oct 1, 4:00PM EDT" ,"c" : "-0.11" ,"cp" : "-0.45" ,"ccol" : "chr" ,"el": "24.39" ,"el_cur": "24.39" ,"elt" : "Oct 1, 7:58PM EDT" ,"ec" : "+0.01" ,"ecp" : "0.04" ,"eccol" : "chg" ,"div" : "0.16" ,"yld" : "2.63" } ]
I don't know how to make that string available to a view. I need to "catch it" and show (some of) it in my template. I need something like:
def myview(request):
    URL = 'http://www.google.com/finance/info?q=MSFT'
    mystring = catchfromURL(URL)  # placeholder: fetch the contents of URL
    # work with the string
    return render_to_response('page.html', mystring)
Thanks in advance.
That little // at the beginning threw me off too. Here's what you do:
import json

# Skip the leading "// " before parsing
jsonData = json.loads(mystring[3:])
Now, I don't know what any of the encoded data there means, but that's how you can get it as Python objects.
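Putting it together, a minimal sketch of the view using only the standard library (the Google Finance endpoint has since been retired, so treat the URL as illustrative; the slice assumes the "// " prefix is exactly three characters, as in the sample):

import json
import urllib.request

from django.shortcuts import render

def myview(request):
    url = "http://www.google.com/finance/info?q=MSFT"
    raw = urllib.request.urlopen(url).read().decode("utf-8")
    # Strip the leading "// " comment marker, then parse the JSON array.
    quotes = json.loads(raw[3:])
    return render(request, "page.html", {"quote": quotes[0]})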