MongoDB regular expression not working as expected


db.col.find({_id: {term: "garcia"}})
finds the document with term = "garcia". However,
db.col.find({_id: {term: /garcia/}})
doesn't find anything. What's the reason?
Document:
{ "_id" : { "term" : "garcia" }, "count" : 43512, "count_users" : 15388 }

Your current query, {_id: {term: /garcia/}}, asks for an exact match on _id itself, not just on the term field within it. So it is looking for a document whose _id is an embedded document with a single term field whose value is that regular expression.
Use dot notation to match the regular expression against just the term field:
db.col.find({'_id.term': /garcia/})
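To make the difference concrete, here is a small pure-Python model of the two query shapes, run against the document from the question. It is an illustration only, not how MongoDB is implemented. The helper names are hypothetical, and Python dict equality ignores field order, whereas MongoDB's embedded-document equality does not.

```python
import re

# Document from the question
doc = {"_id": {"term": "garcia"}, "count": 43512, "count_users": 15388}


def embedded_equality(document, field, value):
    # Models {_id: {...}}: the whole embedded document must equal `value`.
    # A compiled regex nested inside `value` is compared as a value, never applied.
    return document.get(field) == value


def dotted_regex(document, path, pattern):
    # Models {'_id.term': /garcia/}: follow the dotted path, then apply the regex.
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False
        current = current[part]
    return isinstance(current, str) and pattern.search(current) is not None


print(embedded_equality(doc, "_id", {"term": "garcia"}))              # True
print(embedded_equality(doc, "_id", {"term": re.compile("garcia")}))  # False
print(dotted_regex(doc, "_id.term", re.compile("garcia")))            # True
```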

Related

How can we have a ProjectionExpression on a key that is an integer

I have a scenario where the data is structured like this:
{
"PartitionKey": "foobar",
"SomeDict": {
"10": "a value",
"20": "another value",
...
}
}
I want to have a projection expression to only read one of the values. The "naive" way would be to do the following query:
get_item(
Key={"PartitionKey": "foobar"},
ProjectionExpression="SomeDict.10"
)
But it fails with the following error: An error occurred (ValidationException) when calling the GetItem operation: Invalid ProjectionExpression: Syntax error; token: "10", near: ".10"
Is there a way to have projection expressions on keys that are integers, or is that a limitation?
Thanks!

From the docs:
You can use any attribute name in a projection expression, provided
that the first character is a-z or A-Z and the second character (if
present) is a-z, A-Z, or 0-9. If an attribute name does not meet this
requirement, you must define an expression attribute name as a
placeholder.
You need to provide ExpressionAttributeNames. Building on your example, it would look something like this:
get_item(
Key={"PartitionKey": "foobar"},
ProjectionExpression="#somedict.#ten",
ExpressionAttributeNames={"#somedict":"SomeDict", "#ten":"10"}
)
Read more about expression attribute names in the Amazon DynamoDB docs.
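If the attribute path is built dynamically, a small helper can generate a placeholder for every path segment so that names such as "10" never appear raw in the expression. This is a hypothetical helper (build_projection is not part of boto3); it only builds the two keyword arguments used in the answer above.

```python
def build_projection(*paths):
    """Hypothetical helper: map each attribute name to a placeholder like "#n0"."""
    placeholder_for = {}  # attribute name -> placeholder
    expressions = []
    for path in paths:
        parts = []
        for segment in path:
            if segment not in placeholder_for:
                placeholder_for[segment] = f"#n{len(placeholder_for)}"
            parts.append(placeholder_for[segment])
        expressions.append(".".join(parts))
    return {
        "ProjectionExpression": ", ".join(expressions),
        # Invert the mapping: DynamoDB expects placeholder -> real attribute name
        "ExpressionAttributeNames": {p: name for name, p in placeholder_for.items()},
    }


params = build_projection(["SomeDict", "10"])
print(params)
# {'ProjectionExpression': '#n0.#n1', 'ExpressionAttributeNames': {'#n0': 'SomeDict', '#n1': '10'}}
```

The returned dict could then be unpacked into the call, e.g. get_item(Key={"PartitionKey": "foobar"}, **params). Reusing one placeholder per distinct attribute name keeps ExpressionAttributeNames small when several paths share a prefix.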

How to extract a value from a string column using hive

I need to extract a field from a string column using Hive.
Input: [{"name":"MANAGER"}]
Output: MANAGER
I was able to fetch the record using the regular expression below, but I am not able to remove the ] from the output.
Query built:
select split(regexp_replace('([{"name":"MANAGER"}])','^\\(|\\)$|[{"}]',''),': *')[1];
Output obtained:
MANAGER]
Could you please help me remove the ] from the output so that I get only MANAGER in this example, using Hive?

You can actually parse this with the get_json_object function, since the string you shared is a JSON string:
select get_json_object(regexp_replace('[{"name":"MANAGER"}]', '[\\[\\]]', ''), '$.name')
See the documentation:
get_json_object
A limited version of JSONPath is supported:
$ : Root object
. : Child operator
[] : Subscript operator for array
* : Wildcard for []
Syntax not supported that's worth noticing:
'' : Zero length string as key
.. : Recursive descent
@ : Current object/element
() : Script expression
?() : Filter (script) expression.
[,] : Union operator
[start:end.step] : array slice operator
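For a quick sanity check outside Hive, the same two-step idea (strip the brackets, then read $.name) can be sketched in Python with the standard json module. Note that stripping the brackets only yields valid JSON when the array holds a single object; parsing the array and indexing element 0 avoids that assumption.

```python
import json
import re

raw = '[{"name":"MANAGER"}]'

# Same idea as the Hive query: remove [ and ], then read the "name" key.
stripped = re.sub(r"[\[\]]", "", raw)
name_from_stripped = json.loads(stripped)["name"]

# Alternative: parse the JSON array as-is and take the first element.
name_from_array = json.loads(raw)[0]["name"]

print(name_from_stripped, name_from_array)  # MANAGER MANAGER
```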

Fluentd Parsing

Hi, I'm trying to parse a single-line log using Fluentd. Here is the log I'm trying to parse:
F2:4200000000000000,F3:000000,F4:000000060000,F6:000000000000,F7:000000000,F8..........etc
It should be parsed into something like this:
{ "F2" : "4200000000000000", "F3" : "000000", "F4" : "000000060000" ............etc }
I tried to use a regex, but it's confusing and makes me write multiple regexes for different keys and values. Is there an easier way to achieve this?
EDIT1: Heya! I'll make this more detailed. I'm currently tailing logs with Fluentd and sending them to Elasticsearch+Kibana. Here is an unparsed example log that Fluentd sends to Elasticsearch:
21/09/02 16:36:09.927238: 1 frSMS:0:13995:#HTF4J::141:141:msg0210,00000000000000000,000000,000000,007232,00,#,F2:00000000000000000,F3:002000,F4:000000820000,F6:Random message and strings,F7:.......etc
Message received by Elasticsearch:
{"message":"frSMS:0:13995:#HTF4J::141:141:msg0210,00000000000000000,000000,000000,007232,00,#,F2:00000000000000000,F3:002000,F4:000000820000,F6:Random digits and chars,F7:.......etc"}
This log has only a message key, so I can't index it or build a dashboard from the whole message field alone. What I'm trying to achieve is to capture only the useful fields, add a key to any value that has none, and make indexing easier.
Expected output:
{"logdate" : "21/09/02 16:36:09.927238",
"source" : "frSMS",
"UID" : "#HTF4J",
"statuscode" : "msg0210",
"F2": "00000000000000000",
"F3": "randomchar314516",.....}
I used the regex plugin to parse it into this, but it was too overwhelming. Here is what I did so far:
^(?<logDate>\d{2}.\d{2}.\d{2}\s\d{2}:\d{2}:\d{2}.\d{6}\b)....(?<source>fr[A-Z]{3,4}|to[A-Z]{3,4}\b).(?<status>\d\b).(?<dummyfield>\d{5}\b).(?<HUID>.[A-Z]{5}\b)..(?<d1>\d{3}\b).(?<d2>\d{3}\b).(?<msgcode>msg\d{4}\b).(?<dummyfield1>\d{16}\b).(?<dummyfield2>\d{6}\b).(?<dummyfield3>\d{6,7}\b).(?<dummyfield4>\d{6}\b).(?<dummyfield5>\d{2}\b)...
Which results in:
"logDate": "21/09/02 16:36:09.205706",
"source": "toSMS" ,
"status": "0",
"dummyfield": "13995" ,
"UID" : "#HTFAA" ,
"d1" : "156" ,
"d2" : "156" ,
"msgcode" : "msg0210",
"dummyfield1" :"0000000000000000" ,
"dummyfield2" :"002000",
"dummyfield3" :"2000000",
"dummyfield4" :"00",
"dummyfield5" :"2000000" ,
"dummyfield6" :"867202"
This only works for the example log, and it produces useless fields such as field1, dummyfield, dummyfield1, etc.
Other logs contain the useful values and keys (date, source, msgcode, UID, F1, F2 fields) shown in the expected output. The non-useful fields are not static (they can be absent, or have fewer or more digits and characters), so they trigger the "pattern not matched" error.
So the questions are:
How do I capture the useful fields I mentioned using a regex?
How do I capture the F1, F2, F3, ... fields whose values follow different patterns, such as mixed letters and digits?
PS: I wrapped the regex I wrote in an HTML snippet so that the <> named capture groups don't get deleted.

Regex pattern to use:
(F[\d]+):([\d]+)
This pattern catches every 'F' key followed by any number of digits, so it still works for keys like F105. The whole key (e.g. 'F105') is stored as the first group of the regex match.
The right part of the pattern catches all the digits following ':' up to the first character that is not a digit (e.g. ',' or 'F') and stores them as the second group of the match.
Use
Depending on your programming language, you will have to iterate over the regex matches and extract group 1 and group 2 respectively.
Python example:
import re

log = 'F2:4200000000000000,F3:000000,F4:000000060000,F6:000000000000,F7:000000000,F105:9726450'
# Group 1 is the key (F plus digits), group 2 is the run of digits after the colon
pattern = r'(F[\d]+):([\d]+)'
matches = re.finditer(pattern, log)
log_dict = {}
for match in matches:
    log_dict[match.group(1)] = match.group(2)
print(log_dict)
Output
{'F2': '4200000000000000', 'F3': '000000', 'F4': '000000060000', 'F6': '000000000000', 'F7': '000000000', 'F105': '9726450'}
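The pattern above only accepts digits after the colon, so a value such as F6:Random message and strings from the question's EDIT would be skipped. A minimal variant, assuming values never contain a comma themselves, captures everything up to the next comma instead:

```python
import re

# Sample taken from the question's EDIT (truncated before F7)
line = ("21/09/02 16:36:09.927238: 1 frSMS:0:13995:#HTF4J::141:141:msg0210,"
        "00000000000000000,000000,000000,007232,00,#,"
        "F2:00000000000000000,F3:002000,F4:000000820000,"
        "F6:Random message and strings")

# \b keeps a key from starting mid-word; [^,]* takes any characters up to the next comma
fields = dict(re.findall(r"\b(F\d+):([^,]*)", line))
print(fields)
```

With two groups in the pattern, re.findall returns (key, value) tuples, which dict() turns directly into the mapping.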

Assuming the logdate format stays the same (pattern-wise), you can skip the useless values with the ".+" regex and collect the useful values by their patterns. The regex would then look like this:
(?<logdate>\d{2}.\d{2}.\d{2}\s\d{2}:\d{2}:\d{2}.\d{6}\b).+(?<source>fr[A-Z]{3,4}|to[A-Z]{3,4}).+(?<UID>#[A-Z0-9]{5}).+(?<statuscode>msg\d{4})
And the output will look like:
{"logdate" : "21/09/02 16:36:09.927238", "source" : "frSMS",
"UID" : "#HTF4J","statuscode" : "msg0210"}
I'm still working on getting the F2, F3, ... FN keys and values.
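Fluentd's regexp parser uses Ruby-style (?<name>...) named groups, while Python's re module spells them (?P<name>...). As a sketch for testing outside Fluentd, here is the answer's header pattern in Python syntax, with group names taken from the expected output (logdate, source, UID, statuscode), run against the sample line from the question. Merging in the F-fields with a comma-delimited value pattern is an extension not in the answer and assumes values contain no commas.

```python
import re

# Header pattern rewritten with Python's (?P<name>...) group syntax
HEADER = re.compile(
    r"(?P<logdate>\d{2}.\d{2}.\d{2}\s\d{2}:\d{2}:\d{2}.\d{6}\b)"
    r".+(?P<source>fr[A-Z]{3,4}|to[A-Z]{3,4})"
    r".+(?P<UID>#[A-Z0-9]{5})"
    r".+(?P<statuscode>msg\d{4})"
)

line = ("21/09/02 16:36:09.927238: 1 frSMS:0:13995:#HTF4J::141:141:msg0210,"
        "00000000000000000,000000,000000,007232,00,#,"
        "F2:00000000000000000,F3:002000,F4:000000820000,"
        "F6:Random message and strings")

match = HEADER.search(line)
record = match.groupdict() if match else {}
# Assumption: F-field values never contain commas
record.update(re.findall(r"\b(F\d+):([^,]*)", line))
print(record)
```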

MongoDB query with special characters in key

In my case, I have keys in my MongoDB database that contain a dot in their name (see attached screenshot). I have read that it is possible to store data in MongoDB this way, but the driver prevents queries with dots in the key. Anyway, in my MongoDB database, keys do contain dots and I have to work with them.
I have now tried to encode the dots in the query (. to \u002e) but it did not seem to work. Then I had the idea to work with regex to replace the dots in the query with any character but regex seems to only work for the value and not for the key.
Does anyone have a creative idea how I can get around this problem? For example, I want to have all the CVE numbers for 'cve_results.BusyBox 1.12.1'.
Update #1:
The structure of cve_results is as follows:
"cve_results" : {
"BusyBox 1.12.1" : {
"CVE-2018-1000500" : {
"score2" : "6.8",
"score3" : "8.1",
"cpe_version" : "N/A"
},
"CVE-2018-1000517" : {
"score2" : "7.5",
"score3" : "9.8",
"cpe_version" : "N/A"
}
}}

With the following workaround I was able to directly access documents by their keys, even though they have a dot in their key:
db.getCollection('mycollection').aggregate([
{$match: {mymapfield: {$type: "object" }}}, //filter objects with right field type
{$project: {mymapfield: { $objectToArray: "$mymapfield" }}}, //"unwind" map to array of {k: key, v: value} objects
{$match: {mymapfield: {k: "my.key.with.dot", v: "myvalue"}}} //query
])
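The key point of this pipeline is that after $objectToArray the field name becomes an ordinary string value (k), so the dots in it no longer act as path separators. A pure-Python model of that idea, applied to the cve_results structure from the question, lists the CVE ids for 'BusyBox 1.12.1'. The helper names are hypothetical and this is not MongoDB code.

```python
def object_to_array(obj):
    # Mirrors what $objectToArray produces: {"a": 1} -> [{"k": "a", "v": 1}]
    return [{"k": key, "v": value} for key, value in obj.items()]


def cve_ids_for(document, product):
    # Compare the key as a plain string, so the dots in "1.12.1" are harmless
    for entry in object_to_array(document.get("cve_results", {})):
        if entry["k"] == product:
            return [cve["k"] for cve in object_to_array(entry["v"])]
    return []


doc = {
    "cve_results": {
        "BusyBox 1.12.1": {
            "CVE-2018-1000500": {"score2": "6.8", "score3": "8.1", "cpe_version": "N/A"},
            "CVE-2018-1000517": {"score2": "7.5", "score3": "9.8", "cpe_version": "N/A"},
        }
    }
}

print(cve_ids_for(doc, "BusyBox 1.12.1"))  # ['CVE-2018-1000500', 'CVE-2018-1000517']
```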

If possible, it could be worth inserting documents using \u002e instead of the dot; that way you can query them while retaining the ASCII value of the . for any client rendering.
However, it appears there's a workaround to query them like so:
db.collection.aggregate({
$match: {
"BusyBox 1.12.1" : "<value>"
}
})

You should be able to use the $eq operator to query fields with dots in their names.

Why this error "Values must match the following regular expression: 'ga:.+'" displayed?

I'm using the Google Analytics API with Apps Script and trying to filter some pages using regular expressions.
But this error is always displayed:
Invalid value '{filters=ga:pagePath=~(/burm)|(/4assort)|(/mkn)|(/apl)|(/grp)|(/pea)|(/arakawa)}'. Values must match the following regular expression: 'ga:.+'
I tried a simpler expression, for example:
var ReEx ='(\/abc)|(\/def)';
'filters':'ga:pagePath=~'+ ReEx
Is there something wrong with the expressions in my code?
Then I tried to filter a single page, and the same error was still returned.
function getCart(){
return '/ShoppingCart.html';
}
var Cart = {
'filters':'ga:pagePath=~'+getCart()
}
var results = Analytics.Data.Ga.get(//
tableId, // Table id (format ga:xxxxxx).
startDate, // Start-date (format yyyy-MM-dd).
endDate, // End-date (format yyyy-MM-dd).
Cart
);
sheet.getRange(3, 7).setValues(results.getRows());

I solved it by writing pagepath in lowercase; also make sure there are no spaces:
ga:pagepath==/products/motors
or
ga:pagepath=~/products/motors
