I have a DSL that I'm trying to write a TextMate grammar for, so I can have syntax highlighting (in VS Code rather than TextMate, but it uses the same grammar format).
The DSL represents tree nodes, is whitespace-indented, and gets mapped to JSON.
Most of it is pretty straightforward, but it has some weird constructs that I'm not sure how to match with the textmate grammar. An example of the DSL is below.
I can match the "tags", the simple scalar "properties" and the single-line "arrays" just fine, but I'm a bit lost on the multiline arrays and objects: I don't know how to stop matching their contents when the indentation changes. I thought I could just stop matching when I hit a known pattern, but apart from their indentation level, a single item on the last line of an array is indistinguishable from a "tag".
Is there some way to match these based on indentation alone?
Example DSL file:
fooTag
  foo An unquoted string
  bar 42
  baz "42"
  qux true
barTag
  qux{}
    foo A string called foo on the qux object
    bar null
  foo[] in arrays each space separated token is a separate string "unless quoted"
  bar[]
    some arrays of space separated tokens look like this when it is a long
    list with lots of items
  baz[]
    arrays can be mixed types like this one which contains the number 42 and
    the constants true false and null and the rest of the tokens are all
    strings and even nested arrays and objects as follows
    [] 1 2 3
    []
      indented array of tokens
    {}
      foo 42
      bar "42"
some
more
tags
Example JSON output:
[
{
"name": "fooTag",
"model": {
"foo": "An unquoted string",
"bar": 42,
"baz": "42",
"qux": true
}
},
[
{
"name": "barTag",
"model": {
"qux": {
"foo": "A string called foo on the qux object",
"bar": null
},
"foo": [
"each", "space", "separated", "token", "is", "a", "separate",
"string", "unless quoted"
],
"bar": [
"some", "arrays", "of", "space", "separated", "tokens", "look",
"like", "this", "when", "it", "is", "a", "long", "list", "with",
"lots", "of", "items"
],
"baz": [
"arrays", "can", "be", "mixed", "types", "like", "this", "one",
"which", "contains", "the", "number", 42, "and", "the", "constants",
true, false, "and", null, "and", "the", "rest", "of", "the", "tokens",
"are", "all", "strings", "and", "even", "nested", "arrays", "and",
"objects", "as", "follows",
[ 1, 2, 3 ],
[ "indented", "array", "of", "tokens" ],
{
"foo": 42,
"bar": "42"
}
]
}
},
[
{
"name": "some",
"model": {}
}
],
[
{
"name": "more",
"model": {}
}
],
[
{
"name": "tags",
"model": {}
}
]
]
]
Given the JSON structure below, I would like to find the first occurrence of the object ccc so I can add a new object alongside its child ddd. However, I do not know the key name of the parent or how many levels deep it may be.
to find
"children": {
"ccc": [{
"id": "ddd",
"des": "object d",
"parent": "ccc"
}]
}
full JSON stored in $myJson
{
"zzz": [{
"id": "aaa",
"des": "object A",
"parent": "zzz",
"children": {
"aaa": [{
"id": "bbb",
"des": "object B",
"parent": "aaa",
"children": {
"bbb": [{
"id": "ccc",
"des": "object C",
"parent": "bbb",
"children": {
"ccc": [{
"id": "ddd",
"des": "object d",
"parent": "ccc"
}]
}
}, {
"id": "eee",
"des": "object e",
"parent": "bbb"
}]
}
},{
"id": "fff",
"des": "object f",
"parent": "aaa"
}]
}
}]}
Following some other answers, I have tried combinations of
output=($(jq -r '.. | with_entries(select(.key|match("ccc";"i")))' <<< ${myjson}))
or
output=($(jq -r '.. | to_entries | map(select(.key | match("ccc";"i"))) | map(.value)' <<< ${myjson}))
They all give errors of a similar nature: jq: error (at <stdin>:1): number (0) cannot be matched, as it is not a string
In the following, I'll assume you want to add "ADDITIONAL" to the array at EVERY key that matches a given regex (here "ccc"):
walk(if type == "object"
then with_entries(if (.key|test("ccc"))
then .value += ["ADDITIONAL"] else . end)
else . end)
If your jq does not have walk/1, then you can simply copy-and-paste its def from the jq FAQ or builtin.jq
Alternative formulation
If you have the following general-purpose helper function handy (e.g. in your ~/.jq):
def when(filter; action): if (filter?) // null then action else . end;
then the above solution shrinks down to:
walk(when(type == "object";
with_entries(when(.key|test("ccc"); .value += ["ADDITIONAL"]))))
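For readers who think more easily in JavaScript, here is a minimal sketch of the same bottom-up walk. The function name `appendAtMatchingKeys` and the trimmed-down `sample` object are illustrative assumptions, not part of jq or of the question. It also hints at why the attempts in the question likely fail: `..` visits arrays as well as objects, and the "keys" of an array are numbers, which cannot be regex-matched. Unlike the jq version, this sketch only appends when the value under a matching key is an array.

```javascript
// Hypothetical helper: rebuilds a JSON value bottom-up, appending `item`
// to every array stored under an object key that matches `pattern`.
function appendAtMatchingKeys(value, pattern, item) {
  if (Array.isArray(value)) {
    // Arrays have numeric indices, not string keys, so only recurse.
    return value.map((v) => appendAtMatchingKeys(v, pattern, item));
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      const walked = appendAtMatchingKeys(child, pattern, item);
      out[key] =
        pattern.test(key) && Array.isArray(walked) ? [...walked, item] : walked;
    }
    return out;
  }
  return value; // scalars are returned unchanged
}

// Trimmed-down sample shaped like the JSON in the question.
const sample = {
  bbb: [{ id: "ccc", children: { ccc: [{ id: "ddd" }] } }],
};
const result = appendAtMatchingKeys(sample, /ccc/, "ADDITIONAL");
console.log(JSON.stringify(result, null, 2));
```

Note that the match is on keys only: the object whose "id" is "ccc" is left alone, while the array stored under the key "ccc" gains the extra element.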
MongoDB allows regular expressions of the form /pattern/ inside $in, without using the $regex operator:
http://docs.mongodb.org/manual/reference/operator/query/in/
How can I do this using Morphia?
If I give a field criteria with the in operator and a value of type "java.util.regex.Pattern", the generated query is
$in:[$regex: 'given pattern'], which doesn't return the expected results at all.
Expectation: $in :[ /pattern1 here/,/pattern2 here/]
Actual using 'Pattern' object : $in : [$regex:/pattern1 here/,$regex:/pattern 2 here/]
I'm not entirely sure what to make of your code examples, but here's a working Morphia code snippet:
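// Pattern.quote() keeps regex metacharacters in the address (such as "." or "+") literal
Pattern regexp = Pattern.compile("^" + Pattern.quote(email) + "$", Pattern.CASE_INSENSITIVE);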
mongoDatastore.find(EmployeeEntity.class).filter("email", regexp).get();
Note that this is really slow. A case-insensitive regex can't use an index efficiently and will generally require a full collection scan, so avoid it where you can.
Update: I've added a specific code example. The $in is not required to search inside an array. Simply use /^I/ as you would for a string field:
> db.profile.find()
{
"_id": ObjectId("54f3ac3fa63f282f56de64bd"),
"tags": [
"India",
"Australia",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ac4da63f282f56de64be"),
"tags": [
"Island",
"Antigua"
]
}
{
"_id": ObjectId("54f3ac5ca63f282f56de64bf"),
"tags": [
"Spain",
"Mexico"
]
}
{
"_id": ObjectId("54f3ac6da63f282f56de64c0"),
"tags": [
"Israel"
]
}
{
"_id": ObjectId("54f3ad17a63f282f56de64c1"),
"tags": [
"Germany",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ad56a63f282f56de64c2"),
"tags": [
"ireland"
]
}
> db.profile.find({ tags: /^I/ })
{
"_id": ObjectId("54f3ac3fa63f282f56de64bd"),
"tags": [
"India",
"Australia",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ac4da63f282f56de64be"),
"tags": [
"Island",
"Antigua"
]
}
{
"_id": ObjectId("54f3ac6da63f282f56de64c0"),
"tags": [
"Israel"
]
}
{
"_id": ObjectId("54f3ad17a63f282f56de64c1"),
"tags": [
"Germany",
"Indonesia"
]
}
Note: The position in the array makes no difference, but the search is case sensitive. Use /^I/i if this is not desired or Pattern.CASE_INSENSITIVE in Java.
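The matching rule described in the note can be mimicked in plain JavaScript, without a database. This is only a sketch of the semantics (a document matches when at least one array element matches the regex); the helper name `findByTag` is an assumption for illustration, and the data is the tags from the documents above.

```javascript
// Plain-JavaScript emulation (no MongoDB involved) of { tags: <regex> }:
// a document matches when at least one array element matches the regex.
const profiles = [
  { tags: ["India", "Australia", "Indonesia"] },
  { tags: ["Island", "Antigua"] },
  { tags: ["Spain", "Mexico"] },
  { tags: ["Israel"] },
  { tags: ["Germany", "Indonesia"] },
  { tags: ["ireland"] },
];

// Hypothetical helper, named for this example only.
function findByTag(docs, regex) {
  return docs.filter((doc) => doc.tags.some((tag) => regex.test(tag)));
}

const caseSensitive = findByTag(profiles, /^I/); // skips "ireland"
const caseInsensitive = findByTag(profiles, /^I/i); // includes "ireland"
console.log(caseSensitive.length, caseInsensitive.length);
```

The ["Germany", "Indonesia"] document matches even though the matching tag is not first, which is the "position makes no difference" point.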
Single RegEx Filter
Use .filter(), .criteria(), or .field():
query.filter("email", Pattern.compile("reg.*exp"));
// or
query.criteria("email").contains("reg.*exp");
// or
query.field("email").contains("reg.*exp");
Morphia converts this into:
find({"email": { $regex: "reg.*exp" } })
Multiple RegEx Filters
query.or(
query.criteria("email").contains("reg.*exp"),
query.criteria("email").contains("reg.*exp.*2"),
query.criteria("email").contains("reg.*exp.*3")
);
Morphia converts this into:
find({"$or" : [
{"email": {"$regex": "reg.*exp"}},
{"email": {"$regex": "reg.*exp.*2"}},
{"email": {"$regex": "reg.*exp.*3"}}
]
})
Unfortunately,
You cannot use $regex operator expressions inside an $in.
MongoDB Manual 3.4
Otherwise, we could do:
Pattern[] patterns = new Pattern[] {
Pattern.compile("reg.*exp"),
Pattern.compile("reg.*exp.*2"),
Pattern.compile("reg.*exp.*3"),
};
query.field("email").in(Arrays.asList(patterns));
Hopefully, one day Morphia will support that :)
I have the following collection structure:
{
"_id": ObjectId("54c784d71e14acf9ae833f9f"),
"vms": [
{
"name": "ABC",
"ids": [
"abc.60a980004270457730244662385a4f69",
"abc.60a980004270457730244662385a4f6d"
]
},
{
"name": "PQR",
"ids": [
"abc.6d867d9c7acd60001aed76eb2c70bd53",
"abc.60a980004270457730244662385a4f6d"
]
},
{
"name": "XYZ",
"ids": [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
]
}
I have an array which contains substrings of ids. Here is the array for reference:
myArray = [ "4270457730244662385a4f69","4270457730244662385a4f6d" , "4270457730244662385a4f6b"]
I want to find the vms entries whose ids contain none of the elements of myArray as a substring, using MongoDB.
Currently I am able to match a single element using a regex in MongoDB.
For the example above, I want this output:
[
{
"name": "XYZ",
"ids": [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
]
How do I find a substring in an array using MongoDB?
It is possible to do this with a regex. You can match a string against multiple substrings using the regex "or" operator, which is |. Search for 'Boolean "or"' on Wikipedia.
MongoDB query using aggregation:
db.collection_name.aggregate([
{$unwind: "$vms"},
{$match: {
"vms.ids": {$not: /.*(4270457730244662385a4f69|4270457730244662385a4f6d|4270457730244662385a4f6b).*/}}
}
])
Output will be
{
"_id" : ObjectId("54c784d71e14acf9ae833f9f"),
"vms" : {
"name" : "XYZ",
"ids" : [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
}
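Rather than typing the alternation by hand, the pattern can be built from myArray. The sketch below does that and then checks the result against the sample data in plain JavaScript; the helper `escapeRegExp` and the variable names are assumptions for illustration, and the filter only emulates what $not does on an array field (no element may match).

```javascript
const myArray = [
  "4270457730244662385a4f69",
  "4270457730244662385a4f6d",
  "4270457730244662385a4f6b",
];

// Escape regex metacharacters so each entry is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// One regex that matches if ANY of the substrings occurs.
const anySubstring = new RegExp(myArray.map(escapeRegExp).join("|"));

const vms = [
  { name: "ABC", ids: ["abc.60a980004270457730244662385a4f69", "abc.60a980004270457730244662385a4f6d"] },
  { name: "PQR", ids: ["abc.6d867d9c7acd60001aed76eb2c70bd53", "abc.60a980004270457730244662385a4f6d"] },
  { name: "XYZ", ids: ["abc.600605b00237d91016cdc38f376bd31d", "abc.600605b00237d91016cdc38f376cd32f"] },
];

// Emulates {$not: regex} on an array field: keep an entry only when
// none of its ids matches.
const unmatched = vms.filter(
  (vm) => !vm.ids.some((id) => anySubstring.test(id))
);
console.log(unmatched.map((vm) => vm.name));
```

In the mongo shell, a RegExp object built this way should be usable in place of the /.../ literal in the $match stage above.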
My collection contains the following two documents
{
"BornYear": 2000,
"Type": "Zebra",
"Owners": [
{
"Name": "James Bond",
"Phone": "007"
}
]
}
{
"BornYear": 2012,
"Type": "Dog",
"Owners": [
{
"Name": "James Brown",
"Phone": "123"
},
{
"Name": "Sarah Frater",
"Phone": "345"
}
]
}
I would like to find all the animals that have an owner whose name contains James.
I tried to unwind the Owners array, but I cannot get access to the Name field.
Bit of a misnomer here. To just find the "objects" or items in a "collection" then all you really need to do is match the "object/item"
db.collection.find({
"Owners.Name": /^James/
})
Which works, but does not of course limit the results to the "first" match of "James", which would be:
db.collection.find(
{ "Owners.Name": /^James/ },
{ "Owners.$": 1 }
)
As a basic projection. But that does not give any more than a "single" match, which means you need the .aggregate() method instead like so:
db.collection.aggregate([
// Match the document
{ "$match": {
"Owners.Name": /^James/
}},
// Flatten or de-normalize the array
{ "$unwind": "$Owners" },
// Filter the content
{ "$match": {
"Owners.Name": /^James/
}},
// Maybe group it back
{ "$group": {
"_id": "$_id",
"BornYear": { "$first": "$BornYear" },
"Type": { "$first": "$Type" },
"Owners": { "$push": "$Owners" }
}}
])
And that allows more than one match in a sub-document array while filtering.
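To make the effect of that pipeline concrete, here is a plain-JavaScript sketch of the same steps (select matching documents, then keep only the matching array elements), run on the two documents from the question. It is only an emulation for illustration and does not touch MongoDB; the variable names are assumptions.

```javascript
const animals = [
  { BornYear: 2000, Type: "Zebra", Owners: [{ Name: "James Bond", Phone: "007" }] },
  {
    BornYear: 2012,
    Type: "Dog",
    Owners: [
      { Name: "James Brown", Phone: "123" },
      { Name: "Sarah Frater", Phone: "345" },
    ],
  },
];

const pattern = /^James/;

const filtered = animals
  // First $match: keep documents with at least one matching owner.
  .filter((animal) => animal.Owners.some((o) => pattern.test(o.Name)))
  // $unwind + second $match + $group: keep only the matching owners.
  .map((animal) => ({
    ...animal,
    Owners: animal.Owners.filter((o) => pattern.test(o.Name)),
  }));

console.log(JSON.stringify(filtered, null, 2));
```

Both animals are returned, but the Dog document keeps only the "James Brown" owner.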
The other point is the "anchor" or "^" caret on the regular expression. You really need it where you can, to make matches at the "start" of the string where an index can be properly used. Open-ended regex operations cannot use an index efficiently.
You can use dot notation to match against the fields of array elements:
db.test.find({'Owners.Name': /James/})
Basically I'm trying to implement tags functionality on a model.
> db.event.distinct("tags")
[ "bar", "foo", "foobar" ]
A simple distinct query retrieves all distinct tags. However, how would I go about getting all distinct tags that match a certain query? Say, for example, I wanted to get all tags matching foo and expected to get ["foo","foobar"] as a result?
The following queries are my failed attempts at achieving this:
> db.event.distinct("tags",/foo/)
[ "bar", "foo", "foobar" ]
> db.event.distinct("tags",{tags: {$regex: 'foo'}})
[ "bar", "foo", "foobar" ]
Use the aggregation framework rather than the .distinct() command:
db.event.aggregate([
// De-normalize the array content to separate documents
{ "$unwind": "$tags" },
// Filter the de-normalized content to remove non-matches
{ "$match": { "tags": /foo/ } },
// Group the "like" terms as the "key"
{ "$group": {
"_id": "$tags"
}}
])
You are probably better off anchoring the regex to the beginning if you mean to match from the "start" of the string, and also doing this $match before you process $unwind:
db.event.aggregate([
// Match the possible documents. Always the best approach
{ "$match": { "tags": /^foo/ } },
// De-normalize the array content to separate documents
{ "$unwind": "$tags" },
// Now "filter" the content to actual matches
{ "$match": { "tags": /^foo/ } },
// Group the "like" terms as the "key"
{ "$group": {
"_id": "$tags"
}}
])
That makes sure you are not processing $unwind on every document in the collection and only those that possibly contain your "matched tags" value before you "filter" to make sure.
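The match, unwind, match, group sequence can be sketched in plain JavaScript as follows. The `events` documents are made-up sample data (the question only shows the distinct tag values), and nothing here talks to MongoDB.

```javascript
// Hypothetical sample documents; only the distinct tag values
// "bar", "foo" and "foobar" come from the question.
const events = [
  { tags: ["foo", "bar"] },
  { tags: ["foobar", "foo"] },
  { tags: ["bar"] },
];

const pattern = /^foo/;

const distinctTags = [
  ...new Set(
    events
      // First $match: skip documents with no matching tag at all.
      .filter((event) => event.tags.some((tag) => pattern.test(tag)))
      // $unwind: one entry per tag.
      .flatMap((event) => event.tags)
      // Second $match: drop the non-matching tags.
      .filter((tag) => pattern.test(tag))
  ), // $group on the tag value removes duplicates
].sort();

console.log(distinctTags);
```

The second filter is what the plain .distinct() attempts were missing: selecting documents by a regex does not remove the other tags those documents carry.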
The really "complex" way to somewhat mitigate large arrays with possible matches takes a bit more work, and MongoDB 2.6 or greater:
db.event.aggregate([
{ "$match": { "tags": /^foo/ } },
{ "$project": {
"tags": { "$setDifference": [
{ "$map": {
"input": "$tags",
"as": "el",
"in": { "$cond": [
{ "$eq": [
{ "$substr": [ "$$el", 0, 3 ] },
"foo"
]},
"$$el",
false
]}
}},
[false]
]}
}},
{ "$unwind": "$tags" },
{ "$group": { "_id": "$tags" }}
])
So $map is a nice "in-line" processor of arrays but it can only go so far. The $setDifference operator negates the false matches, but ultimately you still need to process $unwind to do the remaining $group stage for distinct values overall.
The advantage here is that arrays are now "reduced" to only the "tags" element that matches. Just don't use this when you want a "count" of the occurrences when there are "multiple distinct" values in the same document. But again, there are other ways to handle that.
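As a rough illustration of the per-document step, the $map and $setDifference combination behaves like the plain-JavaScript function below. The name `reduceTags` and the sample input are assumptions for this sketch; MongoDB does not guarantee the element order of a $setDifference result, whereas this version happens to keep first-seen order.

```javascript
// Hypothetical helper emulating the $project stage for one document.
function reduceTags(tags, prefix) {
  // $map with $cond: keep the element if it starts with the prefix, else false.
  const mapped = tags.map((el) =>
    el.substring(0, prefix.length) === prefix ? el : false
  );
  // $setDifference against [false]: treat the result as a set
  // (duplicates collapse) and remove the false placeholders.
  return [...new Set(mapped)].filter((el) => el !== false);
}

const reduced = reduceTags(["foo", "bar", "foobar", "foo"], "foo");
console.log(reduced);
```

The duplicate "foo" collapses into one entry, which is why this form is unsuitable when occurrences within a single document need to be counted.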