MongoDB - Find numbers that start with a string - regex

I'm trying to write a query that gets all the prices that start with '12'.
I have a collection like this:
{
  "place": "Costa Rica",
  "name": "Villa Lapas",
  "price": 1353
},
{
  "place": "Costa Rica",
  "name": "Hotel NWS",
  "price": 1948
},
{
  "place": "Costa Rica",
  "name": "Hotel Papaya",
  "price": 1283
},
{
  "place": "Costa Rica",
  "name": "Hostal Serine",
  "price": 1248
}
And I want my results like this:
{
  'prices': [
    1248,
    1283
  ]
}
I'm converting all the prices to strings so that I can use a regex, but I don't understand very well how to use the regex in my query.
My query returns:
{ "prices" : null }
{ "prices" : null }
Could someone please guide me? :)
db.collection.aggregate([
  {'$project': {
    '_id': 0,
    'price': {'$toString': '$price'}
  }},
  {'$project': {
    'prices': {'$regexFind': { 'input': "$price", 'regex': '^12' }}
  }}
]).pretty();

You are almost correct.
db.test.aggregate([
  {'$project': {
    '_id': 0,
    'prices': {'$toString': '$price'}
    ^^^ -> I meant this
  }},
  {'$match': {
    'prices': {'$regex': '^12' }
    ^^^ -> same here
  }}
])
You need to use $match with $regex, which yields the result you expect.
If you use $regexFind, it runs on every document and returns null wherever the input doesn't match the pattern.
Also note that your first $project outputs the field as price; whatever name the first $project produces is the name the second stage has to refer to, otherwise the pipeline won't match.
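If you also want the matching prices collected into a single array, as in your expected output, you can append a $group stage after the $match. A minimal sketch; note the values stay strings because of the earlier $toString:
db.test.aggregate([
  { '$project': { '_id': 0, 'prices': { '$toString': '$price' } } },
  { '$match': { 'prices': { '$regex': '^12' } } },
  // collect all matching prices into one array
  { '$group': { '_id': null, 'prices': { '$push': '$prices' } } },
  { '$project': { '_id': 0, 'prices': 1 } }
])
This returns something like { "prices" : [ "1283", "1248" ] }.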


How to find middle of the string, next to space and dot in MongoDB

[
  {
    "Name": "Dr.Soma",
    "Email": "drsoma#gmail.com",
    "MobNo": 111111111
  },
  {
    "Name": "Bootha Ganesh",
    "Email": "boothaganesg#gmail.com",
    "MobNo": 222222222
  },
  {
    "Name": "Steven",
    "Email": "steven#gmail.com",
    "MobNo": 333333333
  },
  {
    "Name": "Dr.Anbarasi",
    "Email": "anbarasi#gmail.com",
    "MobNo": 4444444444
  }
]
I tried this using find with a regex:
db.details.find({Name:{$regex:/steven/i}})
output:
{
  "Name": "Steven",
  "Email": "steven#gmail.com",
  "MobNo": 333333333
}
How do I find documents where the Name contains Soma preceded by a dot (.), or Ganesh preceded by a space?
Expected output:
If I search for Ganesh, I need:
{
  "Name": "Bootha Ganesh",
  "Email": "boothaganesg#gmail.com",
  "MobNo": 222222222
}
If I search for soma, with a small s or a capital S, I need:
{
  "Name": "Dr.Soma",
  "Email": "drsoma#gmail.com",
  "MobNo": 111111111
}
I do not need the Dr.Anbarasi document.
db.collection.find({"Name": {'$regex': /\bsoma[a-zA-Z0-9]*/gi}})
\b asserts a position at a word boundary: (^\w | \w$ | \W\w | \w\W)
soma - the value being searched for
[a-zA-Z0-9] - word characters (letters and digits)
* - zero or more of them, so the rest of the word is matched
db.collection.find({ Name: { $regex: "Soma" } })
mongoplayground
db.collection.find({ Name: { $regex: ".Soma" } })
mongoplayground
db.collection.find({ Name: { $regex: " Ganesh" } })
mongoplayground
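If you want a single query that covers both cases, the two patterns can be combined with alternation. Note the escaped dot (\.) so it matches a literal '.', and \s for the space; a minimal sketch against the details collection shown above:
db.details.find({
  // match a literal dot before "soma", or whitespace before "ganesh", case-insensitively
  Name: { $regex: /(\.soma|\sganesh)/i }
})
This returns the "Dr.Soma" and "Bootha Ganesh" documents, but not "Dr.Anbarasi" or "Steven".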

How do I extract data from "List" field

I'm getting JSON data from a web service and trying to build a table. Datadisk is presented as a List, and clicking into each item navigates further down the hierarchy, as shown in the screenshots below. I need to concatenate storageAccountType for each item with a | sign, so if there were 2 list items for Greg-VM, with Standard_LRS for the first one and Premium_LRS for the second one, the new column should show Standard_LRS | Premium_LRS for that row.
The input returned by the function is below:
[
  {
    "name": "rhazuremspdemo",
    "disk": {
      "id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/AzureMSPDemo/providers/Microsoft.Compute/disks/rhazuremspdemo_OsDisk_1_346353b875794dd4a7a5c5938abfb7df",
      "storageAccountType": "StandardSSD_LRS"
    },
    "datadisk": []
  },
  {
    "name": "w12azuremspdemo",
    "disk": {
      "id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/AzureMSPDemo/providers/Microsoft.Compute/disks/w12azuremspdemo_OsDisk_1_09788205f8eb429faa082866ffee0f18",
      "storageAccountType": "Premium_LRS"
    },
    "datadisk": []
  },
  {
    "name": "Greg-VM",
    "disk": {
      "id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/GREG/providers/Microsoft.Compute/disks/Greg-VM_OsDisk_1_63ed471fef3e4f568314dfa56ebac5d2",
      "storageAccountType": "Premium_LRS"
    },
    "datadisk": [
      {
        "name": "Data",
        "createOption": "Attach",
        "diskSizeGB": 10,
        "managedDisk": {
          "id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/GREG/providers/Microsoft.Compute/disks/Data",
          "storageAccountType": "Standard_LRS"
        },
        "caching": "None",
        "toBeDetached": false,
        "lun": 0
      },
      {
        "name": "Disk2",
        "createOption": "Attach",
        "diskSizeGB": 10,
        "managedDisk": {
          "id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/GREG/providers/Microsoft.Compute/disks/Disk2",
          "storageAccountType": "Standard_LRS"
        },
        "caching": "None",
        "toBeDetached": false,
        "lun": 1
      }
    ]
  }
]
How do I do that?
Thanks,
G
This should help you. It steps through the process.
If you have a scenario like this
you can use Add Custom Column and type the following expression:
=Table.Column([TableName], "ColumnName")
to get it as a list.
Now you can left-click on the Custom column and choose Extract Values....
Choose Custom as the delimiter, enter |, and hit OK.
This way, the data that was in your list will now be in the same row, separated by the delimiter.
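Alternatively, the concatenation can be done in a single custom-column formula. A rough sketch, assuming the list column is named datadisk and each list item is a record shaped like the JSON above (with managedDisk.storageAccountType inside):
= Text.Combine(
    List.Transform(
        [datadisk],                              // the list of data-disk records for this row
        each [managedDisk][storageAccountType]   // pull the storage tier out of each record
    ),
    " | "                                        // join the values with the | delimiter
)
For rows whose datadisk list is empty (like rhazuremspdemo above), this simply produces an empty string.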

How to interpret user search query (in Elasticsearch)

I would like to serve my visitors the best results possible when they use our search feature.
To achieve this I would like to interpret the search query.
For example, a user searches for 'red beds for kids 120cm'.
I would like to interpret it as follows:
Category-Filter is "beds" AND "children"
Color-filter is red
Size-filter is 120cm
Are there ready-to-go tools for Elasticsearch?
Will I need NLP in front of Elasticsearch?
Elasticsearch is pretty powerful on its own and is very much capable of returning the most relevant results to full-text search queries, provided that data is indexed and queried adequately.
Under the hood it always performs text analysis for full-text searches (on fields of type text). A text analyzer consists of character filters, a tokenizer and token filters.
For instance, a synonym token filter can replace kids with children in the user's query.
On top of that, search on modern websites is often assisted by category selectors in the UI, which can easily be implemented by querying the keyword fields of Elasticsearch.
It might be enough to model your data correctly and tune its indexing to implement the search you need - and if that is not enough, you can always add an extra layer of NLP-like logic on the client side, like #2ps suggested.
Now let me show a toy example of what you can achieve with a synonym token filter and copy_to feature.
Let's define the mapping
Let's pretend that our products are characterized by the following properties: Category, Color, and Size.LengthCM.
The mapping will look something like:
PUT /my_index
{
  "mappings": {
    "properties": {
      "Category": {
        "type": "keyword",
        "copy_to": "DescriptionAuto"
      },
      "Color": {
        "type": "keyword",
        "copy_to": "DescriptionAuto"
      },
      "Size": {
        "properties": {
          "LengthCM": {
            "type": "integer",
            "copy_to": "DescriptionAuto"
          }
        }
      },
      "DescriptionAuto": {
        "type": "text",
        "analyzer": "MySynonymAnalyzer"
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "MySynonymAnalyzer": {
            "tokenizer": "standard",
            "filter": [
              "MySynonymFilter"
            ]
          }
        },
        "filter": {
          "MySynonymFilter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [
              "kid, kids => children"
            ]
          }
        }
      }
    }
  }
}
Notice that we selected type keyword for the fields Category and Color.
Now, what about these copy_to and synonym?
What will copy_to do?
Every time we send an object for indexing into our index, the values of the fields marked with copy_to (Category, Color and Size.LengthCM) are copied into the full-text field DescriptionAuto. This is what copy_to does.
What will synonym do?
To enable synonym we need to define a custom analyzer, see MySynonymAnalyzer which we defined under "settings" above.
Roughly, it will replace every token that matches something on the left of => with the token on the right.
What will the documents look like?
Let's insert a few example documents:
POST /my_index/_doc
{
  "Category": [
    "beds",
    "adult"
  ],
  "Color": "red",
  "Size": {
    "LengthCM": 150
  }
}

POST /my_index/_doc
{
  "Category": [
    "beds",
    "children"
  ],
  "Color": "red",
  "Size": {
    "LengthCM": 120
  }
}

POST /my_index/_doc
{
  "Category": [
    "couches",
    "adult",
    "family"
  ],
  "Color": "blue",
  "Size": {
    "LengthCM": 200
  }
}

POST /my_index/_doc
{
  "Category": [
    "couches",
    "adult",
    "family"
  ],
  "Color": "red",
  "Size": {
    "LengthCM": 200
  }
}
As you can see, DescriptionAuto is not present in the original documents - though due to copy_to we will be able to query it.
Let's see how.
Performing the search!
Now we can try out our index with a simple query_string query:
POST /my_index/_search
{
  "query": {
    "query_string": {
      "query": "red beds for kids 120cm",
      "default_field": "DescriptionAuto"
    }
  }
}
The results will look something like the following:
"hits": {
  ...
  "max_score": 2.3611186,
  "hits": [
    {
      ...
      "_score": 2.3611186,
      "_source": {
        "Category": [
          "beds",
          "children"
        ],
        "Color": "red",
        "Size": {
          "LengthCM": 120
        }
      }
    },
    {
      ...
      "_score": 1.0998137,
      "_source": {
        "Category": [
          "beds",
          "adult"
        ],
        "Color": "red",
        "Size": {
          "LengthCM": 150
        }
      }
    },
    {
      ...
      "_score": 0.34116736,
      "_source": {
        "Category": [
          "couches",
          "adult",
          "family"
        ],
        "Color": "red",
        "Size": {
          "LengthCM": 200
        }
      }
    }
  ]
}
The document with categories beds and children and color red is on top, and its relevance score is more than twice that of the runner-up!
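On a related note, if your UI already offers category and size selectors, their values can be combined with the full-text part as exact filters on the keyword and integer fields; roughly something along these lines:
POST /my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "red beds",
          "default_field": "DescriptionAuto"
        }
      },
      "filter": [
        { "term": { "Category": "children" } },
        { "term": { "Size.LengthCM": 120 } }
      ]
    }
  }
}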
How can I check how Elasticsearch interpreted the user's query?
It is easy to do via the _analyze API:
POST /my_index/_analyze
{
  "text": "red bed for kids 120cm",
  "analyzer": "MySynonymAnalyzer"
}
{
  "tokens": [
    {
      "token": "red",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bed",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "for",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "children",
      "start_offset": 12,
      "end_offset": 16,
      "type": "SYNONYM",
      "position": 3
    },
    {
      "token": "120cm",
      "start_offset": 17,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
As you can see, there is no token kids, but there is token children.
On a side note, Elasticsearch wasn't able to parse the size of the bed in this example: the token 120cm didn't match anything, since all sizes are indexed as plain integers like 120 and 150. Another layer of tweaking is needed to extract 120 from the 120cm token.
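One possible way to handle that, sketched here without testing, is to add a pattern_replace character filter (the name StripCmSuffix below is made up) that strips the cm suffix before tokenization, and attach it to MySynonymAnalyzer in the analysis settings:
"char_filter": {
  "StripCmSuffix": {
    "type": "pattern_replace",
    "pattern": "(\\d+)\\s*cm",
    "replacement": "$1"
  }
},
"analyzer": {
  "MySynonymAnalyzer": {
    "char_filter": [ "StripCmSuffix" ],
    "tokenizer": "standard",
    "filter": [ "MySynonymFilter" ]
  }
}
With that in place, 120cm in the query would be reduced to 120, which matches the value copied from Size.LengthCM into DescriptionAuto.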
I hope this gives an idea of what can be achieved with Elasticsearch's built-in text analysis capabilities!

MongoDB Aggregate Regex Match or Full Text Search returns whole Document

Ex. Record
[
  {
    "_id": "5528cfd2e71144e020cb6494",
    "__v": 11,
    "Product": [
      {
        "_id": "5528cfd2e71144e020cb6495",
        "isFav": true,
        "quantity": 27,
        "price": 148,
        "description": "100g",
        "brand": "JaldiLa",
        "name": "Grapes",
        "sku": "GRP"
      },
      {
        "_id": "552963ed63d867b81e18d357",
        "isFav": false,
        "quantity": 13,
        "price": 290,
        "description": "100g",
        "brand": "JaldiLa",
        "name": "Apple",
        "sku": "APL"
      }
    ],
    "brands": [
      "Whole Foods",
      "Costco",
      "Bee's",
      "Masons"
    ],
    "sku": "FRT",
    "name": "Fruits"
  }
]
My Mongoose function that serves the query coming from AngularJS (http://localhost:8080/api/search?s=):
router.route('/search')
  .get(function(req, res) {
    Dept.aggregate(
      { $match: { $text: { $search: req.query.s } } },
      { $project : { name : 1, _id : 1, 'Product.name' : 1, 'Product._id' : 1 } },
      { $unwind : "$Product" },
      { $group : {
        _id : "$_id",
        Category : { $addToSet : "$name" },
        Product : { $push : "$Product" }
      }}
    )
  });
RESULT: e.g. for http://localhost:8080/api/search?s=Apple (or Grape, or Carrot) the result is the same for all:
[
  {
    "_id": "5528cfd2e71144e020cb6494",
    "Category": ["Fruits"],
    "Product": [
      {
        "_id": "5528cfd2e71144e020cb6495",
        "name": "Grapes"
      },
      {
        "_id": "552963ed63d867b81e18d357",
        "name": "Apple"
      },
      {
        "_id": "552e61920c530fb848c61510",
        "name": "Carrots"
      }
    ]
  }
]
PROBLEM: On a query for "apple", it returns all objects within Product instead of just the matching one. I think putting the $match after the $unwind, or using a $regex match, might do the trick.
WHAT I WANT: e.g. for a search string of "grape":
Also, I want it to start sending results as soon as I type the first two letters of my query.
[{
  "_id": ["5528cfd2e71144e020cb6494"], // I want this in an array as it messes my loop up otherwise
  "Category": "Fruits", // Yes, I do not want this in an array like I'm getting in my results
  "Product": [{
    "_id": "5528cfd2e71144e020cb6495",
    "name": "Grapes"
  }]
}]
Thanks for being patient.
Use the following aggregation pipeline:
var search = "apple",
  pipeline = [
    {
      "$match": {
        "Product.name": { "$regex": search, "$options": "i" }
      }
    },
    {
      "$unwind": "$Product"
    },
    {
      "$match": {
        "Product.name": { "$regex": search, "$options": "i" }
      }
    },
    {
      "$project": {
        "Category": "$name",
        "Product._id": 1,
        "Product.name": 1
      }
    }
  ];
db.collection.aggregate(pipeline);
With the above sample document and a regex (case-insensitive) search for "apple" on the name field of the Product array, the above aggregation pipeline produces the result:
Output:
/* 1 */
{
  "result" : [
    {
      "_id" : "5528cfd2e71144e020cb6494",
      "Product" : {
        "_id" : "552963ed63d867b81e18d357",
        "name" : "Apple"
      },
      "Category" : "Fruits"
    }
  ],
  "ok" : 1
}
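If you need the exact shape shown under WHAT I WANT (Product as an array and the _id wrapped in an array), you could append a $group and a final $project to the same pipeline. A rough sketch; the array literal in $project requires MongoDB 3.2 or later:
pipeline.push(
  { "$group": {
      "_id": "$_id",
      "Category": { "$first": "$Category" },
      "Product": { "$push": "$Product" }   // collect the matching products back into an array
  }},
  { "$project": {
      "_id": [ "$_id" ],                   // wrap the document _id in an array
      "Category": 1,
      "Product": 1
  }}
);

db.collection.aggregate(pipeline);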

Unwind array of objects mongoDB

My collection contains the following two documents
{
  "BornYear": 2000,
  "Type": "Zebra",
  "Owners": [
    {
      "Name": "James Bond",
      "Phone": "007"
    }
  ]
}
{
  "BornYear": 2012,
  "Type": "Dog",
  "Owners": [
    {
      "Name": "James Brown",
      "Phone": "123"
    },
    {
      "Name": "Sarah Frater",
      "Phone": "345"
    }
  ]
}
I would like to find all the animals which have an owner whose name contains James.
I tried to unwind the Owners array, but cannot get access to the Name field.
Bit of a misnomer here. To just find the "objects" or items in a "collection", all you really need to do is match on the item:
db.collection.find({
  "Owners.Name": /^James/
})
Which works, but does not of course limit the results to the "first" match of "James", which would be:
db.collection.find(
  { "Owners.Name": /^James/ },
  { "Owners.$": 1 }
)
As a basic projection. But that does not give any more than a "single" match, which means you need the .aggregate() method instead like so:
db.collection.aggregate([
  // Match the documents
  { "$match": {
    "Owners.Name": /^James/
  }},
  // Flatten or de-normalize the array
  { "$unwind": "$Owners" },
  // Filter the array content
  { "$match": {
    "Owners.Name": /^James/
  }},
  // Maybe group it back
  { "$group": {
    "_id": "$_id",
    "BornYear": { "$first": "$BornYear" },
    "Type": { "$first": "$Type" },
    "Owners": { "$push": "$Owners" }
  }}
])
And that allows more than one match in a sub-document array while filtering.
The other point is the "anchor", the ^ caret, on the regular expression. Use it where you can, so matches are made at the start of the string, where an index can be used. Open-ended regex operations cannot use an index.
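As an aside, if you are on MongoDB 4.2 or newer, an alternative to the $unwind / $group round trip is to filter the array in place with $filter and $regexMatch; a sketch:
db.collection.aggregate([
  // still match the documents first so an index on Owners.Name can be used
  { "$match": { "Owners.Name": /^James/ } },
  // then keep only the owners whose name starts with "James"
  { "$addFields": {
    "Owners": {
      "$filter": {
        "input": "$Owners",
        "as": "owner",
        "cond": { "$regexMatch": { "input": "$$owner.Name", "regex": "^James" } }
      }
    }
  }}
])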
You can use dot notation to match against the fields of array elements:
db.test.find({'Owners.Name': /James/})