I have a table for some activities like
[
{
"id": 123,
"name": "Ram",
"status": 1,
"activity": "Poster Design"
},
{
"id": 123,
"name": "Ram",
"status": 1,
"activity": "Poster Design"
},
{
"id": 124,
"name": "Leo",
"categories": [
"A",
"B",
"C"
],
"status": 1,
"activity": "Brochure"
},
{
"id": 134,
"name": "Levin",
"categories": [
"A",
"B",
"C"
],
"status": 1,
"activity": "3D Printing"
}
]
I want to get this data from elastic search 5.5 by sorting on field activity, but I need all the data corresponding to name = "Ram" first and then remaining in a single query.
You can use function score query to boost the result based on match for the filter(this case ram in name).
Following query should work for you
POST sort_index/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"boost": "5",
"functions": [{
"filter": {
"match": {
"name": "ram"
}
},
"random_score": {},
"weight": 1000
}],
"score_mode": "max"
}
},
"sort": [{
"activity.keyword": {
"order": "desc"
}
}]
}
I would suggest using a bool query combined with the should clause.
U will also need to use the sort clause on your field.
Related
I have data with multiple dimensions, stored in the Druid cluster. for example, Data of movies and the revenue they earned from each country where they were screened.
I'm trying to build a query that the answer to be returned will be a table of all the movies, the total revenue of each of them, and the revenue for each country.
I succeeded to do it in Turnilo - it generated for me the following Druid query -
[
[
{
"queryType": "timeseries",
"dataSource": "movies_source",
"intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
"granularity": "all",
"aggregations": [
{
"name": "__VALUE__",
"type": "doubleSum",
"fieldName": "revenue"
}
]
},
{
"queryType": "topN",
"dataSource": "movies_source",
"intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
"granularity": "all",
"dimension": {
"type": "default",
"dimension": "movie_id",
"outputName": "movie_id"
},
"aggregations": [
{
"name": "revenue",
"type": "doubleSum",
"fieldName": "revenue"
}
],
"metric": "revenue",
"threshold": 50
}
],
[
{
"queryType": "topN",
"dataSource": "movies_source",
"intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
"granularity": "all",
"filter": {
"type": "selector",
"dimension": "movie_id",
"value": "some_movie_id"
},
"dimension": {
"type": "default",
"dimension": "country",
"outputName": "country"
},
"aggregations": [
{
"name": "revenue",
"type": "doubleSum",
"fieldName": "revenue"
}
],
"metric": "revenue",
"threshold": 5
}
]
]
But it doesn't work when I'm trying to use it as a body for a Postman query - I got
{
"error": "Unknown exception",
"errorMessage": "Unexpected token (START_ARRAY), expected VALUE_STRING: need JSON String that contains type id (for subtype of org.apache.druid.query.Query)\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 2, column: 3]",
"errorClass": "com.fasterxml.jackson.databind.exc.MismatchedInputException",
"host": null
}
How should I build the corresponding query so that it works with Postman?
I am not familiar with Turnilo but have you tried using the Druid Console to write SQL and convert to Native request with the "Explain SQL query" option under the "Run/..." menu?
Your native queries seem to be doing a Top N instead of listing all movies, so I think the SQL might be something like:
SELECT movie_id, country_id, SUM(revenue) total_revenue
FROM movies_source
WHERE __time BETWEEN '2021-11-18 00:01:00' AND '2021-11-21 00:01:00'
GROUP BY movie_id, country_id
ORDER BY total_revenue DESC
LIMIT 50
I don't have the data source to test, but tested with sample wikipedia data with similar query structure:
SELECT namespace, cityName, sum(sum_added) total
FROM "wikipedia" r
WHERE cityName IS NOT NULL
AND __time BETWEEN '2015-09-12 00:00:00' AND '2015-09-15 00:00:00'
GROUP BY namespace, cityName
ORDER BY total DESC
limit 50
which results in the following Native query:
{
"queryType": "groupBy",
"dataSource": {
"type": "table",
"name": "wikipedia"
},
"intervals": {
"type": "intervals",
"intervals": [
"2015-09-12T00:00:00.000Z/2015-09-15T00:00:00.001Z"
]
},
"virtualColumns": [],
"filter": {
"type": "not",
"field": {
"type": "selector",
"dimension": "cityName",
"value": null,
"extractionFn": null
}
},
"granularity": {
"type": "all"
},
"dimensions": [
{
"type": "default",
"dimension": "namespace",
"outputName": "d0",
"outputType": "STRING"
},
{
"type": "default",
"dimension": "cityName",
"outputName": "d1",
"outputType": "STRING"
}
],
"aggregations": [
{
"type": "longSum",
"name": "a0",
"fieldName": "sum_added",
"expression": null
}
],
"postAggregations": [],
"having": null,
"limitSpec": {
"type": "default",
"columns": [
{
"dimension": "a0",
"direction": "descending",
"dimensionOrder": {
"type": "numeric"
}
}
],
"limit": 50
},
"context": {
"populateCache": false,
"sqlOuterLimit": 101,
"sqlQueryId": "cd5aabed-5e08-49b7-af63-fe82c125d3ee",
"useApproximateCountDistinct": false,
"useApproximateTopN": false,
"useCache": false
},
"descending": false
}
I am trying to fetch data from API in react component as
{this.props.buyer && this.props.buyer[0].phone_number[0].number} - it's throwing error
Cannot read property 'number' of undefined
{this.props.buyer && this.props.buyer[0].name} - it's working fine
This is the API data
Orders: {
buyer:
},
}
[
{
"id": 2,
"name": "Qi Xiang",
"type": "Consignee",
"address": {
"id": 2,
"type": "shipping",
"street": "China China",
"city": "Beijing",
"postal_code": "34343",
"province": "23232",
"country": "CN"
},
"email": null,
"phone_number": {
"number": "323232",
"type": "Phone"
},
"id_image_url": "/api/files/24e49645-df42-4984-a
}
]
},
}
Your phonenumber is not array. You must use this:
this.props.buyer[0].phone_number.number
I would like to serve my visitors the best results possible when they use our search feature.
To achieve this I would like to interpret the search query.
For example a user searches for 'red beds for kids 120cm'
I would like to interpret it as following:
Category-Filter is "beds" AND "children"
Color-filter is red
Size-filter is 120cm
Are there ready to go tools for Elasticsearch?
Will I need NLP in front of Elasticsearch?
Elasticsearch is pretty powerful on its own and is very much capable of returning the most relevant results to full-text search queries, provided that data is indexed and queried adequately.
Under the hood it always performs text analysis for full-text searches (for fields of type text). A text analyzer consists of a character filter, tokenizer and a token filter.
For instance, synonym token filter can replace kids with children in the user query.
Above that search queries on modern websites are often facilitated via category selectors in the UI, which can easily be implemented with querying keyword fields of Elasticsearch.
It might be enough to model your data correctly and tune its indexing to implement the search you need - and if that is not enough, you can always add some extra layer of NLP-like logic on the client side, like #2ps suggested.
Now let me show a toy example of what you can achieve with a synonym token filter and copy_to feature.
Let's define the mapping
Let's pretend that our products are characterized by the following properties: Category, Color, and Size.LengthCM.
The mapping will look something like:
PUT /my_index
{
"mappings": {
"properties": {
"Category": {
"type": "keyword",
"copy_to": "DescriptionAuto"
},
"Color": {
"type": "keyword",
"copy_to": "DescriptionAuto"
},
"Size": {
"properties": {
"LengthCM": {
"type": "integer",
"copy_to": "DescriptionAuto"
}
}
},
"DescriptionAuto": {
"type": "text",
"analyzer": "MySynonymAnalyzer"
}
}
},
"settings": {
"index": {
"analysis": {
"analyzer": {
"MySynonymAnalyzer": {
"tokenizer": "standard",
"filter": [
"MySynonymFilter"
]
}
},
"filter": {
"MySynonymFilter": {
"type": "synonym",
"lenient": true,
"synonyms": [
"kid, kids => children"
]
}
}
}
}
}
}
Notice that we selected type keyword for the fields Category and Color.
Now, what about these copy_to and synonym?
What will copy_to do?
Every time we send an object for indexing into our index, value of the keyword field Category will be copied to a full-text field DescritpionAuto. This is what copy_to does.
What will synonym do?
To enable synonym we need to define a custom analyzer, see MySynonymAnalyzer which we defined under "settings" above.
Roughly, it will replace every token that matches something on the left of => with the token on the right.
How will the documents look like?
Let's insert a few example documents:
POST /my_index/_doc
{
"Category": [
"beds",
"adult"
],
"Color": "red",
"Size": {
"LengthCM": 150
}
}
POST /my_index/_doc
{
"Category": [
"beds",
"children"
],
"Color": "red",
"Size": {
"LengthCM": 120
}
}
POST /my_index/_doc
{
"Category": [
"couches",
"adult",
"family"
],
"Color": "blue",
"Size": {
"LengthCM": 200
}
}
POST /my_index/_doc
{
"Category": [
"couches",
"adult",
"family"
],
"Color": "red",
"Size": {
"LengthCM": 200
}
}
As you can see, DescriptionAuto is not present in the original documents - though due to copy_to we will be able to query it.
Let's see how.
Performing the search!
Now we can try out our index with a simple query_string query:
POST /my_index/_doc/_search
{
"query": {
"query_string": {
"query": "red beds for kids 120cm",
"default_field": "DescriptionAuto"
}
}
}
The results will look something like the following:
"hits": {
...
"max_score": 2.3611186,
"hits": [
{
...
"_score": 2.3611186,
"_source": {
"Category": [
"beds",
"children"
],
"Color": "red",
"Size": {
"LengthCM": 120
}
}
},
{
...
"_score": 1.0998137,
"_source": {
"Category": [
"beds",
"adult"
],
"Color": "red",
"Size": {
"LengthCM": 150
}
}
},
{
...
"_score": 0.34116736,
"_source": {
"Category": [
"couches",
"adult",
"family"
],
"Color": "red",
"Size": {
"LengthCM": 200
}
}
}
]
}
The document with categories beds and children and color red is on top. And its relevance score is twice bigger than of its follow-up!
How can I check how Elasticsearch interpreted the user's query?
It is easy to do via analyze API:
POST /my_index/_analyze
{
"text": "red bed for kids 120cm",
"analyzer": "MySynonymAnalyzer"
}
{
"tokens": [
{
"token": "red",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "bed",
"start_offset": 4,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "for",
"start_offset": 8,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "children",
"start_offset": 12,
"end_offset": 16,
"type": "SYNONYM",
"position": 3
},
{
"token": "120cm",
"start_offset": 17,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 4
}
]
}
As you can see, there is no token kids, but there is token children.
On a side note, in this example Elasticsearch wasn't able, though, to parse the size of the bed: token 120cm didn't match to anything, since all sizes are integers, like 120, 150, etc. Another layer of tweaking will be needed to extract 120 from 120cm token.
I hope this gives an idea of what can be achieved with Elasticsearch's built-in text analysis capabilities!
I am new to CouchDB. I have a 9 gb dataset loaded into my couchdb. I am able to map everything correctly. But I cannot reduce any of the results using the code written in the reduce column. When i tried log, log shows that rereduce values as false. Do i need to do anything special while doing the Map() or how to set the rereduce value is TRUE??
A sample of my data is as follows:
{
"_id": "33d4d945613344f13a3ee92933b160bf",
"_rev": "1-0425ca93e3aa939dff46dd51c3ab86f2",
"release": {
"genres": {
"genre": "Electronic"
},
"status": "Accepted",
"videos": {
"video": [
{
"title": "[1995] bola - krak jakomo",
"duration": 349,
"description": "[1995] bola - krak jakomo",
"src": "http://www.youtube.com/watch?v=KrELXoYThpI",
"embed": true
},
{
"title": "Bola - Forcasa 3",
"duration": 325,
"description": "Bola - Forcasa 3",
"src": "http://www.youtube.com/watch?v=Lz9itUo5xtc",
"embed": true
},
{
"title": "Bola (Darrell Fitton) - Metalurg (MV)",
"duration": 439,
"description": "Bola (Darrell Fitton) - Metalurg (MV)",
"src": "http://www.youtube.com/watch?v=_MYpOOMRAeQ",
"embed": true
}
]
},
"labels": {
"label": {
"catno": "SKA005",
"name": "Skam"
}
},
"companies": "",
"styles": {
"style": [
"Downtempo",
"Experimental",
"Ambient"
]
},
"formats": {
"format": {
"text": "",
"name": "Vinyl",
"qty": 1,
"descriptions": {
"description": [
"12\"",
"Limited Edition",
"33 ⅓ RPM"
]
}
}
},
"country": "UK",
"id": 1928,
"released": "1995-00-00",
"artists": {
"artist": {
"id": 390,
"anv": "",
"name": "Bola",
"role": "",
"tracks": "",
"join": ""
}
},
"title": 1,
"master_id": 13562,
"tracklist": {
"track": [
{
"position": "A1",
"duration": "4:33",
"title": "Forcasa 3"
},
{
"position": "A2",
"duration": "5:48",
"title": "Krak Jakomo"
},
{
"position": "B1",
"duration": "7:50",
"title": "Metalurg 2"
},
{
"position": "B2",
"duration": "6:40",
"title": "Balloom"
}
]
},
"data_quality": "Correct",
"extraartists": {
"artist": {
"id": 388200,
"anv": "",
"name": "Paul Solomons",
"role": "Mastered By",
"tracks": "",
"join": ""
}
},
"notes": "Limited to 480 copies.\nA1 is a shorter version than that found on the 'Soup' LP.\nA2 ends in a lock groove."
}
}
My intention is to count the mapped values. My mapping function is as follows:
function(doc){
if(doc.release)
emit(doc.release.title,1)
}
Map results shows around 5800 results
I want to use the following functions in the reduce tab to count:
Reduce:
_count or _sum
It does not give single rounded value. Even i cannot get the simple _count operations right !!! :(
for screenshot,
Please help me !!!
What you got was the sum of values per title. What you wanted, was the sum of values in general.
Change the grouping drop-down list to none.
Check CouchdDB's wiki for more details on grouping.
I have created an Index with field title_auto:
class GameIndex(indexes.SearchIndex, indexes.Indexable):
text = indexes.CharField(document=True, model_attr='title')
title = indexes.CharField(model_attr='title')
title_auto = indexes.NgramField(model_attr='title')
Elastic search settings look like this:
ELASTICSEARCH_INDEX_SETTINGS = {
'settings': {
"analysis": {
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "lowercase",
"filter": ["haystack_ngram"],
"token_chars": ["letter", "digit"]
},
"edgengram_analyzer": {
"type": "custom",
"tokenizer": "lowercase",
"filter": ["haystack_edgengram"]
}
},
"tokenizer": {
"haystack_ngram_tokenizer": {
"type": "nGram",
"min_gram": 1,
"max_gram": 15,
},
"haystack_edgengram_tokenizer": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 15,
"side": "front"
}
},
"filter": {
"haystack_ngram": {
"type": "nGram",
"min_gram": 1,
"max_gram": 15
},
"haystack_edgengram": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 15
}
}
}
}
}
I try to do autocomplete search, it works, however returns too many irrelevant results:
qs = SearchQuerySet().models(Game).autocomplete(title_auto=search_phrase)
OR
qs = SearchQuerySet().models(Game).filter(title_auto=search_phrase)
Both of them produce the same output.
If search_phrase is "monopoly", first results contain "Monopoly" in their titles, however, as there are only 2 relevant items, it returns 51. The others have nothing to do with "Monopoly" at all.
So my question is - how can I change relevance of the results?
It's hard to tell for sure since I haven't seen your full mapping, but I suspect the problem is that the analyzer (one of them) is being used for both indexing and searching. So when you index a document, lots of ngram terms get created and indexed. If you search and your search text is also analyzed the same way, lots of search terms get generated. Since your smallest ngram is a single letter, pretty much any query is going to match a lot of documents.
We wrote a blog post about using ngrams for autocomplete that you might find helpful, here: http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams. But I'll give you a simpler example to illustrate what I mean. I'm not super familiar with haystack so I probably can't help you there, but I can explain the issue with ngrams in Elasticsearch.
First I'll set up an index that uses an ngram analyzer for both indexing and searching:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 1,
"max_gram": 15,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"title": {
"type": "string",
"analyzer": "nGram_analyzer"
}
}
}
}
}
and add some docs:
PUT /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"title":"monopoly"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"title":"oligopoly"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"title":"plutocracy"}
{"index":{"_index":"test_index","_type":"doc","_id":4}}
{"title":"theocracy"}
{"index":{"_index":"test_index","_type":"doc","_id":5}}
{"title":"democracy"}
and run a simple match search for "poly":
POST /test_index/_search
{
"query": {
"match": {
"title": "poly"
}
}
}
it returns all five documents:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 4.729521,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 4.729521,
"_source": {
"title": "oligopoly"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 4.3608603,
"_source": {
"title": "monopoly"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1.0197333,
"_source": {
"title": "plutocracy"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": 0.31496215,
"_source": {
"title": "theocracy"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "5",
"_score": 0.31496215,
"_source": {
"title": "democracy"
}
}
]
}
}
This is because the search term "poly" gets tokenized into the terms "p", "o", "l", and "y", which, since the "title" field in each of the documents was tokenized into single-letter terms, matches every document.
If we rebuild the index with this mapping instead (same analyzer and docs):
"mappings": {
"doc": {
"properties": {
"title": {
"type": "string",
"index_analyzer": "nGram_analyzer",
"search_analyzer": "standard"
}
}
}
}
the query will return what we expect:
POST /test_index/_search
{
"query": {
"match": {
"title": "poly"
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.5108256,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1.5108256,
"_source": {
"title": "monopoly"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1.5108256,
"_source": {
"title": "oligopoly"
}
}
]
}
}
Edge ngrams work similarly, except that only terms that start at the beginning of the words will be used.
Here is the code I used for this example:
http://sense.qbox.io/gist/b24cbc531b483650c085a42963a49d6a23fa5579
Unfortunately at this point in time there seems to be no way (apart from implementing a custom backend) to configure search analyzers and index analyzers through Django-Haystack separately.
In case Django-Haystack autocomplete returns too wide results you can make use of the score value provided with each search result to optimize the output.
if search_query != "":
# Use autocomplete query or filter
# with results_filtered being a SearchQuerySet()
results_filtered = results_filtered.filter(text=search_query)
#Remove objects with a low score
for result in results_filtered:
if result.score < SEARCH_SCORE_THRESHOLD:
results_filtered = results_filtered.exclude(id=result.id)
It worked reasonable well for me without having to define my own backend and scheme building.