I have two models (Post and Comment) that I want to search with Elasticsearch. The Post model has an integer field, popularity, that represents the popularity of the post.
I am using the elasticsearch-model gem, and I want to search Posts and Comments with one query. This works fine with
Elasticsearch::Model.search('search string', [Post, Comment])
Now I want to add an option to boost the Post elements by their popularity field, as described in Boosting by Popularity. Is this possible with the Ruby API? I tried something like
Elasticsearch::Model.search('antilope', [Post, Comment], field_value_factor: {field: 'clicks'})
but this gives me this error
ArgumentError: URL parameter 'field_value_factor' is not supported
You should be able to do it by passing a full-fledged function_score query like this:
Elasticsearch::Model.search({
  "query": {
    "function_score": {
      "query": {
        "match": {
          "_all": "antilope"
        }
      },
      "field_value_factor": {
        "field": "clicks"
      }
    }
  }
}, [Post, Comment])
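The same request body can also be assembled programmatically. Below is a minimal sketch in Python; the helper name popularity_boosted_query is made up for illustration, and the dictionary it returns simply mirrors the hash shown above.

```python
def popularity_boosted_query(text, popularity_field="clicks"):
    """Build a function_score request body that boosts hits by a numeric field.

    The helper name and the default field name are assumptions for this sketch.
    """
    return {
        "query": {
            "function_score": {
                # The inner query decides which documents match and their base score.
                "query": {"match": {"_all": text}},
                # field_value_factor feeds the value of the field into the score.
                "field_value_factor": {"field": popularity_field},
            }
        }
    }


body = popularity_boosted_query("antilope")
print(body["query"]["function_score"]["field_value_factor"])
```

The resulting dictionary can be serialized to JSON and sent as the search body by whichever client is in use.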
I have a couple of indexes in my Elasticsearch DB as follows
Index_2019_01
Index_2019_02
Index_2019_03
Index_2019_04
.
.
Index_2019_12
Suppose I want to search only the first 3 indexes.
I mean something like a regular expression:
select count(*) from Index_2019_0[1-3] where LanguageId="English"
What is the correct way to do that in Elasticsearch?
How can I query several indexes with certain names?
This can be achieved via multi-index search, which is a built-in capability of Elasticsearch. To get the described behavior, try a query like this:
POST /index_2019_01,index_2019_02/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}
Or, using URI search:
curl 'http://<host>:<port>/index_2019_01,index_2019_02/_search?q=LanguageID:English'
More details are available here. Note that Elasticsearch requires index names to be lowercase.
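If the list of indexes follows a monthly naming scheme, the comma-separated path can be generated instead of typed by hand. This is a rough sketch in Python; the helper names monthly_indices and search_path are hypothetical, and the index_YYYY_MM naming is taken from the question.

```python
def monthly_indices(year, first_month, last_month, prefix="index"):
    """Return lowercase index names such as index_2019_01 for a month range."""
    return [
        f"{prefix}_{year}_{month:02d}"
        for month in range(first_month, last_month + 1)
    ]


def search_path(index_names):
    """Join index names with commas to form a multi-index _search path."""
    return "/" + ",".join(index_names) + "/_search"


path = search_path(monthly_indices(2019, 1, 3))
print(path)  # /index_2019_01,index_2019_02,index_2019_03/_search
```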
Can I use a regex to specify index name pattern?
In short, no. It is possible to refer to the index name in queries through a special "virtual" field, _index, but its use is limited. For instance, one cannot run a regexp against the index name:
The _index is exposed as a virtual field — it is not added to the
Lucene index as a real field. This means that you can use the _index
field in a term or terms query (or any query that is rewritten to a
term query, such as the match, query_string or simple_query_string
query), but it does not support prefix, wildcard, regexp, or fuzzy
queries.
For instance, the query from above can be rewritten as:
POST /_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "_index": [
              "index_2019_01",
              "index_2019_02"
            ]
          }
        },
        {
          "match": {
            "LanguageID": "English"
          }
        }
      ]
    }
  }
}
This employs a bool query and a terms query.
Hope that helps!
Why use POST when you are not adding any additional data?
I advise using GET in your case. Secondly, if the indexes have similar names, as in your case, you should use an index pattern like the one in the query below:
GET /index_2019_*/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}
Or with curl:
curl -XGET "http://<host>:<port>/index_2019_*/_search" -H 'Content-Type: application/json' -d'{"query": {"match":{"LanguageID": "English"}}}'
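Note that a pattern such as index_2019_* selects every 2019 index, not only the first three. The sketch below uses Python's fnmatchcase purely as a local stand-in for how the wildcard expands; Elasticsearch resolves the pattern on the server, and the list of existing indexes here is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical set of indexes present in the cluster.
existing = [f"index_2019_{month:02d}" for month in range(1, 13)] + ["index_2018_12"]

# Keep only the names that the wildcard pattern would cover.
matched = [name for name in existing if fnmatchcase(name, "index_2019_*")]

print(len(matched))  # 12
print(matched[0], matched[-1])  # index_2019_01 index_2019_12
```

To restrict the search to a few months, the explicit comma-separated list remains the more precise option.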
While searching for indices using a regex is not possible, you might be able to use date math to take you a bit further.
You can look at the docs here
As an example, let's say you want the last 3 months from those indices.
That means that if we have
index_2019_01
index_2019_02
index_2019_03
index_2019_04
and today is 2019/04/20, we could use the following query to get 04, 03 and 02:
GET /<index_{now/M-0M{yyyy_MM}}>,<index_{now/M-1M{yyyy_MM}}>,<index_{now/M-2M{yyyy_MM}}>
I used M-0M for the first one so the query construction loop doesn't need a special case for the first index.
Look at the docs regarding URL encoding for this query and how to have literal braces in the index name. If a client is used, the URL encoding is done for you (at least in the Python client).
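The construction loop and the URL encoding mentioned above could look roughly like this in Python. It is only a sketch: the helper names are made up, the index_ prefix assumes the index_YYYY_MM names listed earlier, and the encoding uses the standard library rather than any Elasticsearch client.

```python
from urllib.parse import quote


def date_math_indices(months, prefix="index_", date_format="yyyy_MM"):
    """Build date math index names for the current month and the ones before it."""
    return [
        # Offset 0 is written as M-0M so every entry has the same shape.
        "<" + prefix + "{now/M-" + str(offset) + "M{" + date_format + "}}>"
        for offset in range(months)
    ]


def encoded_search_path(index_names):
    """Percent-encode the special characters so the names are safe in a URL path."""
    return "/" + quote(",".join(index_names), safe="") + "/_search"


names = date_math_indices(3)
print(names[0])  # <index_{now/M-0M{yyyy_MM}}>
print(encoded_search_path(names[:1]))
```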
I've got a data structure fairly similar to the one described on the Loopback HasManyThrough documentation page.
For a given Physician (e.g. id 2), I would like to get all their patients with an appointment AND their appointment date.
I can do a GET operation like this:
GET /physicians/2
with the filter header { "include" : {"relation":"patients"} }
And I do get the physician, and the list of patients, but I lose the appointmentDate of the relation.
Or, I can do a GET operation on the relation table like the documentation shows:
GET /appointments
with the filter header { "include" : {"relation":"patient"}, "where":{"physicianId":2} }
And I get the appointments, with the date and the patient embedded, but not the physician details.
I can't seem to be able to combine the two.
Is there a way to get the whole data with one query?
The data would be something like this:
[
  {
    "name": "Dr John",
    "appointments": [
      {
        "appointmentDate": "2014-06-01",
        "patient": {
          "name": "Jane Smith",
          "id": 1
        }
      }
    ]
  }
]
One hack I found is to define the relation twice: once as a HasManyThrough and once as a HasMany to the appointments table. Then I can do something like this:
GET /physicians/2
with the filter header { "include" : {"relation":"appointments","scope":{"include":["patient"]} } }
But that doesn't seem right, and it could lead to odd behaviour with the duplicated relation. Maybe I'm just being paranoid.
You could include both models:
GET /appointments
{ "include": ["patient", "physician"], "where": { "physicianId":2 } }
You will get quite a lot of duplicate data though (the details of the physician with id 2 are repeated in every appointment). I believe the HasManyThrough relation model was not originally meant to carry any extra data, and therefore it has some limitations. Here is a related GitHub issue.
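If the duplicated physician details are a problem, the flat list of appointments can be folded into the nested shape from the question on the client side. This is a minimal sketch in Python; the sample rows are a hypothetical response, assuming each appointment carries embedded physician and patient objects.

```python
def group_by_physician(appointments):
    """Fold flat appointment rows into one entry per physician."""
    physicians = {}
    for row in appointments:
        doctor = row["physician"]
        # Create the physician entry the first time its id is seen.
        entry = physicians.setdefault(
            doctor["id"], {"name": doctor["name"], "appointments": []}
        )
        entry["appointments"].append(
            {"appointmentDate": row["appointmentDate"], "patient": row["patient"]}
        )
    return list(physicians.values())


# Hypothetical response from GET /appointments with both relations included.
rows = [
    {"appointmentDate": "2014-06-01", "physicianId": 2,
     "physician": {"id": 2, "name": "Dr John"},
     "patient": {"id": 1, "name": "Jane Smith"}},
    {"appointmentDate": "2014-06-08", "physicianId": 2,
     "physician": {"id": 2, "name": "Dr John"},
     "patient": {"id": 3, "name": "Bob Jones"}},
]

result = group_by_physician(rows)
print(result[0]["name"], len(result[0]["appointments"]))  # Dr John 2
```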
I have an ES database storing history records from a process I run every day. Because I want to show only 20 records per page in the history (ordered by date), I was using pagination (size + from_) combined with scroll, which worked just fine. But when I added sort to the query, it didn't work, and I found that scroll and sort don't work together. Looking for an alternative, I tried the ES helper scan, which works fine for scrolling and sorting the results, but with this solution pagination doesn't seem to work. I don't understand why, since the API docs say that scan passes all the parameters to the underlying search function. So my question is whether there is any method to combine the three options.
Thanks,
Ruben
When using the elasticsearch.helpers.scan function, you need to pass preserve_order=True to enable sorting.
(Tested using elasticsearch==7.5.1)
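For showing one page at a time, from and size together with sort may already be enough without scroll, since scroll is meant for walking through a whole result set rather than jumping to a given page. Below is a minimal sketch of such a request body in Python; the helper name history_page and the field name date are assumptions, not something taken from the question.

```python
def history_page(page, size=20, sort_field="date", order="desc"):
    """Build a search body for one page of sorted results (pages start at 1)."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    return {
        # Skip the records that belong to the earlier pages.
        "from": (page - 1) * size,
        "size": size,
        "sort": [{sort_field: {"order": order}}],
        "query": {"match_all": {}},
    }


body = history_page(3)
print(body["from"], body["size"])  # 40 20
```

The dictionary can be passed as the body of a regular search request. Very deep pages are a different matter, because from + size is limited by the index.max_result_window setting.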
Yes, you can combine scroll with sort, but to sort on a string field you will need to change the mapping for it to work properly, as the documentation explains:
In order to sort on a string field, that field should contain one term
only: the whole not_analyzed string. But of course we still need the
field to be analyzed in order to be able to query it as full text.
The naive approach to indexing the same string in two ways would be to
include two separate fields in the document: one that is analyzed for
searching, and one that is not_analyzed for sorting.
"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
The main tweet field is just the same as before: an analyzed full-text field.
The new tweet.raw subfield is not_analyzed.
Now, or at least as soon as we have reindexed our data, we can use the
tweet field for search and the tweet.raw field for sorting:
GET /_search
{
  "query": {
    "match": {
      "tweet": "elasticsearch"
    }
  },
  "sort": "tweet.raw"
}
all new jsfiddle: http://jsfiddle.net/vJxvc/2/
Currently, I query an API that returns JSON like this. The API cannot be changed for now, which is why I need to work around it.
[
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]},
{"timestamp":1406111970, "values":[1273.455, 1153.577, 693.591]}
]
(could be a lot more lines, of course)
As you can see, each line has a timestamp and then an array of values. My problem is that I would actually like to transpose that. Looking at the first line alone:
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]}
It contains a few measurements taken at the same time. In my Ember project, this would need to become:
{
  "sensor_id": 1, // can be derived from the array index
  "timestamp": 1406111961,
  "value": 1236.181
},
{
  "sensor_id": 2,
  "timestamp": 1406111961,
  "value": 1157.695
},
{
  "sensor_id": 3,
  "timestamp": 1406111961,
  "value": 698.231
}
And those values would have to be pushed into the respective sensor models.
The transformation itself is trivial, but I have no idea where I would put it in Ember and how I could alter many Ember models at the same time.
You could make your model an array and override the normalize method on your adapter. The normalize method is where you do the transformation, and since your JSON is an array, an Ember.Array as a model would work.
I am not an Ember pro, but looking at the manual I would think of something like this:
var a = [
  {"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]},
  {"timestamp":1406111970, "values":[1273.455, 1153.577, 693.591]}
];
var b = [];
// Flatten each reading into one record per sensor value.
a.forEach(function(item) {
  item.values.forEach(function(value, index) {
    b.push({
      sensor_id: index + 1, // sensor ids start at 1, array indexes at 0
      timestamp: item.timestamp,
      value: value
    });
  });
});
console.log(b);
Example http://jsfiddle.net/kRUV4/
Update
Just saw your jsfiddle... You can get the store like this: How to get Ember Data's "store" from anywhere in the application so that I can do store.find()?
So we are using Django-haystack with the Elasticsearch backend to index a bunch of data for searching. It is very fast and is working great for the most part, but I notice something that I want that seems to be absent. For example, consider the search query "cellar door". I would want a query that is slightly off, like a misspelling, e.g. "cellar dor" or "celar door" to match results for "cellar door". If I try queries like this with our current setup it returns 0 results. I tried using an EdgeNgramField in the search index on the field we wanted to index, but this seems to have absolutely no effect.
Thanks.
Use suggest to perform spell check.
curl -XPOST 'localhost:9200/index/_search?search_type=count' -d '{
  "suggest": {
    "body": {
      "text": "celar door",
      "term": {
        "field": "summary",
        "analyzer": "simple"
      }
    }
  }
}'
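The suggester only returns candidate corrections; the application still has to build the corrected query and search again. This rough sketch in Python shows one way to do that. The sample response is hypothetical and only imitates the general shape of a term suggester reply (one entry per token, each with a list of options), and the helper name apply_suggestions is made up.

```python
def apply_suggestions(response, suggestion_name="body"):
    """Replace each token with its top suggestion, keeping tokens without options."""
    words = []
    for entry in response["suggest"][suggestion_name]:
        options = entry.get("options", [])
        words.append(options[0]["text"] if options else entry["text"])
    return " ".join(words)


# Hypothetical suggester reply for the text "celar door".
sample = {
    "suggest": {
        "body": [
            {"text": "celar", "offset": 0, "length": 5,
             "options": [{"text": "cellar", "score": 0.8, "freq": 3}]},
            {"text": "door", "offset": 6, "length": 4, "options": []},
        ]
    }
}

print(apply_suggestions(sample))  # cellar door
```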