Elasticsearch templates for filters

I'm new to Elasticsearch. From what I understand there are two types of templates: Template Query and Search Template.
Based on this post and their descriptions in the Elastic reference docs, it looks like both are templates for queries.
Because filters have better performance than queries, I want to create filter templates.
Is there some way to do this? I feel like there must be.
Thanks!
FYI, if it's important: I'm using the Java API to interact with Elasticsearch.

You can create filter templates. There is nothing special about a search template that excludes filters; in fact, the documentation has some pretty good examples. (Note that the filtered query used below was removed in Elasticsearch 5.0; on newer versions use a bool query with a filter clause instead.)
{
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "line": "{{text}}"
                }
            },
            "filter": {
                {{#line_no}}
                "range": {
                    "line_no": {
                        {{#start}}
                        "gte": "{{start}}"
                        {{#end}},{{/end}}
                        {{/start}}
                        {{#end}}
                        "lte": "{{end}}"
                        {{/end}}
                    }
                }
                {{/line_no}}
            }
        }
    }
}
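The conditional {{#...}} sections can be hard to read at first; as a sanity check, here is the same logic expressed as a plain Python sketch (the function and parameter names are illustrative, not part of any Elasticsearch API):

```python
# Sketch: mirrors the Mustache template's conditional sections in plain Python.
# The range filter is only attached when a start and/or end bound is given.
def build_filtered_query(text, start=None, end=None):
    query = {"query": {"filtered": {"query": {"match": {"line": text}}}}}
    line_no = {}
    if start is not None:
        line_no["gte"] = start
    if end is not None:
        line_no["lte"] = end
    if line_no:
        query["query"]["filtered"]["filter"] = {"range": {"line_no": line_no}}
    return query
```

Calling it with both bounds yields the same body the rendered template would produce; omitting them drops the filter clause entirely, just like an empty {{#line_no}} section.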

Related

Not able to get desired search results in ElasticSearch search api

I have a field "xyz" on which I want to search. The type of the field is keyword. The different values of the field "xyz" are:
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Now, for the following query:
{
    "query": {
        "query_string" : {
            "query" : "(xyz:(\"a/b/c\"*))"
        }
    }
}
I should get only these two results:
a/b/c/d
a/b/c/e
but I get all four results:
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Edit:
Actually, I am not directly querying Elasticsearch; I am using this API https://atlas.apache.org/api/v2/resource_DiscoveryREST.html#resource_DiscoveryREST_searchWithParameters_POST which creates the above-mentioned query for Elasticsearch, so I don't have much control over the query_string. What I can change is the Elasticsearch analyzer for this field, or its type.
You'll need to let the query_string parser know you'll be using a regex, so wrap the whole thing in /.../ and escape the forward slashes:
{
    "query": {
        "query_string": {
            "query": "xyz:/(a\\/b\\/c\\/.*)/"
        }
    }
}
Or, you might as well use a regexp query:
{
    "query": {
        "regexp": {
            "xyz": "a/b/c/.*"
        }
    }
}
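To sanity-check the pattern outside Elasticsearch: ES regexp queries are anchored, meaning the pattern must match the whole term, and Python's re.fullmatch behaves the same way for a simple pattern like this one:

```python
import re

values = ["a/b/c/d", "a/b/c/e", "a/b/f/g", "a/b/f/h"]

# re.fullmatch requires the whole string to match, mimicking the
# implicitly anchored matching of an Elasticsearch regexp query.
matches = [v for v in values if re.fullmatch(r"a/b/c/.*", v)]
print(matches)  # only the two a/b/c/ paths survive
```

(The syntaxes are not identical in general; Elasticsearch uses Lucene regular expressions, which lack some Python features, but for this pattern they agree.)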

How to automate the creation of elasticsearch index patterns for all days?

I am using a CloudWatch subscription filter which automatically sends logs to AWS Elasticsearch, and then I use Kibana from there. The issue is that every day CloudWatch creates a new index, so I have to manually create the new index pattern each day in Kibana. Accordingly, I will have to create new monitors and alerts in Kibana each day as well. I have to automate this somehow. Also, if there is a better option to go forward with, that would be great. I know Datadog is one good option.
A typical workflow will look like this (there are other methods):
Choose a pattern when creating an index, like staff-202001, staff-202002, etc.
Add each index to an alias, like staff.
This can be achieved in multiple ways; the easiest is to create a template with an index pattern, alias, and mapping.
Example: any new index created matching the pattern staff-* will be assigned the given mapping and attached to the alias staff, and we can query staff instead of individual indexes and set up alerts.
In the example below, we can use cwl--aws-containerinsights-eks-cluster-for-test-host to run queries.
POST _template/cwl--aws-containerinsights-eks-cluster-for-test-host
{
    "index_patterns": [
        "cwl--aws-containerinsights-eks-cluster-for-test-host-*"
    ],
    "mappings": {
        "properties": {
            "id": {
                "type": "keyword"
            },
            "firstName": {
                "type": "text"
            },
            "lastName": {
                "type": "text"
            }
        }
    },
    "aliases": {
        "cwl--aws-containerinsights-eks-cluster-for-test-host": {}
    }
}
Note: if you are unsure of the mapping, you can remove the mappings section.
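To see why each new daily index ends up behind the alias, here is a small Python sketch using only the names from the example above (fnmatch is a rough stand-in for Elasticsearch's wildcard matching, and the daily index name is an illustrative example):

```python
from fnmatch import fnmatch

# The relevant parts of the template body, as a Python dict. It could be
# sent with the official client, e.g. es.indices.put_template(name=..., body=...).
template = {
    "index_patterns": ["cwl--aws-containerinsights-eks-cluster-for-test-host-*"],
    "aliases": {"cwl--aws-containerinsights-eks-cluster-for-test-host": {}},
}

# Any daily index CloudWatch creates matches the wildcard pattern, so the
# template is applied to it and the alias is attached automatically.
daily_index = "cwl--aws-containerinsights-eks-cluster-for-test-host-2020.01.31"
matched = fnmatch(daily_index, template["index_patterns"][0])
```

Kibana index patterns and monitors can then be defined once against the alias instead of once per daily index.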

How the users can access my Elasticsearch database in my Django SaaS?

Let's say that I have a SaaS based on a Django backend that processes users' data and writes everything to Elasticsearch. Now I would like to give users the ability to search and request their data stored in ES, using all the search requests available in ES. Obviously, a user should only have access to his own data, not to other users' data. I am aware that this can be done in a lot of different ways, but I wonder what is safe and the best solution. At this point I store everything in one index and type, in the way shown below, but I can do this in any way.
"_index": "example_index",
"_type": "example_type",
"_id": "H2s-lGsdshEzmewdKtL",
"_score": 1,
"_source": {
    "user_id": 1,
    "field1": "example1",
    "field2": "example2",
    "field3": "example3"
}
I think that the best way would be to associate every document with the user_id. The user would send, for example, a GET request with a body and an authorization header with a token. I would use the token to extract the user's id, for example in this way:
key = request.META.get('HTTP_AUTHORIZATION').split()[1]
user_id = Token.objects.get(key=key).user_id
After this I would redirect his request to ES, and only data that meets the requirements and belongs to this user would be returned. Of course, I could do this as shown above, where I also add the field user_id. For example, I could use post_filter in this way:
To every request I would add something like this:
,
"post_filter": {
    "match": {
        "user_id": 1
    }
}
For example, the user sends a GET with body:
{
    "query": {
        "regexp": {
            "tag": ".*example.*"
        }
    }
}
and I change this in my backend and redirect the request to ES with body:
{
    "query": {
        "regexp": {
            "tag": ".*example.*"
        }
    },
    "post_filter": {
        "match": {
            "user_id": 1
        }
    }
}
but it doesn't seem to me that including this field in _source is a good idea. I am almost sure that this can be solved in a more optimal way than post_filtering. I see a lot of information about authorization in ES, but I can't find how to associate a document with a user_id and then search only his documents without post_filtering. Any ideas?
UPDATE
My current solution is shown below; however, as I mentioned, I believe that it is not the optimal way. If anyone has an idea how I can solve this in the way described above, I would be grateful for help.
I send, for example:
{
    "query": {
        "regexp": {
            "tag": ".*test.*"
        }
    }
}
In the Django backend I just do:
key = request.META.get('HTTP_AUTHORIZATION').split()[1]
user_id = Token.objects.get(key=key).user_id
body = json.loads(request.body)
body['post_filter'] = {"match": {"user_id": user_id}}
res = es.search(index="pictures", doc_type="picture", body=body)
output = []
for hit in res['hits']['hits']:
    output.append(hit["_source"])
return Response(
    {'output': output},
    status=status.HTTP_200_OK)
As of Elasticsearch 7.1, basic security is included in the free version of Elasticsearch. Thanks to that, you can control your users' access per index.
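Beyond the built-in security features, a common pattern for this kind of multi-tenant setup is a filtered alias per user: the user_id filter then lives in Elasticsearch itself and is applied to every search against the alias, with no post_filter needed in application code. A minimal sketch (the index and alias naming scheme is illustrative):

```python
# Sketch: build the _aliases action that creates a per-user filtered alias.
# Searching the alias "pictures-user-1" then only ever sees documents whose
# user_id field is 1.
def filtered_alias_action(index, user_id):
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"{index}-user-{user_id}",
                    "filter": {"term": {"user_id": user_id}},
                }
            }
        ]
    }

body = filtered_alias_action("pictures", 1)
# This body would be sent as POST /_aliases,
# e.g. es.indices.update_aliases(body=body) with the Python client.
```

The Django view would then search the per-user alias derived from the token instead of rewriting the request body.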

Elasticsearch Update Doc String Replacement

I have some documents in my Elasticsearch. I want to update my documents' contents using a regexp-based string replacement.
For example, I would like to replace all http occurrences with https. Is that possible?
Thank you.
This should get you off to a start. Check out the "Update by Query" API in the documentation. The API allows you to include the update script and the search query in the same request body.
Regarding your case, an example might look like this...
POST addresses/_update_by_query
{
    "script": {
        "lang": "painless",
        "inline": "ctx._source.data.url = ctx._source.data.url.replace('http', 'https')"
    },
    "query": {
        "query_string": {
            "query": "http://*",
            "analyze_wildcard": true
        }
    }
}
Pretty self-explanatory, but script is where we do the update, and query returns the documents to update.
Painless supports regex, so you're in luck; look at the Painless docs for some examples and update the inline value accordingly.
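One caveat worth noting: a plain replace('http', 'https') rewrites every occurrence of the substring http, including one appearing later in the path, and would turn an already-migrated https into httpss on a second run. A replacement anchored to the start of the URL avoids both problems; here is the equivalent transform sketched in Python (a Painless script could use a similarly anchored regex):

```python
import re

def to_https(url):
    # Only rewrite the scheme at the very start of the URL;
    # leave any "http" appearing elsewhere in the string alone.
    return re.sub(r"^http://", "https://", url)

fixed = to_https("http://example.com/http-page")
# the "/http-page" path segment is untouched, and running
# to_https again on the result changes nothing
```

Making the script idempotent this way also makes it safe to re-run the update_by_query if it is interrupted partway through.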

How to use the elasticsearch regex query correctly?

I am working on translating a Splunk query to Elasticsearch DSL.
I want to check if a URL in the logs contains something like:
"script>" OR "UNION ALL SELECT"
Fair enough I thought, went to the doc, and:
{
    "regexp": {
        "http.url": "script>"
    }
}
Elasticsearch (2.3) replies:
"root_cause": [
    {
        "reason": "failed to parse search source. unknown search element [regexp]",
        "type": "search_parse_exception",
        "line": 2,
Could someone enlighten me please about these kinds of queries?
This is a pretty straightforward mistake when starting out with the documentation. In the docs, we generally only show the raw query (and its parameters). Queries are either compound queries or leaf queries; regexp is an example of a leaf query.
However, that's not enough to actually send the query. You're missing the simple wrapper part of the DSL required for any query:
{
    "query": {
        "regexp": {
            "http.url": "script>"
        }
    }
}
To use a compound query, the best way is the bool compound query.
It has must, must_not, should, and filter, and each accepts an array of queries (or filters, which are just scoreless, cacheable queries). should is the OR-like aspect of it, but do read the docs on how it behaves when you add must alongside it. The gist is that should by itself is exactly like an OR (as shown below), but if you combine it with must, then it becomes completely optional unless you use "minimum_should_match": 1.
{
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "http.url": "script>"
                    }
                },
                {
                    "term": {
                        "http.url": "UNION ALL SELECT"
                    }
                }
            ]
        }
    }
}
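This OR-of-terms shape generalizes to any number of patterns; a small Python helper (the function name is illustrative) that builds the same bool/should body from a list of values:

```python
# Sketch: build a bool/should query ORing one term clause per value.
# With only should clauses present, at least one of them must match.
def bool_should_terms(field, values):
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: v}} for v in values]
            }
        }
    }

q = bool_should_terms("http.url", ["script>", "UNION ALL SELECT"])
```

Generating the body this way keeps a growing blocklist of suspicious patterns in one place instead of hand-editing nested JSON.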