Conditional query with Elastic Search - regex

My Elasticsearch index looks like this:
title:The Godfather
year:1972
genres:Crime & Drama
title:Lawrence of Arabia
year:1962
genres:Adventure,Biography &Drama
title:To Kill a Mockingbird
year:1973
genres:Mystery
title:Apocalypse Now
year:1975
genres:Thriller
I am trying to write an Elasticsearch query that first checks whether the genres field contains &. If it does, it should perform another matching operation on the same field. If genres doesn't contain &, it should skip the other matching operation. Basically, I am looking for an if condition in Elasticsearch.
Below is my query, but it doesn't seem to work:
{
  "query": {
    "bool": {
      "should": [
        {
          "regexp": {
            "genres": ".*&.*"
          }
        },
        {"match": {"genres": {"query": "Adventure"}}}
      ]
    }
  }
}
I was following the suggestion below from Stack Overflow:
How to write a conditional query with Elasticsearch?

You can nest bool queries: use a top-level bool query with only a should clause, and inside that should clause put two more bool queries. Each of those has a must part: the first holds the search for & plus whatever else should apply, the second holds the alternative criteria. Like this:
bool:
  should:
    - bool:
        must: [ _search for & and whatever else_ ]
    - bool:
        must: [ _search for another criteria_ ]
Hope this helps!
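The nested bool structure above can be sketched in Python as a plain dictionary. This is only a sketch under assumptions: the second branch uses must_not on the same regexp (my reading of "skip the other matching when there is no &"), the helper names are hypothetical, and the regexp is assumed to run against an un-analyzed version of genres, since a standard analyzer would drop the & character. The local check mirrors the intended logic on the sample movies without contacting a cluster.

```python
import re

# Clause that detects an ampersand anywhere in the genres value.
AMPERSAND = {"regexp": {"genres": ".*&.*"}}


def build_conditional_query(term):
    """(genres has '&' AND matches term) OR (genres has no '&')."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"bool": {"must": [AMPERSAND,
                                       {"match": {"genres": {"query": term}}}]}},
                    {"bool": {"must_not": [AMPERSAND]}},
                ]
            }
        }
    }


def matches_locally(genres, term):
    """Pure-Python mirror of the intended condition, for illustration only."""
    if re.fullmatch(r".*&.*", genres):
        return term.lower() in genres.lower()
    return True


MOVIES = {
    "The Godfather": "Crime & Drama",
    "Lawrence of Arabia": "Adventure,Biography &Drama",
    "To Kill a Mockingbird": "Mystery",
    "Apocalypse Now": "Thriller",
}

query = build_conditional_query("Adventure")
selected = [t for t, g in MOVIES.items() if matches_locally(g, "Adventure")]
print(selected)
```

The dictionary in query can be sent as the request body of a _search call; the selected list only shows which sample documents the condition is meant to keep.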

Related

Nested Array search in MongoDB/PyMongo while using aggregate

I am trying to search for a keyword inside array of arrays in a Mongo document.
{
  "PRODUCT_NAME" : "Truffle Cake",
  "TAGS": [
    ["Cakes", 100],
    ["Flowers", 100],
  ]
}
Usually, I would do something like this and it would work.
db.collection.find( {"TAGS":{"$elemMatch":{ "$elemMatch": {"$in":['search_text']} } }} )
But now I have changed this query to an aggregate-based query due to other requirements. I've tried $filter and $match but have not been able to replicate the above query exactly.
Can anyone convert the above code so that it works directly with aggregate?
(I use PyMongo)
$match uses the same query syntax as the query language (find), from the docs:
The query syntax is identical to the read operation query syntax;
This means if you have a query that works in a "find", it will also work within a $match stage, like so:
db.collection.aggregate([
  {
    $match: {
      "TAGS": {
        "$elemMatch": {
          "$elemMatch": {
            "$in": [
              "Cakes"
            ]
          }
        }
      }
    }
  }
])
Check this live on Mongo Playground
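Since the question mentions PyMongo, here is a minimal sketch of the same $match stage built as a Python list. The helper names build_tag_pipeline and has_tag are hypothetical, and has_tag is only a pure-Python mirror of what the nested $elemMatch is meant to test, so the intent can be checked without a database.

```python
def build_tag_pipeline(search_text):
    """Aggregation pipeline whose $match reuses the find() filter unchanged."""
    return [
        {
            "$match": {
                "TAGS": {"$elemMatch": {"$elemMatch": {"$in": [search_text]}}}
            }
        }
    ]


def has_tag(document, search_text):
    """Local check: does any inner array of TAGS contain search_text?"""
    return any(search_text in inner for inner in document.get("TAGS", []))


doc = {"PRODUCT_NAME": "Truffle Cake", "TAGS": [["Cakes", 100], ["Flowers", 100]]}
pipeline = build_tag_pipeline("Cakes")

# With PyMongo, assuming `collection` is an existing Collection object:
# results = list(collection.aggregate(pipeline))
print(has_tag(doc, "Cakes"))
```

Further stages required by the other requirements can be appended to the returned list after the $match stage.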

Find string in between in kibana elastic search with regex like in splunk

In Splunk, we can extract a dynamic string that sits between two fixed strings.
For example:
<TextileType>Shirt</TextileType>
<TextileType>Trousers</TextileType>
<TextileType>Shirt</TextileType>
<TextileType>Trousers</TextileType>
<TextileType>Shirt</TextileType>
The output I am expecting:
Shirt - 3
Trousers - 2
I can do this easily in Splunk.
How can I achieve this in Kibana?
I have tried many ways but could not write a regex that does what I need.
Note: Here is the example JSON query to which I need to add the regex. In this example I search for "Shirt" manually; I expect to get that value dynamically.
{
  "query": {
    "match": {
      "text": {
        "query": "Shirt",
        "type": "phrase"
      }
    }
  }
}
Assuming the data is in an index named sample, you can use a wildcard query:
GET /sample/_search
{
  "query": {
    "wildcard": {
      "column2": "*Shirt*"
    }
  }
}
Note that this only returns documents containing the keyword Shirt.
If you are looking to clean the data, you'd need to run it through a Logstash pipeline to strip the XML tags and leave you with the text.
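To illustrate the extraction step itself, here is a small client-side sketch in Python that pulls the value between the TextileType tags and counts occurrences. It is not a Kibana or Logstash feature, just the same idea expressed with a regex and a counter; the function name and the sample list are assumptions taken from the example in the question.

```python
import re
from collections import Counter

# Non-greedy capture of whatever sits between the opening and closing tag.
TAG_PATTERN = re.compile(r"<TextileType>(.*?)</TextileType>")


def count_textile_types(lines):
    """Count each distinct value found between the TextileType tags."""
    counts = Counter()
    for line in lines:
        counts.update(TAG_PATTERN.findall(line))
    return counts


sample = [
    "<TextileType>Shirt</TextileType>",
    "<TextileType>Trousers</TextileType>",
    "<TextileType>Shirt</TextileType>",
    "<TextileType>Trousers</TextileType>",
    "<TextileType>Shirt</TextileType>",
]

counts = count_textile_types(sample)
for name, total in counts.most_common():
    print(f"{name} - {total}")
```

If the extracted value is stored in its own field at ingest time, a regular terms aggregation on that field would produce the same counts inside Elasticsearch.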

Elasticsearch Query on indexes whose name is matching a certain pattern

I have a couple of indexes in my Elasticsearch DB as follows
Index_2019_01
Index_2019_02
Index_2019_03
Index_2019_04
.
.
Index_2019_12
Suppose I want to search only on the first 3 Indexes.
I mean a regular expression like this:
select count(*) from Index_2019_0[1-3] where LanguageId="English"
What is the correct way to do that in Elasticsearch?
How can I query several indexes with certain names?
This can be achieved via multi-index search, which is a built-in capability of Elasticsearch. To get the described behavior, try a query like this:
POST /index_2019_01,index_2019_02/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}
Or, using URI search:
curl 'http://<host>:<port>/index_2019_01,index_2019_02/_search?q=LanguageID:English'
More details are available here. Note that Elasticsearch requires index names to be lowercase.
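When the list of indices follows a naming scheme, the comma-separated path can be generated instead of typed by hand. The following Python sketch builds the path for the first three months of 2019; the helper names are hypothetical and the naming scheme is assumed from the question.

```python
def monthly_indices(prefix, year, first_month, last_month):
    """Return lowercase index names such as index_2019_01 for a month range."""
    return [
        f"{prefix}_{year}_{month:02d}".lower()
        for month in range(first_month, last_month + 1)
    ]


def search_path(indices):
    """Join index names into a comma-separated multi-index search path."""
    return "/" + ",".join(indices) + "/_search"


first_quarter = monthly_indices("Index", 2019, 1, 3)
path = search_path(first_quarter)
body = {"query": {"match": {"LanguageID": "English"}}}
print(path)
```

The resulting path and body can be passed to any HTTP client, or the list of names can be handed to an Elasticsearch client as the index argument.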
Can I use a regex to specify index name pattern?
In short, no. The index name can be used in queries through a special "virtual" field, _index, but its use is limited. For instance, one cannot run a regexp against the index name:
The _index is exposed as a virtual field — it is not added to the
Lucene index as a real field. This means that you can use the _index
field in a term or terms query (or any query that is rewritten to a
term query, such as the match, query_string or simple_query_string
query), but it does not support prefix, wildcard, regexp, or fuzzy
queries.
For instance, the query from above can be rewritten as:
POST /_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "_index": [
              "index_2019_01",
              "index_2019_02"
            ]
          }
        },
        {
          "match": {
            "LanguageID": "English"
          }
        }
      ]
    }
  }
}
This employs a bool query and a terms query.
Hope that helps!
Why use POST when you are not adding any additional data?
I advise using GET for your case. Secondly, if the indexes have similar names, as in your case, you should use an index pattern like the one in the query below:
GET /index_2019_*/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}
Or with curl:
curl -XGET "http://<host>:<port>/index_2019_*/_search" -H 'Content-Type: application/json' -d'{"query": {"match":{"LanguageID": "English"}}}'
While searching for indices with a regex is not possible, date math in index names may take you a bit further.
You can look at the docs here.
As an example, let's say you want the last 3 months from those indices.
That means that if we have
index_2019_01
index_2019_02
index_2019_03
index_2019_04
and today is 2019/04/20, we could use the following request to get 04, 03 and 02:
GET /<index_{now/M-0M{yyyy_MM}}>,<index_{now/M-1M{yyyy_MM}}>,<index_{now/M-2M{yyyy_MM}}>/_search
I used M-0M for the first one so that the loop constructing the request doesn't need a special case for the first index.
See the docs regarding URL-encoding this request and how to put literal braces in an index name. If a client is used, the URL encoding is done for you (at least in the Python client).
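The loop mentioned above can be sketched in Python as follows. This is an illustration under assumptions: the index_ prefix matches the index names listed earlier, the helper names are hypothetical, and resolve_locally only shows which concrete names the expressions should resolve to for a given date (Elasticsearch does the real resolution on the server).

```python
from datetime import date
from urllib.parse import quote


def date_math_indices(prefix, months):
    """Build one date math expression per month, newest first."""
    return [f"<{prefix}{{now/M-{k}M{{yyyy_MM}}}}>" for k in range(months)]


def encoded_search_path(expressions):
    """Percent-encode the comma-joined expressions for use in a request URL."""
    return "/" + quote(",".join(expressions), safe="") + "/_search"


def resolve_locally(prefix, today, months):
    """Concrete index names the expressions stand for on a given day."""
    total = today.year * 12 + today.month - 1
    names = []
    for k in range(months):
        year, month_index = divmod(total - k, 12)
        names.append(f"{prefix}{year:04d}_{month_index + 1:02d}")
    return names


expressions = date_math_indices("index_", 3)
path = encoded_search_path(expressions)
resolved = resolve_locally("index_", date(2019, 4, 20), 3)
print(resolved)
```

Encoding everything, including the commas and the slash inside now/M, keeps the expression from being misread as a path separator.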

Elasticsearch aggregation to extract pattern and occurrences

I have trouble formulating what I'm looking for so I'll use an example:
You put 3 documents in elasticsearch all with a field "name" containing these values: "test", "superTest51", "stvv".
Is it possible to extract a regular expression like pattern with the occurrences? In this case:
"xxxx": 2 occurrences
"x{5}Xxxx99": 1 occurrence
I've read some things about analyzers, but I don't think that's what I'm looking for.
Edit: To make the question clearer: I don't want to search for a regex pattern, I want to do an aggregate on a regular expression replaced field. For example replace [a-z] with x. Is the best way really to do the regular expression replace outside of elasticsearch?
Based on how your request is phrased, I'm not sure this matches what you are looking for, but assuming you want to search by regex, the following should help:
wildcard and regexp queries
Note that the behavior differs depending on whether the targeted field is analyzed or not.
If you started with a vanilla Elasticsearch setup, as most people do, your field is likely analyzed; you can check the mapping of your indices to confirm that.
Based on your example and assuming you have a not_analyzed name field:
GET _search
{
  "query": {
    "regexp": {
      "name": "[a-z]{4}"
    }
  }
}
GET _search
{
  "query": {
    "regexp": {
      "name": "[a-z]{5}[A-Z][a-z]{3}[0-9]{2}"
    }
  }
}
Based on your update and a quick search (I am not that familiar with aggregations), something like the following might match your expectations:
GET _search
{
  "size": 0,
  "aggs": {
    "regmatch": {
      "filters": {
        "filters": {
          "xxxx": {
            "regexp": {
              "name": "[a-z]{4}"
            }
          },
          "x{5}Xxxx99": {
            "regexp": {
              "name": "[a-z]{5}[A-Z][a-z]{3}[0-9]{2}"
            }
          }
        }
      }
    }
  }
}
This will give you 3 counts:
- total number of events
- number of first regex match
- number of second regex match
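The filters aggregation needs every pattern to be known in advance. For the variant described in the question's edit (replace characters by class, then aggregate on the result), here is a client-side Python sketch. It runs outside Elasticsearch, the function name to_shape is hypothetical, and it writes the shape out in full (xxxxxXxxx99) rather than in the compressed x{5}Xxxx99 notation.

```python
import re
from collections import Counter


def to_shape(value):
    """Replace lowercase letters with x, uppercase with X, digits with 9."""
    shape = re.sub(r"[a-z]", "x", value)
    shape = re.sub(r"[A-Z]", "X", shape)
    return re.sub(r"[0-9]", "9", shape)


names = ["test", "superTest51", "stvv"]
shape_counts = Counter(to_shape(name) for name in names)

for shape, total in shape_counts.most_common():
    print(f"{shape}: {total}")
```

Storing the computed shape in an extra field at index time would allow a plain terms aggregation on that field to return the same counts.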

elasticsearch - search with regex involving space

I want to search with a regular expression involving whitespace in Elasticsearch.
I have already set my field to not_analyzed. Its mapping looks like this:
"type1": {
  "properties": {
    "field1": {
      "type": "string",
      "index": "not_analyzed",
      "store": true
    }
  }
}
I indexed two values for testing:
"field1":"XXX YYY ZZZ"
"field1":"XXX ZZZ YYY"
I then ran a regex query, /XXX YYY/ (I want this query to find record 1 but not record 2):
{
  "query": {
    "query_string": {
      "query": "/XXX YYY/"
    }
  }
}
But it returns 0 results.
However, if I search without the regex (without the forward slashes), both record 1 and record 2 are returned.
Is it the case that in Elasticsearch I cannot search with a regex query involving a space?
What you need is a term query, which doesn't tokenise the search query by breaking it down into smaller parts. More about the term query here: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-term-query.html
There's a special breed of term-level queries, called regexp queries, that allows you to use regexes. These should match whitespace as well: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
You can keep using your query string; your regexp is just missing a tiny part, the .* at the end. Regexp patterns are anchored, so they must match the entire value, and /XXX YYY/ alone only matches a field that is exactly XXX YYY. With the .* added you'll get the single result you expect.
{
  "query": {
    "query_string": {
      "query": "/XXX YYY.*/"
    }
  }
}
You can use regexp queries to achieve this. Mind you, the query performance may be slow. The below query will search for all documents in which the value of field1 contains "XXX YYY".
POST <index_name>/type1/_search
{
  "query": {
    "regexp": {
      "field1": ".*XXX YYY.*"
    }
  }
}
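The effect of the missing .* can be reproduced locally. The sketch below uses Python's re.fullmatch as a stand-in for anchored regexp matching on a not_analyzed field; Python's regex syntax is not identical to the one Elasticsearch uses, but for these simple patterns the behavior is the same. The helper name is hypothetical.

```python
import re

RECORDS = ["XXX YYY ZZZ", "XXX ZZZ YYY"]


def anchored_matches(pattern, values):
    """Return the values that the pattern matches in full, start to end."""
    return [value for value in values if re.fullmatch(pattern, value)]


exact_only = anchored_matches("XXX YYY", RECORDS)
with_suffix = anchored_matches("XXX YYY.*", RECORDS)
contains = anchored_matches(".*XXX YYY.*", RECORDS)

print(exact_only, with_suffix, contains)
```

The first pattern finds nothing because neither value is exactly XXX YYY, while the other two select only the first record.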