Elasticsearch matching words - regex

I have a problem. I want my query for mysql to also match mysql!, mysql(, and so on; in other words, the results should contain only values that are exactly the word "mysql", or the word "mysql" with special characters around it.
For example, the query should match mysql, #mysql, and mysql$, but not "mysql and R" or "mysql mysql".
I tried the query below, but I keep getting query malformed, no field after start_object, so I'm not sure what to do.
I also wanted to ask whether there are other options for performing the same task.
POST skills/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "skillname": "mysql"
        }
      },
      "filter": {
        "type": "pattern_capture",
        "patterns": ["mysql([##]\\w+)"],
        "preserve_original": true
      }
    }
  }
}
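As a side note toward the last question: pattern_capture is a token filter (an analysis component), not a query-level filter, which is likely why the filtered query rejects it. One hedged alternative, assuming skillname is indexed as a single not_analyzed/keyword value, is a regexp query; Lucene regexps are anchored to the whole term, so the optional surrounding special characters can be expressed directly (the character class below is illustrative):

```json
POST skills/_search
{
  "query": {
    "regexp": {
      "skillname": "[^a-zA-Z0-9 ]*mysql[^a-zA-Z0-9 ]*"
    }
  }
}
```

This matches "mysql", "#mysql", and "mysql$", but not "mysql and R" or "mysql mysql", since those contain extra letters or spaces.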


Not able to get desired search results in ElasticSearch search api

I have a field "xyz" that I want to search on. The type of the field is keyword. The different values of the field "xyz" are -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Now for the following query -
{
  "query": {
    "query_string" : {
      "query" : "(xyz:(\"a/b/c\"*))"
    }
  }
}
I should only get these two results -
a/b/c/d
a/b/c/e
but I get all four results -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Edit -
Actually, I am not querying Elasticsearch directly. I am using this API https://atlas.apache.org/api/v2/resource_DiscoveryREST.html#resource_DiscoveryREST_searchWithParameters_POST which generates the above query for Elasticsearch, so I don't have much control over the query_string. What I can change is the Elasticsearch analyzer for this field, or its type.
You'll need to let the query_string parser know you're using a regex, so wrap the whole thing in /.../ and escape the forward slashes:
{
  "query": {
    "query_string": {
      "query": "xyz:/(a\\/b\\/c\\/.*)/"
    }
  }
}
Or, you might as well use a regexp query:
{
  "query": {
    "regexp": {
      "xyz": "a/b/c/.*"
    }
  }
}
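Since the edit notes that the analyzer or field type can be changed, a hedged alternative (index, analyzer, and tokenizer names below are illustrative, not from the original) is the path_hierarchy tokenizer, which indexes every prefix of the path as its own token, so a plain match or term query for "a/b/c" finds only the two desired documents without any regex:

```json
PUT xyz-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "path_tokenizer": { "type": "path_hierarchy", "delimiter": "/" }
      },
      "analyzer": {
        "path_analyzer": { "tokenizer": "path_tokenizer" }
      }
    }
  },
  "mappings": {
    "properties": {
      "xyz": { "type": "text", "analyzer": "path_analyzer" }
    }
  }
}
```

With this mapping, "a/b/c/d" is indexed as the tokens a, a/b, a/b/c, and a/b/c/d.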

Elasticsearch: How can I filter & group by specific URL paths?

I've got an index, urls, which looks like this:
path: {
  type: "string"
},
#timestamp: {
  type: "date",
  format: "strict_date_optional_time||epoch_millis"
},
The path will store the PATH section from a url, e.g:
https://facebook.com/profile/photos/album/1
Would be stored as:
/profile/photos/album/1
I'm storing all sorts of paths, so there could be more like:
/profile/photos/album/1
/profile/photos/album/2
/profile/photos/album/2
/profile/photos/album/2
/profile/friends/1
/profile/friends/2
/newsfeed/me/
/newsfeed/me/
/newsfeed/friendName/
I'm trying to find out the number of unique pageviews each of the paths has. I'm unsure how I should do this; should I use a regexp?
I'd imagine it'd look something like (pseudo code):
{
  "query": {
    "regexp": {
      "path": ""
    },
    "unique": true
  }
}
So I found out how to do this. I'm using the aggs method & using a regex to exclude results!
{
  "size": 0, // Don't return any _source results
  "aggs": {
    "path": { // The label of the aggregation
      "terms": {
        "field": "path",
        "exclude": ".*(media|cache).*" // Values to exclude, separated by |
      }
    }
  }
}
Breakdown:
path
Just the label of the aggregation
field (path)
The field I want to run the regex on
exclude
Don't return buckets whose path contains media or cache
I found this out from Elasticsearch: Run aggregation on field & filter out specific values using a regexp not matching values
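The terms aggregation also accepts an include regex, which can be used to group only specific paths; a hedged sketch, with the field and path taken from the question and the aggregation name invented for illustration:

```json
{
  "size": 0,
  "aggs": {
    "album_views": {
      "terms": {
        "field": "path",
        "include": "/profile/photos/album/.*"
      }
    }
  }
}
```

Each bucket's doc_count then gives the number of views for that album path.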

How to use the elasticsearch regex query correctly?

I am working on translating a Splunk query to Elasticsearch DSL.
I want to check if a URL in the logs contains something like:
"script>" OR "UNION ALL SELECT"
Fair enough, I thought; I went to the docs and tried:
{
  "regexp": {
    "http.url": "script>"
  }
}
Elasticsearch (2.3) replies:
"root_cause": [
  {
    "reason": "failed to parse search source. unknown search element [regexp]",
    "type": "search_parse_exception",
    "line": 2,
Could someone enlighten me please about these kinds of queries?
This is a pretty common mistake when starting out with the documentation: the docs generally show only the raw query (and its parameters). Queries are either compound queries or leaf queries; regexp is an example of a leaf query.
However, that's not enough to actually send the query. You're missing the simple wrapper the DSL requires around any query:
{
  "query": {
    "regexp": {
      "http.url": "script>"
    }
  }
}
To use a compound query, the best way is the bool compound query.
It has must, must_not, should, and filter, and each accepts an array of queries (or filters, which are just scoreless, cacheable queries). should is the OR-like part of it, but do read the docs on how it behaves when you add must alongside it. The gist: should by itself is exactly like an OR (as shown below), but combined with must it becomes completely optional unless you set "minimum_should_match": 1.
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "http.url": "script>"
          }
        },
        {
          "term": {
            "http.url": "UNION ALL SELECT"
          }
        }
      ]
    }
  }
}
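Note that term queries match exact terms only; since the original goal was a contains-style check, a hedged variant (assuming http.url is indexed as a single not-analyzed value) swaps in regexp queries. Lucene regexps are anchored to the whole value, so leading and trailing .* are needed:

```json
{
  "query": {
    "bool": {
      "should": [
        { "regexp": { "http.url": ".*script>.*" } },
        { "regexp": { "http.url": ".*UNION ALL SELECT.*" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```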

Kibana regex not working

I need to search for a range value from my logs, but my regex doesn't work in Kibana.
/(took":[1-9][0-9][0-9][,])/g
Content:
{"real_time":"2016-05-03T10:02:13.360Z","content":{"delay":687,"updated":true,"searchItems":{"monitoring_id":"111354","params":{"pass":["111354"],"named":{"d":"2016-04-29|2016-04-30"},"action":"mentions","plugin":null,"controller":"api11","form":[],"url":{"url":"1.1\/mentions\/111354\/","publickey":"yn68FDuQ","time":"1462303544,8356","signature":"102ade1f6749e89be876fdb00a7b9ade","published_date":"2016-04-29|2016-04-30","ipp":"100","page":"14"},"isAjax":false},"source_ids":"","timestamp":"","pagination":"1300, 100","trackerId":"","onlyIds":[],"exceptIds":[],"timezone":"Brazil\/East"},"search":[{"index":"mentions_ro","type":"mention","from":1300,"size":100,"body":{"query":{"bool":{"must":[{"term":{"monitoring.id":"111354"}},{"range":{"published_at":{"gte":"1969-12-31T21:00:00-03:00","lte":"1969-12-31T21:00:00-03:00"}}}]}},"sort":{"published_at":{"order":"desc"}}},"fields":[]}],"response":{"took":500,"timed_out":false,"_shards":{"total":21,"successful":21,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}}}
My regex is working here, however:
https://regex101.com/r/pV4mR7/1
Note:
I already tried escaping some characters.
If I look at the request sent to Elasticsearch, Kibana uses a query string:
Any tips?
According to their documentation, these characters are always metacharacters and must be escaped if you want them as literals:
. ? + * | { } [ ] ( ) " \
These characters are metacharacters only when certain optional operators are enabled:
# @ & < > ~
You don't need to put the comma in a character class.
It looks like you might not be able to just throw the regex into the search box.
Kibana only matches regexps over the _all field.
If you "inspect" one of the elements on your page, you will see that the _all field is hardcoded:
"global": true,
"facet_filter": {
  "fquery": {
    "query": {
      "filtered": {
        "query": {
          "regexp": {
            "_all": {
              "value": "category: /pattern/"
> https://github.com/elastic/kibana/issues/631
Try this:
(took\":[1-9][0-9][0-9],)
I'm not familiar with Elasticsearch or Kibana, but your query may end up looking like this:
"regexp": {
  "_all": {
    "value": "category: /(took\":[1-9][0-9][0-9],)/"
  }
}

Elasticsearch - behavior of regexp query against a non-analyzed field

What is the default behavior for a regexp query against a non-analyzed field? Also, is that the same answer when dealing with .raw fields?
After everything I've read, I understand the following:
1. Regexp queries work on both analyzed and non-analyzed fields.
2. On non-analyzed fields, a regexp query should match against the entire phrase rather than just a single token.
Here's the problem, though: I cannot actually get this to work. I've tried it across multiple fields.
The setup I'm working with is a stock ELK install, and I'm dumping pfSense and Snort logs into it with a basic parser. I'm currently on Kibana 4.3 and ES 2.1.
I ran a query to look at the mapping for one of the fields, and it indicates it is not_analyzed, yet the regex does not work across the entire field.
"description": {
  "type": "string",
  "norms": {
    "enabled": false
  },
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed",
      "ignore_above": 256
    }
  }
}
What am i missing here?
If a field is not analyzed, the field is indexed as a single token.
It's the same answer when dealing with .raw fields, at least in my experience.
You can use a Groovy script:
matcher = (doc['fields.raw'].value =~ /${pattern}/);
if (matcher.matches()) {
  matcher.group(matchname)
}
You can pass pattern and matchname in params.
What do you mean by "tried it across multiple fields"? If your situation is more complex, maybe you could write a native Java plugin.
UPDATE
{
  "script_fields" : {
    "regexp_field" : {
      "script" : "matcher = (doc[fieldname].value =~ /${pattern}/); if (matcher.matches()) { matcher.group(matchname) }",
      "params" : {
        "pattern" : "your pattern",
        "matchname" : "your match",
        "fieldname" : "fields.raw"
      }
    }
  }
}
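For completeness, a hedged alternative that avoids scripting entirely: run the regexp query directly against the .raw sub-field. Lucene regexps are anchored to the entire term, so a contains-style match needs leading and trailing .* (the pattern below is purely illustrative):

```json
{
  "query": {
    "regexp": {
      "description.raw": ".*your pattern here.*"
    }
  }
}
```

Since .raw is not_analyzed, the regex is evaluated against the whole original value rather than individual tokens, which is the behavior the question expects.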