I have an email address like example.regex@yahoo.com. Is there a regex that will match example.regex@yahoo.com, example.regex, example, and regex? The expression should not match yahoo.com, yahoo, or com.
I have the following e-mail address:
denisa.example@yahoo.com, and I want the following strings to match the query:
denisa.example
denisa
example
I already tried it with the following Elasticsearch analyzer settings:
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  }
}
but it gives me the following results:
denisa.example
denisa
example
yahoo.com
yahoo
com
I found an answer:
"patterns": [
  "^(.*?)@",
  "(\\w+(?=.*@))"
]
Thanks!
You can do something like this:
function extract(email) {
  const name = email.match(/^(.*?)@.*/)[1];
  return [
    name,
    ...name.split(".")
  ];
}

console.log(extract("example.regex@yahoo.com"));
If you check your browser console, you will see something like this:
(3) ["example.regex", "example", "regex"]
I'm trying to run a regex query in Elasticsearch against a field called _id, but I'm getting this error:
Can only use wildcard queries on keyword and text fields - not on
[_id] which is of type [_id]
I've tried regexp:
{
  "query": {
    "regexp": {
      "_id": {
        "value": "test-product-all-user_.*",
        "flags": "ALL",
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}
and wildcard:
{
  "query": {
    "wildcard": {
      "_id": {
        "value": "test-product-all-user_.*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
But both threw the same error.
This is the complete error just in case:
{ "error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "Can only use wildcard queries on keyword and text fields - not on [_id] which is of type [_id]",
"index_uuid": "Cg0zrr6dRZeHJ8Jmvh5HMg",
"index": "explore_segments_v3"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "explore_segments_v3",
"node": "-ecTRBmnS2OgjHrrq6GCOw",
"reason": {
"type": "query_shard_exception",
"reason": "Can only use wildcard queries on keyword and text fields - not on [_id] which is of type [_id]",
"index_uuid": "Cg0zrr6dRZeHJ8Jmvh5HMg",
"index": "explore_segments_v3"
}
}
] }, "status": 400 }
_id is a special kind of field in Elasticsearch. It's not really an indexed field like other text fields; it's actually "generated" based on the UID of the document.
You can refer to this link for more information: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
As per the documentation, it only supports a limited set of query types (term, terms, match, query_string, simple_query_string). If you want to do more advanced text searches like wildcard or regexp, you will need to index the ID into an actual text field on the document itself.
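A minimal sketch of that workaround, assuming a recent Elasticsearch version and a hypothetical index my_index where the document ID is copied into a keyword field called doc_id at indexing time; the wildcard query then targets that field instead of _id (note that wildcard queries use *, not the regex .*):
PUT my_index
{
  "mappings": {
    "properties": {
      "doc_id": {
        "type": "keyword"
      }
    }
  }
}

PUT my_index/_doc/test-product-all-user_1
{
  "doc_id": "test-product-all-user_1"
}

GET my_index/_search
{
  "query": {
    "wildcard": {
      "doc_id": {
        "value": "test-product-all-user_*"
      }
    }
  }
}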
I am making an Elasticsearch query using the Q object, and I have indexed documents. One of the documents contains "jbl speakers are great", but my query has "speaker" instead of "speakers". How can I find this document with a query string?
I have tried match_phrase, but it is unable to find this document, and when I tried query_string it threw an error saying "query_string does not support for some key". I have also tried wildcard, but that is also not working with a query like:
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "prod_group": "06"
          }
        },
        {
          "match_phrase": {
            "prod_group": "apparel"
          }
        },
        {
          "wildcard": {
            "prod_cat_for_search": "+speaker*"
          }
        },
        {
          "range": {
            "date": {
              "gte": "2018-04-07"
            }
          }
        }
      ]
    }
  }
}
Q('match_phrase', prod_cat_for_search='speaker')
I expect the output to be the document containing "speakers", but the actual output is that no document containing "speakers" is returned.
The type of search you are looking for can be achieved by using the stemmer token filter at indexing time.
Let's see how it works using the example mapping below:
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase",
            "my_stemmer"
          ],
          "tokenizer": "whitespace"
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "description": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
For the field description in the above mapping we have used the analyzer my_analyzer. This analyzer will apply the token filters lowercase and my_stemmer. The my_stemmer filter applies English stemming to the input value.
For example, if we index a document such as:
{
  "description": "JBL speakers build with perfection"
}
The tokens that will get indexed are:
jbl
speaker
build
with
perfect
Notice that speakers is indexed as speaker and perfection as perfect.
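You can double-check the emitted tokens with the _analyze API, using the test index and analyzer defined above:
POST test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "JBL speakers build with perfection"
}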
Now if you search for speakers or speaker, both will match. Similarly, if you search for perfect, the above document will match.
Why speakers or perfection also match might be a question arising in your mind. The reason is that, by default, Elasticsearch applies the same analyzer at search time that was used at index time. So if you search for perfection, it will actually be searching for perfect, and hence the match.
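For example, a plain match query on the description field should return the document above whether the search term is speaker, speakers, or perfection; a sketch against the same test index:
GET test/_search
{
  "query": {
    "match": {
      "description": "speakers"
    }
  }
}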
More on stemming.
I would like to ask if there is any documentation which describes how to work with Elasticsearch pattern regexes.
I need to write a Pattern Capture Token Filter which keeps only tokens that start with a specific word. For example, if the input token stream is ("abcefgh", "abc123", "aabbcc", "abc", "abdef"), my filter should return only the tokens abcefgh, abc123 and abc, because those tokens start with "abc".
Can someone help me achieve this use-case?
Thanks.
I suggest something like this:
"analysis": {
"analyzer": {
"my_trim_keyword_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim",
"generate_tokens",
"eliminate_tokens",
"remove_empty"
]
}
},
"filter": {
"eliminate_tokens": {
"pattern": "^(?!abc)\\w+$",
"type": "pattern_replace",
"replacement": ""
},
"generate_tokens": {
"type": "pattern_capture",
"preserve_original": 1,
"patterns": [
"(([a-z]+)(\\d*))"
]
},
"remove_empty": {
"type": "stop",
"stopwords": [""]
}
}
}
If your tokens are the result of a pattern_capture filter, you'd need to add after it the filter called eliminate_tokens in my example, which matches tokens that don't start with abc and replaces them with an empty string ("replacement": "").
After this, to remove the empty tokens, I added the remove_empty filter, which is basically a stop filter where the only stopword is "" (the empty string).
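You can inspect the resulting tokens for any input with the _analyze API; a sketch, assuming the analysis settings above are installed on an index called test (the index name is just an example):
POST test/_analyze
{
  "analyzer": "my_trim_keyword_analyzer",
  "text": "abc123"
}
This should keep the tokens that start with abc (the preserved original abc123 and the captured abc), while captures that don't start with abc, such as 123, are emptied by eliminate_tokens and then dropped by remove_empty.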
I am trying to retrieve some company results using Elasticsearch. I want to get companies that start with "A", then "B", etc. If I just do a pretty typical query with "prefix", like so:
GET apple/company/_search
{
  "query": {
    "prefix": {
      "name": "a"
    }
  },
  "fields": [
    "id",
    "name",
    "websiteUrl"
  ],
  "size": 100
}
this will return Acme as well as Lemur and Associates, so I need to distinguish between an A at the beginning of the whole name and an A at the beginning of a word.
It would seem like regular expressions would come to the rescue here, but Elasticsearch just ignores whatever I try. In tests with other applications, ^[\S]a* should get you anything that starts with an a that doesn't have a space in front of it. Elasticsearch returns 0 results with the following:
GET apple/company/_search
{
  "query": {
    "regexp": {
      "name": "^[\S]a*"
    }
  },
  "fields": [
    "id",
    "name",
    "websiteUrl"
  ],
  "size": 100
}
In fact, the Sense UI for Elasticsearch will immediately alert you to a "Bad String Syntax Error". That's because even in a query, Elasticsearch wants some characters escaped. Nonetheless, ^[\\S]a* doesn't work either.
Searching in Elasticsearch is both about the query itself and about the modelling of your data so that it best suits the queries to be used. One cannot simply index whatever and then struggle to come up with a query that does something. Also note that an Elasticsearch regexp query matches against individual terms in the index and the pattern is implicitly anchored to the whole term, so ^ anchors and Perl-style shorthand classes such as \S don't behave the way they do in other regex flavours.
The Elasticsearch way for your query is to have the following mapping for that field:
PUT /apple
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "keyword_lowercase": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "company": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "analyzed_lowercase": {
              "type": "string",
              "analyzer": "keyword_lowercase"
            }
          }
        }
      }
    }
  }
}
And then use this query:
GET /apple/company/_search
{
  "query": {
    "prefix": {
      "name.analyzed_lowercase": {
        "value": "a"
      }
    }
  }
}
or
GET /apple/company/_search
{
  "query": {
    "query_string": {
      "query": "name.analyzed_lowercase:A*"
    }
  }
}
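If you would rather keep using a regexp query, it should also work against the lowercased keyword sub-field, since the whole company name is indexed there as a single term; as noted above, Elasticsearch regexps are implicitly anchored to the whole term, so no ^ is needed. A sketch using the mapping above:
GET /apple/company/_search
{
  "query": {
    "regexp": {
      "name.analyzed_lowercase": "a.*"
    }
  }
}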
MongoDB allows a regex expression of the form /pattern/ without using the $regex operator:
http://docs.mongodb.org/manual/reference/operator/query/in/
How can I do this using Morphia?
If I create a field criteria with the in operator and a value of type java.util.regex.Pattern, the equivalent query generated is
$in: [$regex: 'given pattern'], which won't return the expected results at all.
Expectation: $in: [/pattern1 here/, /pattern2 here/]
Actual, using a Pattern object: $in: [$regex: /pattern1 here/, $regex: /pattern 2 here/]
I'm not entirely sure what to make of your code examples, but here's a working Morphia code snippet:
Pattern regexp = Pattern.compile("^" + email + "$", Pattern.CASE_INSENSITIVE);
mongoDatastore.find(EmployeeEntity.class).filter("email", regexp).get();
Note that this is really slow. It can't use an index and will always require a full collection scan, so avoid it at all cost!
Update: I've added a specific code example. The $in is not required to search inside an array. Simply use /^I/ as you would with a string:
> db.profile.find()
{ "_id": ObjectId("54f3ac3fa63f282f56de64bd"), "tags": [ "India", "Australia", "Indonesia" ] }
{ "_id": ObjectId("54f3ac4da63f282f56de64be"), "tags": [ "Island", "Antigua" ] }
{ "_id": ObjectId("54f3ac5ca63f282f56de64bf"), "tags": [ "Spain", "Mexico" ] }
{ "_id": ObjectId("54f3ac6da63f282f56de64c0"), "tags": [ "Israel" ] }
{ "_id": ObjectId("54f3ad17a63f282f56de64c1"), "tags": [ "Germany", "Indonesia" ] }
{ "_id": ObjectId("54f3ad56a63f282f56de64c2"), "tags": [ "ireland" ] }
> db.profile.find({ tags: /^I/ })
{ "_id": ObjectId("54f3ac3fa63f282f56de64bd"), "tags": [ "India", "Australia", "Indonesia" ] }
{ "_id": ObjectId("54f3ac4da63f282f56de64be"), "tags": [ "Island", "Antigua" ] }
{ "_id": ObjectId("54f3ac6da63f282f56de64c0"), "tags": [ "Israel" ] }
{ "_id": ObjectId("54f3ad17a63f282f56de64c1"), "tags": [ "Germany", "Indonesia" ] }
Note: The position in the array makes no difference, but the search is case sensitive. Use /^I/i if this is not desired or Pattern.CASE_INSENSITIVE in Java.
Single RegEx Filter
Use .filter(), .criteria(), or .field():
query.filter("email", Pattern.compile("reg.*exp"));
// or
query.criteria("email").contains("reg.*exp");
// or
query.field("email").contains("reg.*exp");
Morphia converts this into:
find({"email": { $regex: "reg.*exp" } })
Multiple RegEx Filters
query.or(
query.criteria("email").contains("reg.*exp"),
query.criteria("email").contains("reg.*exp.*2"),
query.criteria("email").contains("reg.*exp.*3")
);
Morphia converts this into:
find({"$or" : [
{"email": {"$regex": "reg.*exp"}},
{"email": {"$regex": "reg.*exp.*2"}},
{"email": {"$regex": "reg.*exp.*3"}}
]
})
Unfortunately,
You cannot use $regex operator expressions inside an $in.
MongoDB Manual 3.4
Otherwise, we could do:
Pattern[] patterns = new Pattern[] {
    Pattern.compile("reg.*exp"),
    Pattern.compile("reg.*exp.*2"),
    Pattern.compile("reg.*exp.*3"),
};
query.field("email").in(Arrays.asList(patterns));
Hopefully, one day Morphia will support that :)