I'm trying to do full-text search in MongoDB with $searchBeta (aggregation) and I'm using the 'regex' operator to do so. Here's the portion of the $searchBeta stage that isn't working the way I expected:
$searchBeta: {
regex: {
query: '\blightn', // '\b' is the word boundary metacharacter
path: ["name", "set_name"],
allowAnalyzedField: true
}
}
Here's an example of two documents that I'm expecting to get matched by the expression:
{
"name": "Lightning Bolt",
"set_name": "Masters 25"
},
{
"name": "Chain Lightning",
"set_name": "Battlebond"
}
What I actually get:
[] //empty array
If I use an expression like:
$searchBeta: {
regex: {
query: '[a-zA-Z]',
path: ["name", "set_name"],
allowAnalyzedField: true
}
}
then I get results back.
I can't get any expression that has regex metacharacters and/or options in it to work, so I'm pretty sure I'm just entering it wrong in my query string. The $searchBeta regex documentation doesn't really cover how to format metacharacters into your query string. Also, the $searchBeta regex operator is different from $regex because it doesn't require slashes (i.e. "/your expression/" ). Really pulling my hair out on something so simple that I can't figure out.
$searchBeta uses Lucene for regular expressions, which is not Perl-compatible (PCRE) and doesn't support \b. The Lucene regex syntax documentation lists what is supported, and Elastic's docs on the same syntax are also helpful.
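As a rough illustration of one possible workaround, the sketch below emulates whole-string (anchored) matching with Python's re.fullmatch and approximates \b with an explicit "start of string or preceding space" group. The pattern, the helper name lucene_like_match, the lowercasing step, and the third sample name are assumptions for illustration only; how $searchBeta actually tokenizes and matches a field depends on the index analyzer.

```python
import re

# Hypothetical Lucene-style pattern: "lightn" at the start of the string
# or right after a space, standing in for the unsupported \b.
PATTERN = r"(.* )?lightn.*"


def lucene_like_match(pattern: str, value: str) -> bool:
    """Emulate anchored matching: the pattern must cover the whole value."""
    # Lowercasing is an assumption that mimics a lowercasing analyzer.
    return re.fullmatch(pattern, value.lower()) is not None


# "Moonlightning" is a made-up name that should NOT match.
names = ["Lightning Bolt", "Chain Lightning", "Moonlightning"]
matches = [n for n in names if lucene_like_match(PATTERN, n)]
print(matches)
```

The same pattern string could then be tried as the query value of the regex operator, but that has not been verified against a real index here.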
Here is a similar question for Elasticsearch, which includes some workarounds.
I have a link like http://drive.google.com and I want to match "google" out of the link.
I have:
query: {
bool : {
must: {
match: { text: 'google'}
}
}
}
But this only matches if the whole text is 'google' (case insensitive, so it also matches Google or GooGlE etc). How do I match for the 'google' inside of another string?
The point is that the ElasticSearch regex you are using requires a full string match:
Lucene’s patterns are always anchored. The pattern provided must match the entire string.
Thus, to match any character (but a newline), you can use .* pattern:
match: { text: '.*google.*'}
                ^^      ^^
In ES6+, use regexp instead of match:
"query": {
"regexp": { "text": ".*google.*"}
}
One more variation is for cases when your string can have newlines: match: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in ElasticSearch because this regex flavor does not allow any [\s\S] workarounds, nor any DOTALL/Singleline flags. "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
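The anchoring behaviour can be sketched outside Elasticsearch. The snippet below uses Python's re.fullmatch as a stand-in for Lucene's "pattern must match the entire string" rule; the helper name anchored and the sample strings are assumptions, and Python's regex flavor is only an approximation of Lucene's.

```python
import re


def anchored(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


url = "http://drive.google.com"
multiline = "see\nhttp://drive.google.com\nfor details"

print(anchored("google", url))                        # bare substring
print(anchored(".*google.*", url))                    # padded with .*
print(anchored(".*google.*", multiline))              # . does not cross newlines
print(anchored(r"(.|\n)*google(.|\n)*", multiline))   # newline-aware padding
```

Only the second and fourth calls succeed, which mirrors why the bare pattern finds nothing and why the (.|\n)* variant is needed for text containing newlines.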
However, if you do not plan to match any complicated patterns and need no word boundary checking, regex search for a mere substring is better performed with a mere wildcard search:
{
"query": {
"wildcard": {
"text": {
"value": "*google*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
See Wildcard search for more details.
NOTE: The wildcard pattern also needs to match the whole input string, thus
google* finds all strings starting with google
*google* finds all strings containing google
*google finds all strings ending with google
Also, bear in mind the only pair of special characters in wildcard patterns:
?, which matches any single character
*, which can match zero or more characters, including an empty one
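To make the three cases above concrete, here is a minimal sketch that translates a wildcard pattern into an anchored regular expression and tests it in Python. The helper names wildcard_to_regex and wildcard_match are hypothetical, and the sketch ignores analysis: a real wildcard query is compared against indexed terms, not necessarily the raw field value.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate * and ? into regex equivalents; escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def wildcard_match(pattern: str, value: str) -> bool:
    """The translated pattern must match the whole value."""
    regex = wildcard_to_regex(pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


for pat in ("google*", "*google*", "*google"):
    print(pat, wildcard_match(pat, "drive.google.com"))
```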
Use a wildcard query:
'{"query":{ "wildcard": { "text.keyword" : "*google*" }}}'
For both partial and full-text matching, the following worked:
"query" : {
"query_string" : {
"query" : "*searchText*",
"fields" : [
"fieldName"
]
}
}
I can't find a breaking change disabling regular expressions in match, but match: { text: '.*google.*'} does not work on any of my Elasticsearch 6.2 clusters. Perhaps it is configurable?
Regexp works:
"query": {
"regexp": { "text": ".*google.*"}
}
For partial matching you can either use prefix or match_phrase_prefix.
For a more generic solution you can look into using a different analyzer or defining your own. I am assuming you are using the standard analyzer, which would split http://drive.google.com into the tokens "http" and "drive.google.com". That is why a search for just google finds nothing: it is compared against the full token "drive.google.com".
If instead you indexed your documents using the simple analyzer, it would split the URL into "http", "drive", "google", and "com". This will allow you to match any one of those terms on its own.
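The effect of the simple analyzer can be approximated in a few lines. This is only a rough sketch that splits on non-letters and lowercases, restricted to ASCII; the function name simple_analyzer_tokens is hypothetical and the real analyzer is not being called.

```python
import re


def simple_analyzer_tokens(text: str) -> list:
    """Approximate the simple analyzer: split on non-letters, lowercase."""
    # ASCII-only approximation; the real analyzer also handles other scripts.
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


tokens = simple_analyzer_tokens("http://drive.google.com")
print(tokens)
print("google" in tokens)
```

With tokens like these, a plain match query for google has a whole term to compare against, which is the point of switching analyzers.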
Using the Node.js client, where tag_name is the field name and value is the incoming search value:
const { body } = await elasticWrapper.client.search({
index: ElasticIndexs.Tags,
body: {
query: {
wildcard: {
tag_name: {
value: `*${value}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
},
});
You're looking for a wildcard search. According to the official documentation, it can be done as follows:
query_string: {
query: `*${keyword}*`,
fields: ["fieldOne", "fieldTwo"],
},
Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters: qu?ck bro*
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-wildcard
Be careful, though:
Be aware that wildcard queries can use an enormous amount of memory and perform very badly — just think how many terms need to be queried to match the query string "a* b* c*".
Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined, just in case they match. Leading wildcards can be disabled by setting allow_leading_wildcard to false.
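One way to act on this warning is to send only trailing wildcards and switch leading ones off in the request itself. The sketch below just builds the request body as a Python dict; the helper name build_query_string and the field names are assumptions, and the keyword is not escaped, so reserved query_string characters in user input would still need handling.

```python
import json


def build_query_string(keyword: str, fields: list, allow_leading: bool = False) -> dict:
    """Build a query_string body that uses a trailing wildcard only."""
    return {
        "query": {
            "query_string": {
                "query": f"{keyword}*",  # prefix-style wildcard, no leading *
                "fields": fields,
                "allow_leading_wildcard": allow_leading,
            }
        }
    }


body = build_query_string("goog", ["fieldOne", "fieldTwo"])
print(json.dumps(body, indent=2))
```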
I have a list of requests coming in, based on either free-text searches or codes.
I would like to eliminate the code-like requests and keep only the natural-language requests.
Therefore I need a query that can separate the two.
Below is the JSON query I already tried:
{
"query": {
"regexp": {
"q": "[^\d\W]"}
}
}
}
The error I get is "Bad String" for the line "q": "[^\d\W]"}
I'd like to improve the regex so that it keeps only the relevant data.
You may use
"regexp": {
"q": "[^0-9]+"
}
The Lucene regex engine used in Kibana anchors all patterns by default, so [^0-9]+ matches only strings that consist entirely of non-digit characters, from start to end.
Moreover, shorthand character classes such as \d and \W are not supported either.
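Both points can be checked in a small sketch. One likely contributor to the "Bad String" error is that \d and \W are not valid escape sequences inside a JSON string, which Python's json module also rejects. The second half emulates anchored matching with re.fullmatch; the sample requests and the helper name anchored are made up for illustration.

```python
import json
import re

# The original body: \d and \W are not legal JSON string escapes.
bad_body = r'{"query": {"regexp": {"q": "[^\d\W]"}}}'
try:
    json.loads(bad_body)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)

# The suggested body parses, and its pattern can be tried locally.
good_body = json.loads('{"query": {"regexp": {"q": "[^0-9]+"}}}')
pattern = good_body["query"]["regexp"]["q"]


def anchored(regex: str, text: str) -> bool:
    """Emulate Lucene anchoring: the pattern must match the whole text."""
    return re.fullmatch(regex, text) is not None


# Hypothetical incoming requests: one free-text, two code-like.
requests = ["cheap flights to rome", "AB12345", "12345"]
kept = [r for r in requests if anchored(pattern, r)]
print(kept)
```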
I have a JSON response from which I want to extract the "transaction_id" value (3159174 in this case) and use it in my next sampler. Can somebody give me a regular expression to extract it? I have looked at some solutions, but they don't seem to work.
{
"lock_release_date": "2021-04-03T16:16:59.7800000+00:00",
"party_id": "13623162",
"reservation_id": "reserve-1-81b70981-f766-4ca7-a423-1f66ecaa7f2b",
"reservation_line_items": [
{
"extended_properties": null,
"inventory_pool": "available",
"lead_type": "Flex",
"line_item_id": "1",
"market_id": 491759,
"market_key": "143278|CA|COBROKE|CITY|FULL",
"market_name": "143278",
"market_state_id": "CA",
"product_name": "Local Expert",
"product_size": "SOV30",
"product_type": "Postal Code",
"reserved_quantity": 0,
"transaction_id": 3159174
}
],
"reserved_by": "user1#abc.com"
}
Here's what I'm trying in JMeter:
If you really want the regular expression it would be something like:
"transaction_id"\s?:\s?(\d+)
where:
\s? stands for an optional whitespace - this is why your expression doesn't work
\d+ stands for one or more digits
See Regular Expressions chapter of JMeter User Manual for more details.
Be aware that parsing JSON using regular expressions is not the best idea, consider using JSON Extractor instead. It allows fetching "interesting" values from JSON using simple JsonPath queries which are easier to create/read and they are more robust and reliable. The relevant JSON Path query would be:
$.reservation_line_items[0].transaction_id
More information: API Testing With JMeter and the JSON Extractor
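Both approaches can be tried outside JMeter. The sketch below runs the regular expression from this answer against a trimmed copy of the response and then reads the same field by walking the parsed JSON, which corresponds to the path $.reservation_line_items[0].transaction_id. It uses Python's re and json modules rather than JMeter's extractors, so it only illustrates the logic.

```python
import json
import re

# Trimmed copy of the response from the question.
response_text = """{
  "reservation_id": "reserve-1-81b70981-f766-4ca7-a423-1f66ecaa7f2b",
  "reservation_line_items": [
    {
      "line_item_id": "1",
      "reserved_quantity": 0,
      "transaction_id": 3159174
    }
  ],
  "reserved_by": "user1#abc.com"
}"""

# Regex route: the capture group is returned as a string.
match = re.search(r'"transaction_id"\s?:\s?(\d+)', response_text)
via_regex = match.group(1) if match else None

# Structured route: parse first, then index into the document.
data = json.loads(response_text)
via_path = data["reservation_line_items"][0]["transaction_id"]

print(via_regex, via_path)
```

The structured route keeps working if the whitespace or key order in the response changes, which is the robustness argument made above.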
Use the JSON Extractor for a JSON response rather than the Regular Expression Extractor.
Use a JSON Path expression such as $..transaction_id
The simplest regular expression for extracting the value above is:
transaction_id": (.+)
Where:
() is used for creating capture group.
. (dot) matches any character except line breaks.
+ (plus) matches 1 or more of the preceding token.
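(.+?) could be used to stop at the first possible match, but it needs a delimiter after it, e.g. transaction_id": (.+?)\s, because a lazy group at the very end of a pattern captures only a single character.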
i.e. ? makes the preceding quantifier lazy, causing it to match as few characters as possible. By default, quantifiers are greedy, and will match as many characters as possible.
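The difference between greedy and lazy capture is easiest to see on a single-line response. The sketch below uses Python's re module and a made-up one-line variant of the JSON; JMeter's regex engine is a different implementation, so treat this as an illustration of the quantifier semantics only.

```python
import re

# Hypothetical minified response where more text follows the value.
one_line = '{"transaction_id": 3159174, "reserved_by": "user1"}'

# Greedy: .+ runs to the end of the line.
greedy = re.search(r'transaction_id": (.+)', one_line).group(1)

# Lazy with nothing after the group: stops after one character.
lazy_bare = re.search(r'transaction_id": (.+?)', one_line).group(1)

# Lazy with a delimiter: stops right before the first comma.
lazy_delim = re.search(r'transaction_id": (.+?),', one_line).group(1)

print(repr(greedy))
print(repr(lazy_bare))
print(repr(lazy_delim))
```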
I'm trying to filter out 3 sensu check values for templating.
I'm using Elasticsearch as a datasource
Query: {"find": "terms","field":"check_name.keyword"}
Regex: /.*_error_100.*|.*_error_200.*|.*_error_300.*/
Is my regex wrong?
Thank you
Devon
Matching everything with .* is very slow, as is using lookaround regular expressions.
To query a field by regex (example query):
{
"query": {
"regexp":{
"somefield": "_error_[123]00"
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax
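As a quick local check of how the character class compares with the original three-way alternation, the sketch below emulates whole-value matching with re.fullmatch. The check names are invented, and the assumption is that the regexp is applied to the entire keyword value, in which case the bare _error_[123]00 pattern only hits values that are exactly that text.

```python
import re

ALTERNATION = r".*_error_100.*|.*_error_200.*|.*_error_300.*"
COMPACT = r".*_error_[123]00.*"
EXACT = r"_error_[123]00"

# Hypothetical check_name values.
checks = ["disk_error_100_prod", "cpu_error_200", "net_error_400", "_error_300"]


def anchored_hits(pattern: str) -> list:
    """Return the check names the pattern matches from start to end."""
    return [c for c in checks if re.fullmatch(pattern, c)]


print(anchored_hits(ALTERNATION))
print(anchored_hits(COMPACT))
print(anchored_hits(EXACT))
```

If prefixes and suffixes around the error code are expected, the compact form with surrounding .* keeps the original behaviour at the cost of the slower match-everything segments.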