Lucene regex v4 - regex

I am trying to query Kibana 7.9.1 for a UUID v4. I disabled KQL, and now it looks like it is using Lucene.
Example of a uuid v4:
2334e133-37a6-4039-8acd-b0a561b961b2
Now if I input :
/[0-9a-fA-F]{8}/
in the search bar I get hits, but as soon as I try to escape the hyphen like
/[0-9a-fA-F]{8}\-/
nothing shows up. I would like to use the full regular expression:
[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}
But I can't because of the hyphens.
Is there any other way to escape that pesky hyphen?
I am using Elasticsearch 7.9.1, by the way.

I'm not sure why that regex above won't work for you, but this was the best I could come up with given the context: ^[0-9a-fA-F]{8}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{4}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{4}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{4}[^\s\d\w!##$%^&*()_+=\\\][{}|';:"\/.,<>?][0-9a-fA-F]{12}$
It basically just replaces your "-" with a negated character class "[^...]" that I filled with almost everything except "-", and adds a start anchor "^" and an end anchor "$".
Again, I'm not sure whether Lucene simply doesn't support certain parts of regex syntax, but try not escaping the hyphens. I know some programs automatically escape symbols for you when you use regex.

I ended up using the following Lucene query in Kibana's Discover view:
/[0-9a-fA-F]{8}/ AND /[0-9a-fA-F]{4}/ AND /[0-9a-fA-F]{12}/
Not pretty, but it works.
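
One possible explanation, assuming the field is an analyzed text field indexed with the default standard analyzer: Lucene regular expressions are matched against individual indexed terms and must match the whole term, and the analyzer splits the UUID at the hyphens, so no single term ever contains a hyphen. The Python sketch below only imitates that behaviour. The tokenize helper is a hypothetical stand-in for the analyzer, not Elasticsearch code.

```python
import re
from typing import List

UUID = "2334e133-37a6-4039-8acd-b0a561b961b2"

def tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for an analyzer that lowercases and splits on hyphens."""
    return [t for t in text.lower().split("-") if t]

def term_matches(pattern: str, terms: List[str]) -> bool:
    """Return True if any single term matches the whole pattern (anchored match)."""
    return any(re.fullmatch(pattern, term) for term in terms)

terms = tokenize(UUID)

# Each hex group is its own term, so the hyphen-free patterns find something.
eight = term_matches(r"[0-9a-fA-F]{8}", terms)
four = term_matches(r"[0-9a-fA-F]{4}", terms)
twelve = term_matches(r"[0-9a-fA-F]{12}", terms)

# No term contains a hyphen, so a pattern that requires one cannot match.
with_hyphen = term_matches(r"[0-9a-fA-F]{8}-", terms)

print(terms)
print(eight, four, twelve, with_hyphen)
```

If the index mapping also stores the value unanalyzed (for example in a keyword sub-field), the full pattern with plain hyphens may match there. Whether such a field exists depends on the mapping, so treat this as something to check rather than a given.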

Related

Google Tag Manager - Regex match

I want to check if a specific string is included in a GTM variable. The value of this variable is a first-party-cookie value decoded via URI looking like this:
"\"prodirversion\":5,\"panellanguage\":\"de\",\"preferences\":false,"\"marketing\":true,\"necessary\":true,\"statistics\":false,\"social_"
I now want to check if the following string is included.
marketing":true
I created another variable with a regex table and tried different regular expressions, but nothing seems to work. They work in an online regex tester but not in Google Tag Manager.
My guess would be the following but it doesn't work.
marketing\\":true
or
marketing.{3}true
or
marketing\\.{2}true
Some regex engines will raise an error if the " character in marketing\\":true is not escaped.
Try escaping it like this: marketing\\\":true, and it should match.
Update:
marketing":true seems to work in GTM.
From that, we can conclude that, in GTM's case, the escape character \ in the input string is for display only and should be ignored when testing or debugging the regex.
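
The difference between the displayed value and the value the regex actually sees can be reproduced outside GTM. This is a minimal Python sketch, assuming that GTM passes the unescaped string to the regex; the cookie value is shortened and the variable names are made up for illustration.

```python
import re

# Value as it looks once the display-only escaping is removed
# (an assumption about what GTM actually hands to the regex).
cookie_value = '"prodirversion":5,"panellanguage":"de","preferences":false,"marketing":true,"necessary":true'

# The same value as it is displayed, with a backslash in front of every quote.
displayed_value = cookie_value.replace('"', '\\"')

# The plain pattern matches the real value ...
plain = re.search(r'marketing":true', cookie_value) is not None
# ... but not the displayed form, where a backslash sits before the quote.
plain_on_displayed = re.search(r'marketing":true', displayed_value) is not None
# Matching the displayed form requires a literal backslash in the pattern.
escaped_on_displayed = re.search(r'marketing\\":true', displayed_value) is not None

print(plain, plain_on_displayed, escaped_on_displayed)
```

This is why a pattern written against the displayed string can pass in an online tester (when the escaped text is pasted in) and still fail against the real value.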

This Regex is not working only in Solr

This regex works perfectly in a plain C# console application, so we started using SolrNet. When we query a Solr instance for a field using the same regex, it throws the exception shown below:
java.lang.IllegalArgumentException: expected ']' at position 70 at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1087)
You are using the Lucene regex engine, which is different from the .NET regex engine.
In a Lucene pattern, an unescaped hyphen is a range operator even at the end of a character class. So, either escape the hyphen or move it to the start of the character class, i.e. [a-zA-Z'-] => [-a-zA-Z'] and [^a-zA-Z'-] => [^-a-zA-Z'].
It does not look like Lucene regex supports non-capturing groups, so remove all ?: from the pattern.
So, it will look like
([-a-zA-Z']+[^-a-zA-Z']+){0,5}the([^-a-zA-Z']+[-a-zA-Z']+){0,5}([-a-zA-Z']+[^-a-zA-Z']+){0,5}the([^-a-zA-Z']+[-a-zA-Z']+){0,5}
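
The two rewrites described above can be sketched as a small Python helper. The function name to_lucene_friendly is hypothetical, it only handles simple character classes without nested brackets, and the equivalence check uses Python's re module rather than Lucene, so it is an illustration and not a general converter.

```python
import re

def to_lucene_friendly(pattern: str) -> str:
    """Apply the two rewrites: drop ?: and move a trailing hyphen to the class start."""
    # Turn non-capturing groups into plain groups.
    pattern = pattern.replace("(?:", "(")
    # Move a trailing literal hyphen to the front of each character class,
    # keeping a leading ^ (negation) in place.
    return re.sub(r"\[(\^?)([^\]\[]*?)-\]", r"[\1-\2]", pattern)

# Hypothetical fragment in the style of the original .NET pattern.
dotnet_style = r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}the"
converted = to_lucene_friendly(dotnet_style)
print(converted)

# In Python both forms are accepted and behave the same on this sample.
sample = "one well-known two the"
same = bool(re.fullmatch(dotnet_style, sample)) == bool(re.fullmatch(converted, sample))
print(same)
```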
As per your comment, your use case seems best suited to a phrase query. Did you try it?
A query like "website stackoverflow.com is"~5 could work and would be more performant. If the order is important, you could use two queries ("website stackoverflow"~5 AND "stackoverflow.com is"~5) and use a custom scorer to remove the results that are not in order. It will be much more performant.

Postgres invalid regular expression: invalid character range

I'm using the following line in a postgres function:
regexp_replace(input, '[^a-z0-9\-_]+', sep, 'gi');
But I'm getting ERROR: invalid regular expression: invalid character range when I try to use it. The regex works fine in Ruby, is there a reason it'd be different in postgres?
Some regexp parsers will work with a dash (-) in the middle, if it comes after a range like you have it, but others won't. I suspect the postgres regexp parser is in the latter class. The canonical way to have the dash in a character class is to start with it, i.e. change the regexp to '[^-a-z0-9_]+', which might get it past the parser. Some regexp parsers, however, can be really fussy and not accept that either.
I don't have a postgres to test with, but I expect they'll accept the regexp above and deal correctly. Otherwise you have to find the regexp portion of their manual and understand what it says about this.
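
To illustrate what the rewritten character class is meant to do, here is a minimal Python sketch of the same replacement. It uses Python's re module, not Postgres, so it says nothing about what the Postgres parser accepts; the slugify name and the sample input are made up for illustration.

```python
import re

def slugify(text: str, sep: str = "-") -> str:
    """Replace every run of characters outside [-a-z0-9_] with sep, ignoring case."""
    # The hyphen is listed first in the class, so it can only be read as a literal.
    return re.sub(r"[^-a-z0-9_]+", sep, text, flags=re.IGNORECASE)

def slugify_hyphen_last(text: str, sep: str = "-") -> str:
    """Same replacement with the hyphen placed last in the class."""
    return re.sub(r"[^a-z0-9_-]+", sep, text, flags=re.IGNORECASE)

result = slugify("Hello, World! a-b_c")
print(result)
```

Here re.IGNORECASE plays the role of the 'i' flag, and re.sub replaces all matches by default, which corresponds to 'g'.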
I had the same problem
using
\-
instead of only
-
worked for me.
For me it worked to move the dash (-) to the end of the list.
Replacing [A-Za-z0-9-_.+=] with [A-Za-z0-9_.+=-] seems to work.
[^[:digit:]\-.]
The above code will work.

Regular expression: find abc.com except xyz.abc.com or #abc.com

In Eclipse I want to find a string, but the normal search returns hundreds of irrelevant results. So I'm trying to use regular expressions, but they haven't given me the proper results so far.
This is what I need: find "abc.com", but not "xyz.abc.com" or "#abc.com". To make it clear, it should return www.abc.com.
I've tried the following regex but I'm not sure if this is how it should be:
[^#xyz\.]abc.com
Using a negative lookbehind should suit your needs:
(?<!xyz[.]|#)abc[.]com
This matches every "abc.com" that is preceded by neither "xyz." nor "#".
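
The lookbehind can be tried outside Eclipse with a short Python sketch. One difference is worth noting: Java's regex engine, which Eclipse is assumed to use here, accepts alternatives of different lengths inside a lookbehind, while Python's re module requires a fixed width, so the sketch splits the condition into two consecutive lookbehinds with the same meaning.

```python
import re

# Two fixed-width lookbehinds: not preceded by "xyz." and not preceded by "#".
PATTERN = re.compile(r"(?<!xyz[.])(?<!#)abc[.]com")

samples = ["www.abc.com", "xyz.abc.com", "#abc.com", "abc.com"]
hits = [s for s in samples if PATTERN.search(s)]
print(hits)

# The single-lookbehind form from the answer is rejected by Python's re module.
try:
    re.compile(r"(?<!xyz[.]|#)abc[.]com")
    single_form_compiles = True
except re.error:
    single_form_compiles = False
print(single_form_compiles)
```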

Removing everything between a tag (including the tag itself) using Regex / Eclipse

I'm fairly new to figuring out how Regex works, but this one is just frustrating.
I have a massive XML document with a lot of <description>blahblahblah</description> tags. I want to basically remove any and all instances of <description></description>.
I'm using Eclipse and have tried a few examples of Regex I've found online, but nothing works.
<description>(.*?)</description>
Shouldn't that work?
EDIT:
Here is the actual code.
<description><![CDATA[<center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"><th>ID</th><td>308</td></tr></table></center>]]></description>
I'm not familiar with Eclipse, but I would expect its regex search facility to use Java's built-in regex flavor. You probably just need to check a box labeled "DOTALL" or "single-line" or something similar, or you can add the corresponding inline modifier to the regex:
(?s)<description>(.*?)</description>
That will allow the . to match newlines, which it doesn't by default.
EDIT: This is assuming there are newlines within the <description> element, which is the only reason I can think of why your regex wouldn't work. I'm also assuming you really are doing a regex search; is that automatic in Eclipse, or do you have to choose between regex and literal searching?
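
The effect of the (?s) modifier can be checked with a small Python sketch, whose re module treats the inline flag the same way as Java for this purpose. The XML snippet is a shortened, made-up variant of the element above with line breaks inside the description.

```python
import re

xml = """<item>
<description><![CDATA[<center>
<table><tr><th>ID</th><td>308</td></tr></table>
</center>]]></description>
<name>kept</name>
</item>"""

# Without the flag, . stops at newlines, so the multi-line element is not matched.
without_flag = re.sub(r"<description>(.*?)</description>", "", xml)

# With (?s), . also matches newlines and the whole element is removed.
with_flag = re.sub(r"(?s)<description>(.*?)</description>", "", xml)

print(without_flag == xml)
print(with_flag)
```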