How to write fuzzy multiple substring matching when using RLIKE in Hive

For example:
df.select('category').show()
+----------------+
|        category|
+----------------+
| money,insurance|
| life, housework|
|game,FPS,network|
| game,fight,jump|
|           hotel|
|      trip,hotel|
|            null|
+----------------+
I want to use RLIKE with a regular expression that fuzzy-matches any one of a list of substrings, ['money', 'life'].
-- This is an exact match
SELECT *
FROM tb_name
WHERE col_name RLIKE '(money|life)'
-- This is a fuzzy match
SELECT *
FROM tb_name
WHERE col_name RLIKE '*.(money|life)'
But the fuzzy match snippet fails with an error in the AST tree:
06-11 16:59:17-fatal filter ast tree
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TAB tb_name))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR "hdfs://XXXX/XX")) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (RLIKE (TOK_TABLE_OR_COL col_name ) '*.(money|life)')) (TOK_LIMIT 2000)))
06-11 16:59:17-fatal Filter feature: .TOK_TAB \S tdw_inter_db.*|.TOK_(CUBE|ROLLUP) .
I can't see what is wrong with the fuzzy match snippet.
Could anyone help me?
Thanks in advance.

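The '(?i)money|life' regexp will match strings containing either money or life, case-insensitively (that is what (?i) does). RLIKE already matches anywhere in the string, so no leading wildcard is needed, and your first query is in fact already a substring match rather than an exact one. In the Java regex syntax that RLIKE uses, '*.(money|life)' is not a valid pattern, because the quantifier * comes first and has nothing to repeat; the intended form would have been '.*(money|life)'.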
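As a rough illustration, the sketch below uses Python's re module as a stand-in for the Java regex engine behind Hive's RLIKE. That is an assumption: the two engines agree on the constructs used here but are not identical. The helpers is_valid and rlike are made up for this example.

```python
import re

# Sample values from the question's category column (None stands for NULL).
categories = [
    "money,insurance", "life, housework", "game,FPS,network",
    "game,fight,jump", "hotel", "trip,hotel", None,
]

def is_valid(pattern):
    """Return True if the pattern compiles (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def rlike(value, pattern):
    """Rough stand-in for RLIKE: match anywhere in the string, NULL stays NULL."""
    if value is None:
        return None
    return re.search(pattern, value) is not None

# A leading * has nothing to repeat, so the pattern is rejected.
print(is_valid("*.(money|life)"))
# Swapping the two characters gives a valid (if redundant) prefix.
print(is_valid(".*(money|life)"))

# Case-insensitive alternation, matched anywhere in the value.
matched = [c for c in categories if rlike(c, "(?i)money|life")]
print(matched)
```

Only the first two rows of the sample column contain money or life, so those are the rows the filter keeps.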

Related

Match a word in a list of words regex

I want the user to only be able to enter the values in the following regex:
^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$
My problem is that strings like PP, AA and QQ are also accepted.
I am not sure how I can prevent that. Thank you.
The site I use to verify the expression: https://regex101.com/

In most RegExp flavors, square brackets [] denote character classes; that is, a set of individual characters, any one of which can be matched at a given position.
Because P is included in this character class (along with a quantifier of {2}), PP is matched.
Instead, you seem to want a group with alternatives; for that, you'd use parentheses () (while also eliminating the whitespace, which doesn't appear to have been intentional on your part):
^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT){2}$
Because of the {2} quantifier, this matches two codes back to back, such as ABBC, ABAB, NLBC, etc. To accept exactly one code, drop the {2}.
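The difference between the bracket and parenthesis forms can be checked with a short sketch. It uses Python's re module, which treats these constructs the same way as most flavors; the one_code variant without the quantifier is an assumed reading of what the question wants, not something stated in it.

```python
import re

# Square brackets: any two characters drawn from the set (letters, space, |).
char_class = re.compile(
    r"^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$"
)
# Parentheses with {2}: two codes back to back.
two_codes = re.compile(r"^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT){2}$")
# Same group without the quantifier: exactly one code (assumed variant).
one_code = re.compile(r"^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT)$")

for text in ["PP", "AA", "QQ", "AB", "ABBC"]:
    print(
        text,
        bool(char_class.match(text)),
        bool(two_codes.match(text)),
        bool(one_code.match(text)),
    )
```

The character class accepts PP, AA and QQ because each letter appears somewhere in the set; the grouped alternation rejects them.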

Regex Oracle not matching as expected

I need to match and replace string like VA123 - so two letters and 3 numbers, but this expression is not working as intended. Any idea where I am going off?
SELECT REGEXP_REPLACE ('test VA123', '^\[A-Z]{2}[0-9]{3}$', 'test')
FROM dual;
I want the output in this case to say test test

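Your pattern cannot match for two reasons: \[ escapes the bracket, so it looks for a literal [ instead of opening a character class, and ^ anchors the match to the start of the string, where "test " comes before VA123.
If you want white space (or the start of the string) before the matched string then you can use: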
SELECT REGEXP_REPLACE ('test VA123', '(^|\s)[A-Z]{2}[0-9]{3}$', '\1test')
AS replaced_value
FROM dual;
| REPLACED_VALUE |
| :------------- |
| test test |
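The same two patterns can be tried outside the database. The sketch below uses Python's re.sub as an approximation of REGEXP_REPLACE; treating the two as equivalent for these particular patterns is an assumption, since Oracle's regex flavor differs from Python's in other respects.

```python
import re

text = "test VA123"

# Original pattern: \[ is a literal bracket and ^ pins the match to the start,
# so nothing matches and the input comes back unchanged.
unchanged = re.sub(r"^\[A-Z]{2}[0-9]{3}$", "test", text)

# Pattern with a leading group: \1 puts back the whitespace (or the empty
# start-of-string match) that preceded the code.
replaced = re.sub(r"(^|\s)[A-Z]{2}[0-9]{3}$", r"\1test", text)

print(repr(unchanged))
print(repr(replaced))
```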

REGEX Replacing with exception

My first problem here is my nemesis: regex.
I need a regex to replace every , with a "," from a text without replacing existing ,".
It looks like this:
Before:
abcd,efgh,ijkl,"","",mnop
After:
abcd","efgh","ijkl","","","mnop
I hope you can help me.

Solving a problem using regular expressions is nice but now you have two problems.
A simple solution that does not involve regular expressions is to do three plain string replacements: first replace , with ",", then replace ","" with ",", and finally replace ""," with ",".
Let's see why this works:
          | after 1st   | after 2nd   | after 3rd
original  | replacement | replacement | replacement
----------+-------------+-------------+-------------
a,b       | a","b       | a","b       | a","b
m",n      | m"","n      | m"","n      | m","n
x,"y      | x",""y      | x","y       | x","y
See it in action:
const input = 'abcd,efgh,ijkl,"","",mnop';
const output = input.replace(/,/g, '","').replace(/",""/g, '","').replace(/"","/g, '","');
console.log(output);
N.B. The code snippet above uses regular expressions because this is how JavaScript implements the "replace all" functionality. When the first argument of String.replace() is a string it replaces only its first occurrence.
I could use String.replaceAll() instead (it works with strings) but it is not widely supported by browsers yet.
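For comparison, here is a sketch of the same three steps in Python, where str.replace already replaces every occurrence, so no regular expressions are needed at all. The function name quote_commas is made up for this example.

```python
def quote_commas(text):
    """Apply the three plain-string replacements, in order."""
    step1 = text.replace(',', '","')      # every , becomes ","
    step2 = step1.replace('",""', '","')  # drop the doubled quote after a comma
    step3 = step2.replace('"","', '","')  # drop the doubled quote before a comma
    return step3

print(quote_commas('abcd,efgh,ijkl,"","",mnop'))
```

The order of the steps matters: the second and third replacements only clean up the doubled quotes that the first one introduces next to quotes that were already there.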

Crudely, I think you are after something like:
(?:(?<!"),"|(?<=")",(?!")|(?<!"),(?!"))
Note: As mentioned by @WiktorStribiżew in the comments, you could get rid of the outer non-capturing group: (?<!"),"|(?<=")",(?!")|(?<!"),(?!")
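A quick way to try the lookaround pattern is the sketch below, which assumes each match is replaced with "," and uses Python's re module (it supports these fixed-width lookbehinds). The name quote_commas_regex is made up for this example.

```python
import re

# The pattern without the outer non-capturing group.
PATTERN = re.compile(r'(?<!"),"|(?<=")",(?!")|(?<!"),(?!")')

def quote_commas_regex(text):
    """Replace every match of the lookaround pattern with "," ."""
    return PATTERN.sub('","', text)

print(quote_commas_regex('abcd,efgh,ijkl,"","",mnop'))
```

This reproduces the expected output for the sample in the question. The pattern is crude, as stated: an input such as m",n, where a quote that is not preceded by another quote sits directly before the comma, is left unchanged.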

Remove special characters from string on insert?

I have a field of type character varying. On insert I'd like to strip out special characters. In this particular case I'd like to strip the hyphens from a column of hyphenated strings: take hyphen_field (for example "123-456-789") from table_two and insert it as "123456789" into non_hyphen_field in table_one. I'm starting with a statement of the following form:
INSERT INTO schema.table_one(var_one,var_two,non_hyphen_field)
SELECT var_one, var_two, hyphen_field
FROM schema.table_two;
What is the cleanest way to accomplish this?

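On Postgres you can use the replace function. Applied to the statement in your question, it goes in the SELECT list:
INSERT INTO schema.table_one(var_one,var_two,non_hyphen_field)
SELECT var_one, var_two, replace(hyphen_field, '-', '')
FROM schema.table_two;
On a single value it behaves like this:
select replace('123-456-789', '-','');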
| replace |
| :-------- |
| 123456789 |

Oracle SQL Regex not returning expected results

I am using a regex that works perfectly in Java/PHP/regex testers.
\d(?:[()\s#-]*\d){3,}
Examples: https://regex101.com/r/oH6jV0/1
However, trying to use the same regex in Oracle SQL is returning no results. Take for example:
select *
from
(select column_value str from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3')))
where regexp_like(str, '\d(?:[()\s#-]*\d){3,}');
This returns no rows. Why does this act so differently? I even used a regex tester that does POSIX ERE, but that still works.

Oracle does not support non-capturing groups (?:). You will need to use a capturing group instead.
It also doesn't treat the Perl-style whitespace meta-character \s as whitespace inside a character class [] (there it matches the characters \ and s instead). You will need to use the POSIX class [:space:] inside the brackets instead.
Query (Oracle 11g R2):
select *
from (
select column_value str
from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3'))
)
where regexp_like(str, '\d([()[:space:]#-]*\d){3,}')
Results:
| STR |
|--------|
| 1234 |
| 12345 |
| 12 135 |
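The effect of the character-class difference can be imitated outside Oracle. The sketch below runs in Python, not Oracle: perl_style reads \s as whitespace, while literal_reading is an emulation (an assumption based on the description above) in which the class contains a literal backslash and a literal s. Python has no [:space:] class, so that form is not shown.

```python
import re

values = ["123", "1234", "12345", "12 135", "1", "12 3"]

# Perl-style reading: \s inside the class means whitespace.
perl_style = re.compile(r"\d([()\s#-]*\d){3,}")
# Emulated literal reading: the class holds a backslash and the letter s,
# so a space between digits is no longer allowed.
literal_reading = re.compile(r"\d([()\\s#-]*\d){3,}")

def matching(pattern):
    """Return the values in which the pattern matches anywhere."""
    return [v for v in values if pattern.search(v)]

print(matching(perl_style))
print(matching(literal_reading))
```

Under the literal reading, 12 135 drops out because the space is no longer part of the class, which is why the whitespace class has to be spelled in a form Oracle understands.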
