HiveQL - RLIKE against a regexp stored in a field?

Can you use rlike to join a table using regular expressions contained in a field?
For example:
Select a.*, b.*
from Table a
inner join Table2 b
on a.Field rlike b.Field2
where Table2 holds data like this:
Field1 Field2
David ^D(a|o)vid
Test a ^Test

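I just worked out that I can do this. Older Hive versions only accept equality conditions in JOIN ... ON, so the regex test goes in the WHERE clause of a cross join, with the column holding the regular expression on the right-hand side of RLIKE:
select a.*, b.*
from Table1 a, Table2 b
where a.Field1 rlike b.Field2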
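The same idea can be tried without a Hive cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in: SQLite has no RLIKE, but it evaluates X REGEXP Y as a call to a user function regexp(Y, X), registered here with re.search so that, like Hive's RLIKE, the pattern may match anywhere in the value. The Table1 rows are hypothetical; the Table2 rows come from the question.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern arrives first.
    # re.search lets the pattern match anywhere in the value, as Hive's RLIKE does.
    if pattern is None or value is None:
        return None  # propagate NULL, like RLIKE
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

conn.execute("CREATE TABLE Table1 (Field1 TEXT)")
conn.execute("CREATE TABLE Table2 (Field1 TEXT, Field2 TEXT)")
# Hypothetical Table1 rows; the Table2 rows are the ones from the question.
conn.executemany("INSERT INTO Table1 VALUES (?)",
                 [("David",), ("Dovid",), ("Davide",), ("Testing",), ("Other",)])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [("David", "^D(a|o)vid"), ("Test a", "^Test")])

# The column holding the regex goes on the right-hand side of REGEXP/RLIKE.
rows = conn.execute(
    "SELECT a.Field1, b.Field2 "
    "FROM Table1 a, Table2 b "
    "WHERE a.Field1 REGEXP b.Field2 "
    "ORDER BY a.Field1"
).fetchall()

for row in rows:
    print(row)
```

Davide is returned because ^D(a|o)vid has no $ anchor; add $ to the stored patterns if the whole value must match.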

Related

SQL RLIKE function Postcode Search

I am trying to understand why the following query pulls through postcodes that I wouldn't expect.
Select distinct Postcode from tableA where Postcode like 'NE1%';
This returns 2 postcodes, both beginning with NE1.
I've tried:
Select distinct Postcode from tableA where Postcode rlike '^NE[0-1]*'
This returns many postcodes, including the 2 from above, plus others such as NE27 0EZ. I assumed that one appears because it has a zero in the second part of the postcode, but I have no idea why NE2 2NE appears!
My goal is to filter all postcodes that begin with N (not NE) followed by a digit. It has to be SQL only, not Python or Scala, because this filter is one of many postcode filters combined in a large OR clause.
I would have thought this would work for all postcodes beginning with N followed by a digit:
Select distinct Postcode from tableA where Postcode rlike 'N[0-9] %' or Postcode rlike 'N[0-9][0-9] %'
select distinct 'rlike' as Func, postcode from npex.npex where postcode rlike '^NE[0-1]*'
union
select distinct 'like', postcode from npex.npex where postcode like 'NE1%'
order by 1;
RESULTS
Func postcode
like NE1 3BB
like NE12 1AB
rlike NE27 0EZ
rlike NE6 2UT
rlike NE27 0LT
rlike NE12 1AB
rlike NE2 2NE
rlike NE3 4DT
rlike NE1 3BB
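The * is not needed. [0-1]* matches zero or more zeros or ones, and zero repetitions always succeed, so '^NE[0-1]*' matches every postcode starting with NE (that is why NE2 2NE and NE6 2UT appear). Drop the *: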
select distinct postcode from npex.npex where postcode rlike '^NE[0-1]'
To get the postcodes beginning with N followed by a digit, use:
select distinct postcode from npex.npex where postcode rlike '^N[0-9]'
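A quick way to see why the * matters is to replay the patterns with Python's re.search, which, like Hive's RLIKE, succeeds when the pattern matches any part of the value. This is a sketch, not Hive itself; the NE postcodes come from the results above, and the two N postcodes are hypothetical values added to exercise ^N[0-9].

```python
import re

# Postcodes copied from the RESULTS listing above.
ne_postcodes = ["NE1 3BB", "NE12 1AB", "NE27 0EZ", "NE6 2UT",
                "NE27 0LT", "NE2 2NE", "NE3 4DT"]
# Hypothetical N-district postcodes, added only for illustration.
n_postcodes = ["N1 9GU", "N22 5AB"]
all_postcodes = ne_postcodes + n_postcodes


def rlike(value, pattern):
    # Stand-in for Hive's RLIKE: true if the pattern matches any substring.
    return re.search(pattern, value) is not None


def matching(pattern):
    return [p for p in all_postcodes if rlike(p, pattern)]


with_star = matching(r"^NE[0-1]*")    # [0-1]* may match zero characters
without_star = matching(r"^NE[0-1]")  # requires a 0 or 1 right after NE
n_then_digit = matching(r"^N[0-9]")   # N immediately followed by a digit
percent = matching(r"N[0-9] %")       # % is a literal character in a regex

print(with_star)
print(without_star)
print(n_then_digit)
print(percent)
```

The last pattern also shows that % is not a wildcard in a regular expression: it only matches a literal percent sign, so 'N[0-9] %' finds nothing.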

Extract text between parenthesis from a postgres table without creating additional column

I am trying to extract the text between parentheses from a column in a Postgres table. I am using the following command, but it creates an additional blank column.
SELECT *, SUBSTRING (col2, '\[(.+)\]') FROM table
My table looks like this:
col1 col2
1 mut(MI_0118)
2 mut(MI_0119)
3 mut(MI_0120)
My desired output is:
col1 col2
1 MI_0118
2 MI_0119
3 MI_0120
How can I extract the text without creating an additional column?
Thanks
Your regex is wrong; that's why you get an empty column. The values contain parentheses, not square brackets, so put escaped parentheses around the capture group instead:
select col1, substring(col2, '\((.+)\)') as col2
from input
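The * in the SELECT list includes all columns, and then you add another, unnamed column. If you select only the columns you need and alias the result:
SELECT col1, SUBSTRING (col2, '\((.+)\)') AS col2 FROM table
you get a single col2 holding the extracted text. (The pattern must match the parentheses in the data, not square brackets, or the column is NULL.)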
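The difference between the two patterns can be checked with a small Python emulation of PostgreSQL's substring(string, pattern), which returns the text captured by the first parenthesized subexpression, or NULL when nothing matches. The helper name pg_substring is hypothetical, and Python's re stands in for PostgreSQL's regex engine; the two agree on these simple patterns.

```python
import re

rows = [(1, "mut(MI_0118)"), (2, "mut(MI_0119)"), (3, "mut(MI_0120)")]


def pg_substring(value, pattern):
    # Hypothetical emulation of PostgreSQL's substring(string, pattern):
    # return the first capture group if the pattern has one, else the whole
    # match, and None (SQL NULL) when the pattern does not match at all.
    m = re.search(pattern, value)
    if m is None:
        return None
    return m.group(1) if m.re.groups else m.group(0)


# Escaped parentheses match the literal "(" and ")" in the data.
parens = [(col1, pg_substring(col2, r"\((.+)\)")) for col1, col2 in rows]
# Escaped square brackets look for "[" and "]", which the data never contains.
brackets = [(col1, pg_substring(col2, r"\[(.+)\]")) for col1, col2 in rows]

print(parens)
print(brackets)
```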

Extract rows from a table with regular expression hive sql

Please check the link for the result and table info. I need to query rows with the value '343' in column B using a regular expression. All columns are strings. Also, could you point me to some good learning material on writing regular expressions in Hive? Thank you.
For Hive use this:
select * from tablename where B rlike '343';
Checking it works:
hive> select '123435' rlike '343';
OK
_c0
true
Negative test:
hive> select '12345' rlike '343';
OK
_c0
false
Time taken: 1.675 seconds, Fetched: 1 row(s)
Hive uses Java-flavored regex. You can find a good reference and a place to practice at https://regexr.com/ and, of course, regex101.
In databases with REGEXP_LIKE, such as Oracle, this will work:
select * from tablename where regexp_like(B,'(.*)(343)(.*)');
The Hive equivalent is:
select * from tablename where rlike(B,'(.*)(343)(.*)');
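Because RLIKE (like Oracle's REGEXP_LIKE) succeeds when the pattern matches any part of the string, the (.*) groups around 343 are redundant. The Python sketch below uses re.search as a stand-in for that behaviour; the values 343 and 0343x are hypothetical additions to the two from the Hive session above. Anchor with ^ and $ if the column must equal 343 exactly.

```python
import re


def rlike(value, pattern):
    # Stand-in for Hive's RLIKE: true if the pattern matches any substring.
    return re.search(pattern, value) is not None


values = ["123435", "12345", "343", "0343x"]

plain = [v for v in values if rlike(v, "343")]
wrapped = [v for v in values if rlike(v, "(.*)(343)(.*)")]
exact = [v for v in values if rlike(v, "^343$")]

print(plain, wrapped, exact)
```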

How to update column values having a specific string pattern/sequence

There is a column in my table whose values follow a pattern like 'A=xxx^B=xxx^C=xxx^D=xxx^'. I need to update every value that has this pattern to the pattern 'C=xxx^D=xxx^', where each x is a digit.
Would something like this help? REGEXP_LIKE keeps only the rows that match the full pattern, and an ordinary SUBSTR (from the first 'C') returns the desired result.
SQL> with test (col) as
  (select 'A=123^B=123^C=123^D=123^' from dual union
   select 'A=123^B=456^C=789^D=987^' from dual union
   select 'A=333^C=333^D=333^' from dual union
   select 'C=987^D=987^' from dual union
   select 'B=876^' from dual union
   select 'A=123^B=123^C=123^D=123^E=123^' from dual
  )
select col,
  substr(col, instr(col, 'C')) result
from test
where regexp_like(col, '^A=\d{3}\^B=\d{3}\^C=\d{3}\^D=\d{3}\^$');
COL RESULT
------------------------------ ------------------------------
A=123^B=123^C=123^D=123^ C=123^D=123^
A=123^B=456^C=789^D=987^ C=789^D=987^
SQL>
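The filter-then-slice logic of the answer can be replayed in Python as a sanity check. This is a sketch with re standing in for Oracle's regex engine; from_first_c is a hypothetical helper mimicking SUBSTR(col, INSTR(col, 'C')), and the pattern uses \d{3} for the three-digit xxx values.

```python
import re

values = [
    "A=123^B=123^C=123^D=123^",
    "A=123^B=456^C=789^D=987^",
    "A=333^C=333^D=333^",
    "C=987^D=987^",
    "B=876^",
    "A=123^B=123^C=123^D=123^E=123^",
]

# Full A/B/C/D pattern, each value exactly three digits, nothing after D.
pattern = re.compile(r"^A=\d{3}\^B=\d{3}\^C=\d{3}\^D=\d{3}\^$")


def from_first_c(s):
    # Like SUBSTR(col, INSTR(col, 'C')): keep everything from the first 'C'.
    return s[s.index("C"):]


result = [(v, from_first_c(v)) for v in values if pattern.search(v)]
for col, res in result:
    print(col, res)
```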
I managed to come up with a solution. Since I'm looking for values that start with 'A=', I used REGEXP_LIKE to find that pattern. Then I used SUBSTR to extract the part of the string after the second '^' character.
Update MYTABLE t set t.key = SUBSTR(t.key,INSTR(t.key,'^',1,2)+1) WHERE REGEXP_LIKE(t.key_ref,'^A=') and t.dno = 'xxxxx';
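The UPDATE relies on INSTR(t.key, '^', 1, 2), the 1-based position of the second '^', plus one. A minimal Python emulation of those two Oracle functions (the helper names are hypothetical) makes the arithmetic concrete, and also shows what happens to a value with fewer than two '^' characters: INSTR returns 0, so SUBSTR(key, 1) leaves the value unchanged.

```python
def instr(s, sub, start=1, nth=1):
    # Emulates Oracle INSTR(s, sub, start, nth) for a positive start: the
    # 1-based position of the nth occurrence of sub, or 0 if there is none.
    pos = start - 1  # convert to a 0-based index for str.find
    for _ in range(nth):
        pos = s.find(sub, pos)
        if pos == -1:
            return 0
        pos += 1  # resume after this occurrence; also its 1-based position
    return pos


def substr(s, start):
    # Emulates Oracle SUBSTR(s, start) for a positive 1-based start.
    return s[start - 1:]


def after_second_caret(key):
    # Mirrors SUBSTR(t.key, INSTR(t.key, '^', 1, 2) + 1) from the UPDATE.
    return substr(key, instr(key, "^", 1, 2) + 1)


print(after_second_caret("A=123^B=456^C=789^D=987^"))
print(after_second_caret("B=876^"))
```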

How create a regular expression to query Oracle?

I have a table in an Oracle DB, for example tbl1, with one text column, for example col1. I need to select all rows where the text contains 0 exactly once and no other digit; if the text contains more than one digit, I do not need it. For example, I need these rows:
0
text0
0text
text 0 text
text0 text
and I do not need these rows:
only text
0 0
00
10
3243455
0text 1
I think I need a regex, but I do not know how to use one.
I wrote:
select * from tbl1 t where regexp_like(t.col1,'[0]{1}')
but I get every row that contains a 0.
[^0-9]*[0-9][^0-9]*
This means: any non-digit any number of times, then a digit exactly once, and then any non-digit any number of times.
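REGEXP_LIKE accepts a match anywhere in the string, so you need to add ^ and $ to force the pattern to match the entire string (without them, any string containing a digit matches):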
^[^0-9]*[0-9][^0-9]*$
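The effect of the anchors can be checked with a short Python sketch using re.search, which, like REGEXP_LIKE, accepts a match anywhere in the string. The sample rows are the ones from the question; text5 is a hypothetical extra row showing that [0-9] accepts any single digit, not only 0.

```python
import re

# Rows from the question, split by whether the asker wants them.
wanted = ["0", "text0", "0text", "text 0 text", "text0 text"]
unwanted = ["only text", "0 0", "00", "10", "3243455", "0text 1"]

anchored = re.compile(r"^[^0-9]*[0-9][^0-9]*$")
unanchored = re.compile(r"[^0-9]*[0-9][^0-9]*")

selected = [s for s in wanted + unwanted if anchored.search(s)]
# Without ^ and $, any string containing a digit matches somewhere.
loose = [s for s in unwanted if unanchored.search(s)]
# Hypothetical extra row: [0-9] accepts any single digit, not only 0.
any_digit = anchored.search("text5") is not None

print(selected)
print(loose)
print(any_digit)
```

To require the single digit to be 0, replace [0-9] with 0.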
Based on the answer by Ilya Kogan, you could get the rows you want with the following regular expression:
WITH tbl1 AS (SELECT '0' col1 FROM dual
UNION
SELECT 'text0' col1 FROM dual
UNION
SELECT '0text' col1 FROM dual
UNION
SELECT 'text 0 text' col1 FROM dual
UNION
SELECT 'text0 text ' col1 FROM dual
UNION
SELECT 'only text' col1 FROM dual
UNION
SELECT '0 0' col1 FROM dual
UNION
SELECT '00' col1 FROM dual
UNION
SELECT '10' col1 FROM dual
UNION
SELECT '3243455' col1 FROM dual
UNION
SELECT '0text 1' col1 FROM dual)
SELECT COL1, REGEXP_SUBSTR(col1,'\A\D*0\D*\Z')
FROM tbl1
WHERE REGEXP_LIKE(col1,'\A\D*0\D*\Z')
Where:
\A matches the beginning of the string.
\D is a non-digit character, and \D* allows any number of them.
0 is the character for the number 0.
\Z matches the end of the string.
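Replaying the second answer's pattern in Python confirms that it selects exactly the wanted rows. This is a sketch with re standing in for Oracle's engine. Python's \Z matches only at the very end of the string; the sample rows contain no newlines, so any difference in how a trailing newline is treated does not matter here.

```python
import re

# \A and \Z anchor to the start and end of the string; \D is any non-digit.
only_zero = re.compile(r"\A\D*0\D*\Z")

rows = ["0", "text0", "0text", "text 0 text", "text0 text ", "only text",
        "0 0", "00", "10", "3243455", "0text 1"]

selected = [r for r in rows if only_zero.search(r)]
print(selected)
```

Unlike [0-9], the literal 0 rejects a row such as text5.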