CASE WHEN - LIKE - REGEXP in Hadoop Hive - regex

I want to write a query on a Hive table using CASE WHEN, LIKE and a regular expression. I have tried both regexp and rlike, but neither gives the desired results. My attempts so far are the following:
select distinct ending from
(select date, ending, name, count(distinct id)
from (select CONCAT_WS("/",year,month,day,hour) as date, id, name,
case when type = 'TRAN' then 'tran'
when events regexp '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'con'
when events not regexp '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'aban'
else 'other'
end as ending
from data_struct1) tmp
group by date, ending, name) tmp2;
and also
select distinct ending from
(select date, ending, name, count(distinct id)
from (select CONCAT_WS("/",year,month,day,hour) as date, id, name,
case when type = 'TRAN' then 'tran'
when events rlike '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'con'
when events not rlike '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'aban'
else 'other'
end as ending
from data_struct1) tmp
group by date, ending, name) tmp2;
Both queries return incorrect results (not bad syntax, just not the correct results).

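The % wildcard belongs to LIKE. In a regular expression it is an ordinary character, so your patterns match only if the data contains literal % signs. Use regex quantifiers such as .* or + instead. There are a lot of docs on regex quantifiers, for example this one: https://learn.microsoft.com/en-us/dotnet/standard/base-types/quantifiers-in-regular-expressions
For example, a negated character class with a quantifier matches whatever sits between no_reply and incomplete: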
select 'opencase_2,initial_state:inquiry,inquiry:no_reply:initial_state:incomplete::,inquiry:reask:secondary_state:complete::' regexp 'no_reply:[^:]+:incomplete';
OK
true
Also, rlike '%HUP' is wrong, for the same reason: the % is matched literally. Use '.*HUP$' (or just 'HUP$') if HUP must be at the end of the string, or simply 'HUP' if it does not matter whether HUP is at the beginning, in the middle or at the end.
rlike and regexp are synonyms and work the same in your query, so pick one of them and use it consistently.
Test: https://regex101.com/r/ksG67v/1
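The difference between the LIKE-style pattern and the regex-style pattern can also be checked outside Hive. The following is only a sketch in Python, assuming that Python's re module treats these particular constructs the same way as the Java regex engine Hive uses. The classify helper and the type values passed to it are hypothetical and only mirror the CASE WHEN expression from the question.

```python
import re

# Sample value taken from the test query in the answer above.
events = ("opencase_2,initial_state:inquiry,inquiry:no_reply:initial_state:"
          "incomplete::,inquiry:reask:secondary_state:complete::")

# '%' has no special meaning in a regex, so this pattern demands literal '%' characters.
like_style = r"%[:]no_reply[:]%[^o][^n][:]incomplete[:]%"
# Regex version: one or more non-colon characters between no_reply and incomplete.
regex_style = r"no_reply:[^:]+:incomplete"


def classify(type_, events):
    """Hypothetical Python mirror of the CASE WHEN expression in the question."""
    if type_ == "TRAN":
        return "tran"
    if re.search(r"HUP$", type_):  # HUP at the end of the string
        return "con" if re.search(regex_style, events) else "aban"
    return "other"


like_matches = re.search(like_style, events) is not None
regex_matches = re.search(regex_style, events) is not None
print(like_matches, regex_matches)
print(classify("XHUP", events))  # "XHUP" is a made-up type value
```

With the sample row, only the regex-style pattern finds a match, which is why the original queries never reach the 'con' branch.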

Related

REGEXP_REPLACE for alpha to NULL

While converting from Oracle to PostgreSQL, I found that the following Oracle queries need to be converted.
Oracle Query: Find pattern and replace with null
select regexp_replace('1', '[^0-9]', null) from dual;
select regexp_replace('a', '[^0-9]', null) from dual;
select regexp_replace('1a1', '[^0-9]', null) from dual;
My try:
As per the PostgreSQL documentation, we need to use REGEXP_REPLACE with the [[:alpha:]] pattern.
But the statement replaces each match with an empty string. I'm looking for NULL instead.
PostgreSQL Query:
select REGEXP_REPLACE('1','[[:alpha:]]','','g') --Correct
select REGEXP_REPLACE('a','[[:alpha:]]','','g') --Wrong: output should be NULL
select REGEXP_REPLACE('1a1','[[:alpha:]]','','g') --Correct
select REGEXP_REPLACE(' ','[[:alpha:]]','','g') --Wrong: output should be NULL
We can of course use a CASE expression like the following, but I want a single-line solution without CASE.
SELECT case when REGEXP_REPLACE('1a','[[:alpha:]]','','g') = ''
then
null
else
REGEXP_REPLACE('1a','[[:alpha:]]','','g')
end;
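The Oracle queries return NULL because Oracle treats an empty string as NULL. PostgreSQL keeps the two apart, so the empty result has to be turned into NULL explicitly. The standard NULLIF(expr, '') function does that in a single expression without CASE. The sketch below only models that idea in Python, with None standing in for SQL NULL. The helper name digits_or_none is hypothetical, and it uses the Oracle pattern [^0-9] rather than [[:alpha:]].

```python
import re


def digits_or_none(value):
    """Strip every non-digit; return None (standing in for SQL NULL) if nothing is left."""
    stripped = re.sub(r"[^0-9]", "", value)
    # An empty string is falsy, so this mirrors NULLIF(stripped, '').
    return stripped or None


for sample in ["1", "a", "1a1", " "]:
    print(repr(sample), "->", digits_or_none(sample))
```

Note that with [[:alpha:]] the single-space input would stay a space instead of becoming empty, so the choice of pattern matters for the last example in the question.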

Oracle regex and replace

I have a varchar field in the database that contains text. I need to replace every occurrence of a string made of any 2 letters followed by 8 digits with a link, so that VA12345678 becomes /cs/page.asp?id=VA12345678
I have a regex that replaces the string, but how can I replace it with a string that contains the matched string itself?
SELECT REGEXP_REPLACE ('test PI20099742', '[A-Z]{2}[0-9]{8}$', 'link to replace with')
FROM dual;
I can have more than one of these strings in one varchar field and ideally I would like to have them replaced in one statement instead of a loop.
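As mathguy said, you can use a backreference for your use case: \1 in the replacement stands for the text matched by the first parenthesized group. Note that the $ anchor is gone, since it would restrict the match to the end of the string. Try a query like this one.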
SELECT REGEXP_REPLACE ('test PI20099742', '([A-Z]{2}[0-9]{8})', '/cs/page.asp?id=\1')
FROM DUAL;
For such cases, you may want to keep the "text to add" somewhere at the top of the query, so that if you ever need to change it, you don't have to hunt for it.
You can do that with a with clause, as shown below. I also put some input data for testing in the with clause, but you should remove that and reference your actual table in your query.
I used the [:alpha:] character class, to match all letters - upper or lower case, accented or not, etc. [A-Z] will work until it doesn't.
with
text_to_add (link) as (
select '/cs/page.asp?id=' from dual
)
, sample_strings (str) as (
select 'test VA12398403 and PI83048203 to PT3904' from dual
)
select regexp_replace(str, '([[:alpha:]]{2}\d{8})', link || '\1')
as str_with_links
from sample_strings cross join text_to_add
;
STR_WITH_LINKS
------------------------------------------------------------------------
test /cs/page.asp?id=VA12398403 and /cs/page.asp?id=PI83048203 to PT3904
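The same backreference idea can be tried outside the database. This is a rough Python sketch, not the Oracle implementation. The names LINK_PREFIX, CODE and add_links are made up for illustration, and [A-Za-z] covers only ASCII letters, unlike [[:alpha:]].

```python
import re

# Keeping the prefix in one place, like the text_to_add subquery above.
LINK_PREFIX = "/cs/page.asp?id="

# Two letters followed by eight digits, captured as group 1.
CODE = re.compile(r"([A-Za-z]{2}\d{8})")


def add_links(text):
    """Prefix every code with the link; \\1 re-inserts the matched code."""
    # Safe here because LINK_PREFIX contains no backslashes.
    return CODE.sub(LINK_PREFIX + r"\1", text)


sample = "test VA12398403 and PI83048203 to PT3904"
linked = add_links(sample)
print(linked)

# The pattern from the question is anchored with $, so it only looks at the end of the string.
anchored = re.sub(r"[A-Z]{2}[0-9]{8}$", "<link>", sample)
print(anchored)
```

All occurrences are replaced in one call, so no loop is needed. The anchored variant leaves this sample unchanged, because the string does not end with a full code.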

RegEx SQL Select not matched

I have this regex
(?i)(sql.*[\s\S]select.*[\s\S]from[\s\S]*?\;)
This one is matched
SQL SELECT Distinct Field1,Field2
FROM Table1
;
But this one is not matched
SQL SELECT Distinct
Field,
Field2
FROM Table1
;
And this one also not:
SQL
SELECT Field,Field2
FROM Table1;
Why does this happen?
I changed my regex to
(?im)^sql[\s\S]*?^;$
and now the first and the second one are matched, but not the third one.
https://regex101.com/r/qLUbBh/3
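Removing the ^ before the semicolon fixes it, because the ; then only has to end a line instead of sitting on a line of its own:
(?im)^sql[\s\S]*?;$
This matches all three.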
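The two revised patterns can be compared against the three samples with a small script. This is only a sketch in Python, assuming its re module handles (?im), ^ and $ like the flavor selected on regex101, and assuming the samples use plain \n line endings.

```python
import re

first = "SQL SELECT Distinct Field1,Field2\nFROM Table1\n;"
second = "SQL SELECT Distinct\nField,\nField2\nFROM Table1\n;"
third = "SQL\nSELECT Field,Field2\nFROM Table1;"
samples = [first, second, third]

# '^;$' requires the semicolon to be alone on its line.
own_line = re.compile(r"(?im)^sql[\s\S]*?^;$")
# ';$' only requires the semicolon to be the last character of a line.
end_of_line = re.compile(r"(?im)^sql[\s\S]*?;$")

own_line_hits = [own_line.search(s) is not None for s in samples]
end_of_line_hits = [end_of_line.search(s) is not None for s in samples]
print(own_line_hits)     # [True, True, False]
print(end_of_line_hits)  # [True, True, True]
```

In the third sample the semicolon follows Table1 on the same line, so only the second pattern accepts it.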

Regex non-capturing parenthesis issue

I have a database query which looks like this
select * from students join (select * from teachers) join (select * from workers
I had a requirement to tokenize this string based on 'select'.
I am trying the regex (select)(.*?)((?:select)|$), but it matches only 2 times.
I would appreciate some pointers on how to achieve this.
I need the 3 output tokens as below
select * from students join (
select * from teachers) join (
select * from workers
I think this regex will work:
select.*?(?=select|$)
The regex matches the word select, then any text (not including new lines) up until right before the next select or the end of the string.
Demonstration here: http://regex101.com/r/sR3gV1
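To see why the lookahead makes the difference, both patterns can be run over the string from the question. The following is a minimal sketch in Python, assuming its re module behaves like the engine used in the demonstration for these constructs.

```python
import re

query = ("select * from students join (select * from teachers) "
         "join (select * from workers")

# Original attempt: the last group consumes the next 'select',
# so the scan resumes after it and the middle token is lost.
consuming = re.findall(r"(select)(.*?)((?:select)|$)", query)

# Lookahead version: (?=...) checks for the next 'select' without consuming it.
tokens = re.findall(r"select.*?(?=select|$)", query)

print(len(consuming))
for token in tokens:
    print(token)
```

The consuming pattern yields two matches, while the lookahead pattern yields the three tokens asked for.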
If you are trying to extract the individual select queries from the string, you can use the regex below. It assumes each select reads from a single table (i.e. you are not doing select * from x,y,z).
(select.*?from\\s+\\w+)

Regular Expressions _# at end of string

I am using the REGEXP_LIKE function in Oracle 10g to find values in a column that have a suffix of _# (like _1, _2, etc.). With the query below I can find _# in any part of the value, but can I return only the values with _# at the end?
SELECT * FROM Table WHERE REGEXP_LIKE (COLUMN,'_[[:digit:]]')
Sure. Use...
SELECT * FROM Table WHERE REGEXP_LIKE (COLUMN,'_[[:digit:]]$')
The $ character matches "the end of the string."
No need to use reg exps.
select * from table where substr(column,-2) between '_0' and '_9';
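Both suggestions can be compared on a few made-up values. This is a sketch in Python rather than Oracle, assuming a plain binary string comparison for the substring variant. The sample values are hypothetical.

```python
import re

values = ["abc_1", "abc_1_x", "a_12", "x_9", "_7", "plain"]

# Unanchored: '_' followed by a digit anywhere in the value.
anywhere = [v for v in values if re.search(r"_[0-9]", v)]

# Anchored with $: '_' followed by a digit at the very end.
at_end = [v for v in values if re.search(r"_[0-9]$", v)]

# Substring variant: compare the last two characters against '_0'..'_9'.
by_substring = [v for v in values if "_0" <= v[-2:] <= "_9"]

print(anywhere)
print(at_end)
print(by_substring)
```

On these values the anchored regex and the substring comparison select the same rows, while the unanchored pattern also picks up abc_1_x and a_12.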