Compare column value against list of regex values stored in another table and update accordingly

Compare column value against list of regex values stored in another table and update accordingly - regex

I am new to Oracle programming.
I want to check the "msg" value of "Table1" against the "regex" values from "Table2".
If the regular expression matches as such, I want to update the respective "regex_id" in "Table1".
Usual query: SELECT 'match found' FROM DUAL WHERE REGEXP_LIKE('s 27', '^(s27|s 27)')
Table1
MSG REG_EXID
Ss27 ?
s27 ?
s28 ?
s29 ?
Table2
REGEX REG_EXID RELEVANCE
^(s27|s 27) 1 10
^(s29|s 29) 2 2
^(m28|m 28) 3 2
^(s27|s 27) 4 100

Taking the newly added "relevance" into account, with Oracle 11g you could try along
UPDATE Table1 T1
SET T1.reg_exID =
(SELECT DISTINCT
MAX(reg_exID) KEEP (DENSE_RANK FIRST ORDER BY relevance DESC) OVER (PARTITION BY regex)
FROM Table2
WHERE REGEXP_LIKE(T1.msg, regex)
)
;
See SQL Fiddle.

You could work along
UPDATE Table1
SET reg_exID = (SELECT reg_exID FROM Table2 WHERE REGEXP_LIKE(Table1.msg, regex));
Please keep in mind:
None of your current sample records will be updated as REGEX are case sensitive.
The above UPDATE will fail, if more than a single REGEX does match.
You could rewrite the current REGEX expressions along "^m ?28".
See it in action: SQL Fiddle (With some data added to actually show the effect.)
Please comment if and as clarification/adjustment is required.

Related

MYSQL get substring

I'm trying to get substring dynamically and group by it. So if my uri column contains records like: /uri1/uri2 and /somelongword/someotherlongword I would like to get everything up to second delimiter, namely up to second / and count it. I'm using this query but obviously it is cutting string statically (6 letters after the first one).
SELECT substr(uri, 1, 6) as URI,
COUNT(*) as COUNTER
FROM staging
GROUP BY substr(uri, 1, 6)
ORDER BY COUNTER DESC
How can I achieve that?

You can use combination of SUBSTRING() and POSITION()
schema:
CREATE TABLE Table1
(`uri` varchar(10))
;
INSERT INTO Table1
(`uri`)
VALUES
('some/text'),
('some/text1'),
('some/text2'),
('aa/bb'),
('aa/cc'),
('bb/cc')
;
query
SELECT
SUBSTRING(uri,1,POSITION('/' IN uri)-1),
COUNT(*)
FROM Table1
GROUP BY SUBSTRING(uri,1,POSITION('/' IN uri)-1);
http://sqlfiddle.com/#!9/293dd3/3/0
edit: here I found amazon athena documentation: https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html and here is the string function documentation: https://prestodb.io/docs/0.217/functions/string.html
my answer above still stands, but you might need to change SUBSTRING to SUBSTR
edit 2: it seems there's a special function to achieve this in amazon athena called SPLIT_PART()
query:
SELECT SPLIT_PART(uri, '/', 1), COUNT(*) FROM tbl GROUP BY SPLIT_PART(uri, '/', 1)
from docs:
split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than than the number of fields, then null is returned.

Extracting String Portions in SQL using Regular expressions

Hi All,
I have a query related to Regular expressions in SQL.
I have a case where a portion of string has to be extracted from a column. The portion of that column will be prefixed with my column A. Please see the screenshot for the sample data. I have also added the output expected in a separate column (highlighted in green).
Scenarios:
Now if a column value has more than 1 unique number then that has to be shown up with Null
Eg: To verify CAN06010025, CAN06010026 & CAN06010030 after the approval.
In the above string I have more than 1 number(bold portion)
and this case should be ignored (meaning it has to give me Null Value).
If there is only one number and if it is repetitive then I have to consider that case and extract the portion of String..
Eg: Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated
In this example, the portion I wanted to extract is repetitive and I am looking to extract the highlighted portion alone.
The same applies to the other cases as well.
I tried with the below sql. The challenge is my Col A can also be present in Col B (Line 2 in screenshot) and this code is considering my Col A portion when I count with REGEXP_COUNT function and is giving me the value as Null. My expectation is to extract that USA12S001 portion from the column.
Could you please help in achieving this where the above two conditions satisfies.
SQL:
SELECT
ColA,
ColB,
case when REGEXP_COUNT(ColB,ColA) >2 THEN NULL
ELSE REPLACE(REPLACE(concat(regexp_substr(ColB,ColA||'([[:alnum:]]+\.?)'),
nvl(regexp_substr(ColB,ColA||'(\-[[:digit:]]+)'),
regexp_substr(ColB,ColA||'([[:space:]]\-[[:space:]][[:digit:]]+)'))),
' ',''),'.','')
END AS Result
FROM
table
Test Data:
Col A
CAN06
USA12
USA27
HUN04
CAN05
USA24
CAN06
Col B
to verify CAN06010025, CAN06010026 & CAN06010030 after the approval
Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated
Project USA27: Id: USA27S001: Prod
To review id HUN04S002-HUN04S004 after the due date.
ID: CAN05S005 with the details as CAN05S005 are completed.
Project USA24: Id: USA24S009: Data Issue
"Project: Subject CAN06S009: V2 & V3- Id CAN06S010: V1"

If the REGEXP_COUNT is the only issue, then the answer is simple: change
case when REGEXP_COUNT(ColB,ColA) >2
to:
case when REGEXP_COUNT(ColB,ColA || '[[:alnum:]]') >2

REGEX help needed in Oracle

How to get all the table names from the below Sql? My sql returns only the last table name.
with t as
(select 'select col1,
(select max(col3) from dd3) max_timestamp
from dd1,
dd2
where dd1.col1 = dd2.col1
and dd1.col1 in(select col1 from dd4)' sql_text from dual)
select regexp_substr(regexp_substr(upper(sql_text), '\sFROM\s*(\w|\.|_)*'), '(\w|_|\.)+', 1,2)
from t
Thanks,
DD.

This is a more of a regex question than an Oracle question.
If you can run the sql through REPLACE(REPLACE(sql,CHR(13),' '),CHR(10),NULL) to replace all newlines with a space, so that the query fits on a single line, here is regex that will return all the tables in group 1 (for the ones after FROM) and group 3 for subsequent items in a list:
/FROM ([A-Z0-9$#_]+)(,[\s]*([A-Z0-9$#_]+))*/gi
Having multiple groups is not ideal, so I would look at the full match instead, see https://regex101.com/r/OZUalH/1/ for an example (see full match on the right, where every match has from followed by one or more tables).
But let me warn you this is not going to be robust, as these valid FROM clause expressions are not handled:
"my_table"
MY_TABLE AS A
MY_TABLE AS "a"
etc...
If it were me, I would write a function to run the query through explain plan (execute immediate 'explain plan for ...') and extract the tables from the plan tables (or possibly using SYS.DBMS_XPLAN)

PL/SQL regexp_like filters

I want to delete some tables and wrote this procedure:
set serveroutput on
declare
type namearray is table of varchar2(50);
total integer;
name namearray;
begin
--select statement here ..., please see below
total :=name.count;
dbms_output_line(total);
for i in 1 .. total loop
dbms_output.put_line(name(i));
-- execute immediate 'drop table ' || name(i) || ' purge';
End loop;
end;
/
The idea is to drop all tables with table name having pattern like this:
ERROR_REPORT[2 digit][3 Capital characters][10 digits]
example: ERROR_REPORT16MAY2014122748
However, I am not able to come up with the correct regexp. Below are my select statements and results:
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}A-Z{3}0-9{10}]');
The results included all the table names I needed plus ERROR_REPORT311AUG20111111111. This should not be showing up in the result.
The follow select statement showed the same result, which meant the A-Z{3} had no effect on the regexp.
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}0-9{10}]');
My question is what would be the correct regexp, and what's wrong with mine?
Thanks,
Alex

Correct regex is
'^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}'

I think this regex should work:
^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}
However, please check the regex101 link. I've assumed that you need 2 digits after ERROR_REPORT but your example name shows 3.

REGEXP Help - Oracle Pass Through Query

Currently working in Microsoft Access, writing a PTQ to an Oracle Data Warehouse.
One of the fields is a description field which holds alphanumeric strings. Sometimes all characters, sometimes the inclusion of a 9 digit number.
What I want to be able to do is if there is a 9 digit number, to select it from that description field and create a new field with it.
SELECT description
REGEXP_SUBSTR( * here goes the reg exp * ) "REGEXPR_SUBSTR"
FROM myTable
REGEXP_SUBSTR

select * from
(
SELECT REGEXP_SUBSTR("desc",'\d{9}') REGEXPR_SUBSTR FROM temp1
)
where REGEXPR_SUBSTR is not null;
Thil will work perfectly. It rejects nulls and only accept 9 digits. Last answer I was writing in hurry. Mi scuzi :)

I do not have SQLDeveloper or sqlplus to check it but let me try:
SELECT REGEXP_SUBSTR(descritpion,'\d{0,9}') "REGEXPR_SUBSTR" FROM myTable

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Compare column value against list of regex values stored in another table and update accordingly - regex

Taking the newly added "relevance" into account, with Oracle 11g you could try along UPDATE Table1 T1 SET T1.reg_exID = (SELECT DISTINCT MAX(reg_exID) KEEP (DENSE_RANK FIRST ORDER BY relevance DESC) OVER (PARTITION BY regex) FROM Table2 WHERE REGEXP_LIKE(T1.msg, regex) ) ; See SQL Fiddle.

Related

MYSQL get substring

Extracting String Portions in SQL using Regular expressions

REGEX help needed in Oracle

PL/SQL regexp_like filters

REGEXP Help - Oracle Pass Through Query

Categories

Resources