REGEX help needed in Oracle - regex

How to get all the table names from the below Sql? My sql returns only the last table name.
with t as
(select 'select col1,
(select max(col3) from dd3) max_timestamp
from dd1,
dd2
where dd1.col1 = dd2.col1
and dd1.col1 in(select col1 from dd4)' sql_text from dual)
select regexp_substr(regexp_substr(upper(sql_text), '\sFROM\s*(\w|\.|_)*'), '(\w|_|\.)+', 1,2)
from t
Thanks,
DD.

This is a more of a regex question than an Oracle question.
If you can run the sql through REPLACE(REPLACE(sql,CHR(13),' '),CHR(10),NULL) to replace all newlines with a space, so that the query fits on a single line, here is regex that will return all the tables in group 1 (for the ones after FROM) and group 3 for subsequent items in a list:
/FROM ([A-Z0-9$#_]+)(,[\s]*([A-Z0-9$#_]+))*/gi
Having multiple groups is not ideal, so I would look at the full match instead, see https://regex101.com/r/OZUalH/1/ for an example (see full match on the right, where every match has from followed by one or more tables).
But let me warn you this is not going to be robust, as these valid FROM clause expressions are not handled:
"my_table"
MY_TABLE AS A
MY_TABLE AS "a"
etc...
If it were me, I would write a function to run the query through explain plan (execute immediate 'explain plan for ...') and extract the tables from the plan tables (or possibly using SYS.DBMS_XPLAN)

Related

MYSQL get substring

I'm trying to get substring dynamically and group by it. So if my uri column contains records like: /uri1/uri2 and /somelongword/someotherlongword I would like to get everything up to second delimiter, namely up to second / and count it. I'm using this query but obviously it is cutting string statically (6 letters after the first one).
SELECT substr(uri, 1, 6) as URI,
COUNT(*) as COUNTER
FROM staging
GROUP BY substr(uri, 1, 6)
ORDER BY COUNTER DESC
How can I achieve that?
You can use combination of SUBSTRING() and POSITION()
schema:
CREATE TABLE Table1
(`uri` varchar(10))
;
INSERT INTO Table1
(`uri`)
VALUES
('some/text'),
('some/text1'),
('some/text2'),
('aa/bb'),
('aa/cc'),
('bb/cc')
;
query
SELECT
SUBSTRING(uri,1,POSITION('/' IN uri)-1),
COUNT(*)
FROM Table1
GROUP BY SUBSTRING(uri,1,POSITION('/' IN uri)-1);
http://sqlfiddle.com/#!9/293dd3/3/0
edit: here I found amazon athena documentation: https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html and here is the string function documentation: https://prestodb.io/docs/0.217/functions/string.html
my answer above still stands, but you might need to change SUBSTRING to SUBSTR
edit 2: it seems there's a special function to achieve this in amazon athena called SPLIT_PART()
query:
SELECT SPLIT_PART(uri, '/', 1), COUNT(*) FROM tbl GROUP BY SPLIT_PART(uri, '/', 1)
from docs:
split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than than the number of fields, then null is returned.

Special character to query from latest timestamp sharded table in BigQuery

From
https://cloud.google.com/bigquery/docs/partitioned-tables:
you can shard tables using a time-based naming approach such as [PREFIX]_YYYYMMDD
This enables me to do:
SELECT count(*) FROM `xxx.xxx.xxx_*`
and query across all the shards. Is there a special notation that queries only the latest shard? For example say I had:
xxx_20180726
xxx_20180801
could I do something along the lines of
SELECT count(*) FROM `xxx.xxx.xxx_{{ latest }}`
to query xxx_20180801?
SINGLE QUERY INSPIRED BY Mikhail Berlyant:
SELECT count(*) as c FROM `XXX.PREFIX_*` WHERE _TABLE_SUFFIX IN ( SELECT
SUBSTR(MAX(table_id), LENGTH('PREFIX_') + 2)
FROM
`XXX.__TABLES_SUMMARY__`
WHERE
table_id LIKE 'PREFIX_%')
If you do care about cost (meaning how many tables will be scaned by your query) - the only way to do so is to do in two steps like below
First query
#standardSQL
SELECT SUBSTR(MAX(table_id), LENGTH('PREFIX') + 1)
FROM `xxx.xxx.__TABLES_SUMMARY__`
WHERE table_id LIKE 'PREFIX%'
Second Query
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '<result of first query>'
so, if result of first query is 20180801 so, second query will obviously look like below
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '20180801'
If you don't care about cost but rather need just result - you can easily combine above two queries into one - but - again - remember - even though result will be out of last table - cost will be as you query all table that match xxx.xxx.PREFIX_*
Forgot to mention (even though it should be obvious): of course when you have only COUNT(1) in your SELECT - the cost will be 0(zero) for both options - but in reality - most likely you will have something more valuable than just count(1)
I know this is a kind of an old thread but I was surprised why no one offers an answer using Variables.
"Héctor Neri" already mentioned this in the comments but I thought might be better to have an actual answer with a sample code posted.
#standardSQL
DECLARE SHARD_DATE STRING;
SET SHARD_DATE=(
SELECT MAX(REPLACE(table_name,'{TABLE}_',''))
FROM `{PRJ}.{DATASET}.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE '{TABLE}_20%'
);
SELECT * FROM `{PRJ}.{DATASET}.{TABLE}_*`
WHERE _TABLE_SUFFIX = SHARD_DATE
Make sure to replace {PRJ}, {DATASET}, and {TABLE} values with your table location.
If you run this on BigQuery Web UI, you will see this message:
WARNING: Could not compute bytes processed estimate for script.
But you can see that variable properly reduce the table scan to the latest partition and does not cause any extra cost after running the script.

Using another table in BigQuery Regex

I would like to map a string column to a category based on a regular expression match.
Is it possible to use another bigquery table containing the regular expressions and corresponding category for this? This would make it easier for me to update only a table when adding new categories/updating the regex, instead of having to update all queries that would use this lookup.
Query:
CASE
-- Use the entries from another table here
WHEN REGEXP_MATCH(string_to_check, cat1regex) THEN cat1
WHEN REGEXP_MATCH(string_to_check, cat2regex) THEN cat2
etc.
END
Mapping table:
Regex category
pagex|pagey xy
pagez|page1 z1
It's also possible there is another simple way to do something similar that I'm not thinking of, answers pointing those out are welcome too.
Any help would be appreciated.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
string_to_check,
MAX(IF(REGEXP_CONTAINS(string_to_check, reg), category, NULL)) AS category
FROM yourTable
CROSS JOIN mappingTable
GROUP BY string_to_check
You can test / play with it using below dummy date from your question
#standardSQL
WITH `mappingTable` AS (
SELECT r'pagex|pagey' AS reg, 'xy' AS category UNION ALL
SELECT r'pagez|page1', 'z1'
),
`yourTable` AS (
SELECT string_to_check
FROM UNNEST(["pagex.com", "pagez#example.org", "page.example.net"]) AS string_to_check
)
SELECT
string_to_check,
MAX(IF(REGEXP_CONTAINS(string_to_check, reg), category, NULL)) AS category
FROM yourTable
CROSS JOIN mappingTable
GROUP BY string_to_check

PL/SQL regexp_like filters

I want to delete some tables and wrote this procedure:
set serveroutput on
declare
type namearray is table of varchar2(50);
total integer;
name namearray;
begin
--select statement here ..., please see below
total :=name.count;
dbms_output_line(total);
for i in 1 .. total loop
dbms_output.put_line(name(i));
-- execute immediate 'drop table ' || name(i) || ' purge';
End loop;
end;
/
The idea is to drop all tables with table name having pattern like this:
ERROR_REPORT[2 digit][3 Capital characters][10 digits]
example: ERROR_REPORT16MAY2014122748
However, I am not able to come up with the correct regexp. Below are my select statements and results:
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}A-Z{3}0-9{10}]');
The results included all the table names I needed plus ERROR_REPORT311AUG20111111111. This should not be showing up in the result.
The follow select statement showed the same result, which meant the A-Z{3} had no effect on the regexp.
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}0-9{10}]');
My question is what would be the correct regexp, and what's wrong with mine?
Thanks,
Alex
Correct regex is
'^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}'
I think this regex should work:
^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}
However, please check the regex101 link. I've assumed that you need 2 digits after ERROR_REPORT but your example name shows 3.

Compare column value against list of regex values stored in another table and update accordingly

I am new to Oracle programming.
I want to check the "msg" value of "Table1" against the "regex" values from "Table2".
If the regular expression matches as such, I want to update the respective "regex_id" in "Table1".
Usual query: SELECT 'match found' FROM DUAL WHERE REGEXP_LIKE('s 27', '^(s27|s 27)')
Table1
MSG REG_EXID
Ss27 ?
s27 ?
s28 ?
s29 ?
Table2
REGEX REG_EXID RELEVANCE
^(s27|s 27) 1 10
^(s29|s 29) 2 2
^(m28|m 28) 3 2
^(s27|s 27) 4 100
Taking the newly added "relevance" into account, with Oracle 11g you could try along
UPDATE Table1 T1
SET T1.reg_exID =
(SELECT DISTINCT
MAX(reg_exID) KEEP (DENSE_RANK FIRST ORDER BY relevance DESC) OVER (PARTITION BY regex)
FROM Table2
WHERE REGEXP_LIKE(T1.msg, regex)
)
;
See SQL Fiddle.
You could work along
UPDATE Table1
SET reg_exID = (SELECT reg_exID FROM Table2 WHERE REGEXP_LIKE(Table1.msg, regex));
Please keep in mind:
None of your current sample records will be updated as REGEX are case sensitive.
The above UPDATE will fail, if more than a single REGEX does match.
You could rewrite the current REGEX expressions along "^m ?28".
See it in action: SQL Fiddle (With some data added to actually show the effect.)
Please comment if and as clarification/adjustment is required.