I need to write a regular expression that replaces every word of a dynamic query in Oracle with NULL (that is, removes it), except for the words that begin with the # character. For example:
SQL: SELECT #param1, column2, column3, #param2 FROM dual WHERE #code = code_table AND amount > #param4 + 50
Using REGEXP_REPLACE
DECLARE
vl_result VARCHAR2(1000);
BEGIN
vl_result := REGEXP_REPLACE('SELECT #param1, column2, column3, #param2 FROM dual WHERE #code = code_table AND amount > #param4 + 50', 'EXP_REG', '');
dbms_output.put_line(vl_result);
END;
This should produce the following result:
#param1#param2#code#param4
I have tried several approaches and still cannot get it to work.
Does anyone know whether this can be done, and what the regular expression would look like?
I'm working in PL/SQL.
The following works for the given example:
select REGEXP_REPLACE('SELECT #param1, column2, column3, #param2 FROM dual WHERE #code = code_table AND amount > #param4 + 50', '.*?((#[^ ,]+)|$)', '\1') new_str
from dual;
NEW_STR
--------------------------
#param1#param2#code#param4
This also uses a back reference, and works for your example:
select REGEXP_REPLACE('SELECT #param1, column2, column3, #param2 FROM dual WHERE #code = code_table AND amount > #param4 + 50',
'[^#]?(#[[:alnum:]]+)?', '\1')
from dual;
REGEXP_REPLACE('SELECT#PAR
--------------------------
#param1#param2#code#param4
The same thing works from PL/SQL (as does @boneist's, of course):
set serveroutput on
DECLARE
vl_result VARCHAR2(1000);
BEGIN
vl_result := REGEXP_REPLACE('SELECT #param1, column2, column3, #param2 FROM dual WHERE #code = code_table AND amount > #param4 + 50',
'[^#]?(#[[:alnum:]]+)?', '\1');
dbms_output.put_line(vl_result);
END;
/
PL/SQL procedure successfully completed.
#param1#param2#code#param4
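As a rough cross-check of the idea behind both answers, here is a minimal Python sketch. It assumes Python's re module, whose regex dialect differs from Oracle's, so it illustrates the technique rather than replacing the queries above. The variable names are made up for illustration.

```python
import re

query = ("SELECT #param1, column2, column3, #param2 FROM dual "
         "WHERE #code = code_table AND amount > #param4 + 50")

# Collect every token that starts with '#' followed by letters or digits,
# mirroring the #[[:alnum:]]+ part of the second Oracle pattern.
params = re.findall(r"#[A-Za-z0-9]+", query)
joined = "".join(params)

# The first answer's pattern, applied with re.sub for comparison:
# lazily skip text until a '#token' (or the end of the string) and
# keep only the captured group.
replaced = re.sub(r".*?((#[^ ,]+)|$)", r"\1", query)

print(joined)
print(replaced)
```

Extracting the wanted tokens directly (findall) is often easier to reason about than replacing everything that is unwanted, which is why both forms are shown.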
I am trying to grab a substring from a queryText column. The queryText column is a SQL query statement. And my goal is to parse and extract specific patterns into a new column called TableName.
parse kind=regex queryText with "[Ff][Rr][Oo][Mm]" TableName
Above is my current regex statement. It returns all characters after "FROM" or "from". I would like to grab only the characters after "FROM" and before the first whitespace or newline. What do I have to add to the regex to do this?
You could use the extract() function.
For example (using the i flag for case-insensitivity):
datatable(input:string)
[
"select * FROM MyTable\n where X > 1",
"SELECT A,B,C from MyTable",
"select COUNT(*) from MyTable GROUP BY X",
"select * FROM MyTable",
"select * from [a].[b]",
]
| extend output = extract(@"(?i)from\s+([^\s]+)\s*", 1, input)
input                                    output
---------------------------------------  -------
select * from [a].[b]                    [a].[b]
select * FROM MyTable where X > 1        MyTable
SELECT A,B,C from MyTable                MyTable
select COUNT(*) from MyTable GROUP BY X  MyTable
select * FROM MyTable                    MyTable
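The same capture-group idea can be sketched in Python to see why the pattern stops at the first whitespace. This assumes Python's re module rather than Kusto's regex engine, and table_name is a hypothetical helper name.

```python
import re

# Case-insensitive "from", at least one whitespace character, then capture
# everything up to (but not including) the next whitespace or newline.
TABLE_AFTER_FROM = re.compile(r"from\s+(\S+)", re.IGNORECASE)


def table_name(query_text):
    """Return the token that follows FROM, or None if there is no FROM."""
    match = TABLE_AFTER_FROM.search(query_text)
    return match.group(1) if match else None


for q in ["select * FROM MyTable\n where X > 1", "select * from [a].[b]"]:
    print(repr(q), "->", table_name(q))
```

\S+ is equivalent to [^\s]+; the capture ends at the first whitespace, which includes newlines.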
So I have a list of keywords:
['xxxxl','xxxl','xxl','xl','xxxxt','xxxt','xxt','xt']
In BigQuery, I want to write a regex to use inside the following SQL code
SELECT my_column
FROM table
WHERE REGEXP_CONTAINS(lower(my_column),regex)
so that my output table contains only the values that don't match any of the items in keywords list.
Thanks
The following is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.lookup_table` AS (
SELECT ['xxxxl','xxxl','xxl','xl','xxxxt','xxxt','xxt','xt'] keywords
)
SELECT my_column
FROM `project.dataset.table`,
(SELECT STRING_AGG(LOWER(keyword), '|') exclude_pattern
FROM `project.dataset.lookup_table`,
UNNEST(keywords) keyword)
WHERE NOT REGEXP_CONTAINS(LOWER(my_column), exclude_pattern)
You can test and play with the above using the simplified example below:
#standardSQL
WITH `project.dataset.lookup_table` AS (
SELECT ['xxxxl','xxxl','xxl','xl','xxxxt','xxxt','xxt','xt'] keywords
), `project.dataset.table` AS (
SELECT 'xxxxl' my_column UNION ALL
SELECT 'abc'
)
SELECT my_column
FROM `project.dataset.table`,
(SELECT STRING_AGG(LOWER(keyword), '|') exclude_pattern
FROM `project.dataset.lookup_table`,
UNNEST(keywords) keyword)
WHERE NOT REGEXP_CONTAINS(LOWER(my_column), exclude_pattern)
with output
Row my_column
1 abc
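One thing worth checking with this approach: joining the keywords with | gives a pattern that matches anywhere in the value, so a value such as 'text' is also excluded because it contains 'xt'. The Python sketch below illustrates the difference. The word-boundary variant is only an assumption about what might be wanted, and keep_rows and the sample rows are made up for illustration.

```python
import re

keywords = ['xxxxl', 'xxxl', 'xxl', 'xl', 'xxxxt', 'xxxt', 'xxt', 'xt']

# Same idea as STRING_AGG(LOWER(keyword), '|'): one alternation of all keywords.
alternation = "|".join(re.escape(k.lower()) for k in keywords)

# Matches a keyword anywhere inside the value (substring semantics).
substring_pattern = re.compile(alternation)

# Hypothetical stricter variant: the keyword must stand alone as a word.
whole_word_pattern = re.compile(r"\b(?:" + alternation + r")\b")


def keep_rows(values, pattern):
    """Keep only the values that do not match the exclusion pattern."""
    return [v for v in values if not pattern.search(v.lower())]


rows = ["xxxxl", "abc", "text", "shirt xl"]
print(keep_rows(rows, substring_pattern))
print(keep_rows(rows, whole_word_pattern))
```

Whether substring or whole-word matching is right depends on the data; the original answer uses substring matching.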
In my query, I have a value formatted as a dollar amount, like this:
Coverage_Amount
$10,000
$15,000
null
$2,000
So I remove the extra characters and map the null to 0. I get a column back like this:
Coverage_Amount
10000
15000
0
2000
However, these values are stored as strings, and when I try something like this:
CASE
WHEN Coverage_Amount IS NOT NULL THEN INTEGER(REGEXP_REPLACE(query.Coverage_Amount, r'\$|,', ''))
ELSE 0
END AS Coverage_Amount
I get back
Coverage_Amount
null
null
0
null
The documentation for the INTEGER() function says
Casts expr to a 64-bit integer. Returns NULL if expr is a string that doesn't correspond to an integer value.
Is there anything I can do to make BigQuery recognize that these are in fact integers?
Both versions below (Legacy SQL and Standard SQL, respectively) work in BigQuery and return the following result:
Coverage_Amount val
10000 10000
15000 15000
2000 2000
Legacy SQL
#legacySQL
SELECT
Coverage_Amount,
IFNULL(INTEGER(REGEXP_REPLACE(Coverage_Amount, r'\$|,', '')), 0) AS val
FROM
(SELECT '10000' Coverage_Amount),
(SELECT '15000' Coverage_Amount),
(SELECT '2000' Coverage_Amount)
Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT '10000' Coverage_Amount UNION ALL
SELECT '15000' UNION ALL
SELECT '2000'
)
SELECT
Coverage_Amount,
IFNULL(CAST(REGEXP_REPLACE(Coverage_Amount, r'\$|,', '') AS INT64), 0) AS val
FROM `project.dataset.table`
Obviously, the same works for '$15,000', '$10,000', '$2,000', etc.
It could be because the string has trailing spaces after the last 0,
for example '$10,000 '. In that case you can try RTRIM(value, ' '):
SELECT
Coverage_Amount,
IFNULL(INTEGER(REGEXP_REPLACE(RTRIM(Coverage_Amount, ' '), r'\$|,', '')),0) AS val
FROM
(SELECT '$10,000 ' Coverage_Amount)
This deletes all spaces from the end of the string.
The output will then be:
Row Coverage_Amount val
1 $10,000 10000
Are you using Standard SQL? This worked for me (notice I use the CAST operator):
WITH data as(
select "$10,000" d UNION ALL
select "$15,000" UNION ALL
select "$2,000")
SELECT
d,
CAST(REGEXP_REPLACE(d, r'\$|,', '') AS INT64) AS Coverage_Amount
FROM data
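To see why stray whitespace matters, here is a small Python sketch of the same clean-then-cast idea. It is an illustration under assumptions, not BigQuery behavior: coverage_to_int is a hypothetical helper, and it returns None for unparseable input to mimic a cast that yields NULL.

```python
import re


def coverage_to_int(raw):
    """Strip '$', ',' and whitespace, then convert; NULL (None) maps to 0."""
    if raw is None:
        return 0
    digits = re.sub(r"[$,\s]", "", raw)
    # Mimic a cast that returns NULL when the text is not a plain integer.
    return int(digits) if re.fullmatch(r"[0-9]+", digits) else None


for value in ["$10,000", "$15,000 ", None, "$2,000"]:
    print(repr(value), "->", coverage_to_int(value))
```

Removing whitespace in the same step as the currency characters avoids a separate trim.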
I have dates in all possible permutations of the format: MM/DD/YYYY, M/D/YYYY, MM/D/YYYY, M/DD/YYYY.
I need to write a regular expression in Oracle to identify the different date formats in a single column, leaving the values as they are.
Try this one:
with t(date_col) as (
select '01/01/2014' from dual
union all
select '1/2/2014' from dual
union all
select '01/3/2014' from dual
union all
select '1/04/2014' from dual
union all
select '11/1/14' from dual)
select date_col,
case
when regexp_instr(date_col, '^\d/\d/\d{4}$') = 1 then
'd/m/yyyy'
when regexp_instr(date_col, '^\d{2}/\d/\d{4}$') = 1 then
'dd/m/yyyy'
when regexp_instr(date_col, '^\d/\d{2}/\d{4}$') = 1 then
'd/mm/yyyy'
when regexp_instr(date_col, '^\d{2}/\d{2}/\d{4}$') = 1 then
'dd/mm/yyyy'
else
'Unknown format'
end date_format
from t;
DATE_COL DATE_FORMAT
---------- --------------
01/01/2014 dd/mm/yyyy
1/2/2014 d/m/yyyy
01/3/2014 dd/m/yyyy
1/04/2014 d/mm/yyyy
11/1/14 Unknown format
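The four CASE branches can also be expressed as one pattern whose capture widths determine the label. Below is a Python sketch of that idea, assuming Python's re module. The labels follow the month-first order stated in the question, which is an assumption about the intended naming, and classify is a hypothetical helper.

```python
import re

# One or two digits, slash, one or two digits, slash, four-digit year.
DATE_SHAPE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")


def classify(value):
    """Describe the shape of a date string, e.g. 'm/dd/yyyy'."""
    match = DATE_SHAPE.fullmatch(value)
    if match is None:
        return "Unknown format"
    first, second, _year = match.groups()
    # Label width follows the number of digits actually present.
    return "m" * len(first) + "/" + "d" * len(second) + "/yyyy"


for sample in ["01/01/2014", "1/2/2014", "01/3/2014", "1/04/2014", "11/1/14"]:
    print(sample, "->", classify(sample))
```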
I am not sure what your goal is, but since the month always comes first, followed by the day, you can use the following expression to get a date regardless of the input format:
select to_date( column, 'mm/dd/yyyy') from ...
You can select all records for which the following is true:
where [column_value] != to_char(to_date([column_value],'MM/DD/YYYY'),'MM/DD/YYYY')
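The round-trip comparison above (parse, format back with zero padding, compare with the original text) can be sketched in Python as follows. This assumes Python's datetime.strptime, which, like the to_date call above, accepts months and days without leading zeros; is_not_zero_padded is a hypothetical helper.

```python
from datetime import datetime


def is_not_zero_padded(value):
    """True when the text differs from its canonical MM/DD/YYYY rendering."""
    parsed = datetime.strptime(value, "%m/%d/%Y")
    return parsed.strftime("%m/%d/%Y") != value


for sample in ["01/01/2014", "1/2/2014", "01/3/2014", "1/04/2014"]:
    print(sample, "->", is_not_zero_padded(sample))
```

Values that cannot be parsed at all (for example a two-digit year) raise an error here, so they would need separate handling.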
I want to extract text from a column using regular expressions in Oracle 11g. I have 2 queries that do the job but I'm looking for a (cleaner/nicer) way to do it. Maybe combining the queries into one or a new equivalent query. Here they are:
Query 1: identify rows that match a pattern:
select column1 from table1 where regexp_like(column1, pattern);
Query 2: extract all matched text from a matching row.
select regexp_substr(matching_row, pattern, 1, level)
from dual
connect by level <= regexp_count(matching_row, pattern);
I use PL/SQL to glue these 2 queries together, but it's messy and clumsy. How can I combine them into 1 query? Thank you.
UPDATE: sample data for pattern 'BC':
row 1: ABCD
row 2: BCFBC
row 3: HIJ
row 4: GBC
Expected result is a table of 4 rows of 'BC'.
You can also do it in one query, functions/procedures/packages not required:
WITH t1 AS (
SELECT 'ABCD' c1 FROM dual
UNION
SELECT 'BCFBC' FROM dual
UNION
SELECT 'HIJ' FROM dual
UNION
SELECT 'GBC' FROM dual
)
SELECT c1, regexp_substr(c1, 'BC', 1, d.l, 'i') thePattern, d.l occurrence
FROM t1 CROSS JOIN (SELECT LEVEL l FROM dual CONNECT BY LEVEL < 200) d
WHERE regexp_like(c1,'BC','i')
AND d.l <= regexp_count(c1,'BC',1,'i');
C1 THEPATTERN OCCURRENCE
----- -------------------- ----------
ABCD BC 1
BCFBC BC 1
BCFBC BC 2
GBC BC 1
I've arbitrarily limited the number of occurrences to search for at 200, YMMV.
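For comparison, the "one row per match, with its occurrence number" result can be sketched in Python, where iterating over matches removes the need for a fixed upper bound on occurrences. This assumes Python's re module; all_matches is a hypothetical helper, and the rows are the sample data from the question.

```python
import re

rows = ["ABCD", "BCFBC", "HIJ", "GBC"]


def all_matches(values, pattern):
    """Return (value, matched_text, occurrence) for every match in every value."""
    compiled = re.compile(pattern, re.IGNORECASE)
    result = []
    for value in values:
        # finditer yields matches left to right; rows without a match add nothing.
        for occurrence, match in enumerate(compiled.finditer(value), start=1):
            result.append((value, match.group(0), occurrence))
    return result


for row in all_matches(rows, "BC"):
    print(row)
```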
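There is actually an elegant way to do this in one query, if you don't mind going the extra mile. Please note that this is just a sketch; I have not run it, so you will probably have to correct a few typos in it.
create or replace package yo_package is
type word_t is record (word varchar2(4000));
type words_t is table of word_t;
--declared in the spec so that the function can be called from SQL
function table_function(in_cur in sys_refcursor, pattern in varchar2)
return words_t
pipelined parallel_enable (partition in_cur by any);
end;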
/
create or replace package body yo_package is
function table_function(in_cur in sys_refcursor, pattern in varchar2)
return words_t
pipelined parallel_enable (partition in_cur by any)
is
next varchar2(4000);
match varchar2(4000);
word_rec word_t;
begin
word_rec.word := null;
loop
fetch in_cur into next;
exit when in_cur%notfound;
--this is your inner loop, where you loop through the matches within next
--you have to implement this
loop
--TODO get the next match from next
word_rec.word := match;
pipe row (word_rec);
end loop;
end loop;
end table_function;
end;
/
select *
from table(
yo_package.table_function(
cursor(
--this is your first select
select column1 from table1 where regexp_like(column1, pattern)
)
)