RegEx in BigQuery - regex

I need to split the following field: LP1234354_CD12346
and get the 2 separate columns with the following values:1234354 and 12346.
I tried regex and right/left but not successful. Thank you in advance!
Dummy data:
SELECT 'LP1234354_CD12346' AS word UNION ALL
SELECT 'LP1234456_CD12345'

Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 AS id, 'LP1234354_CD12346' AS word UNION ALL
SELECT 2, 'LP1234456_CD12345'
)
SELECT id,
REGEXP_EXTRACT_ALL(word, r'(\d+)')[SAFE_OFFSET(0)] AS val1,
REGEXP_EXTRACT_ALL(word, r'(\d+)')[SAFE_OFFSET(1)] AS val2
FROM `project.dataset.table`

Related

what is wrong with my query in oracle apex?

The following code does not work, but it does work when I convert it to the next code. Why?
select * ,count(*) over(partition by colour) as counts_by_colour
from bricks;
output:
ORA-00923: FROM keyword not found where expected
modified:
select b.* ,count(*) over(partition by colour) as counts_by_colour
from bricks b;
This is just the way SQL works - nothing to do with APEX.
select * means select all columns from what follows. So...
select * from emp join dept;
...returns all columns from emp and all columns from dept.
You are not allowed to select anything else with select * - e.g. ...
select *, 'abc' from emp;
raises the same error you got.
However, you can use select alias.* to select all columns from one table/view in the query, and then you are also allowed to select other things:
select e.*, 'abc' from emp e;
or
select e.*, d.loc from emp e join dept d;
The "implicit" alias also works:
select emp.*, 'abc' from emp;

Why does Calcite change GROUP_CONCAT to LISTAGG?

I built a RelNode using the following SQL:
SELECT GROUP_CONCAT(ename ORDER BY ename DESC SEPARATOR 'a') FROM emp
and I used RelToSqlConverter to converter it to SQL. I get this SQL:
SELECT LISTAGG(`ename`, 'a') WITHIN GROUP (ORDER BY `ename` IS NULL DESC, `ename` DESC) FROM `emp`
But I want to get GROUP_CONCAT not LISTAGG.
Check https://issues.apache.org/jira/browse/CALCITE-4349
GROUP_CONCAT is analogous to LISTAGG (see CALCITE-2754) (and also to BigQuery and > PostgreSQL's STRING_AGG, see CALCITE-4335). For example, the query
SELECT deptno, GROUP_CONCAT(ename ORDER BY empno SEPARATOR ';')
FROM Emp
GROUP BY deptno
is equivalent to (and in Calcite's algebra would be desugared to)
SELECT deptno, LISTAGG(ename, ';') WITHIN GROUP (ORDER BY empno)
FROM Emp
GROUP BY deptno

how to write regex for list of items sharing the same roots

So I have a list of keywords:
['xxxxl','xxxl','xxl','xl','xxxxt','xxxt','xxt','xt']
In bigquery, I want to write a regex, inside the following sql code
SELECT my_column
FROM table
REGEXP_CONTAINS(lower(my_column),regex)
so that my output table contains only the values that don't match any of the items in keywords list.
Thanks
Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.lookup_table` AS (
SELECT ['xxxxl','xxxl','xxl','xl','xxxxt','xxxt','xxt','xt'] keywords
)
SELECT my_column
FROM `project.dataset.table`,
(SELECT STRING_AGG(LOWER(keyword), '|') exclude_pattern
FROM `project.dataset.lookup_table`,
UNNEST(keywords) keyword)
WHERE NOT REGEXP_CONTAINS(LOWER(my_column), exclude_pattern)
You can test / play with above using below simplified example
#standardSQL
WITH `project.dataset.lookup_table` AS (
SELECT ['xxxxl','xxxl','xxl','xl','xxxxt','xxxt','xxt','xt'] keywords
), `project.dataset.table` AS (
SELECT 'xxxxl' my_column UNION ALL
SELECT 'abc'
)
SELECT my_column
FROM `project.dataset.table`,
(SELECT STRING_AGG(LOWER(keyword), '|') exclude_pattern
FROM `project.dataset.lookup_table`,
UNNEST(keywords) keyword)
WHERE NOT REGEXP_CONTAINS(LOWER(my_column), exclude_pattern)
with output
Row my_column
1 abc

Query in ascending order based on the number of occurances of numbers in a column

I have to write a query where I need to fetch the data from a table sorting on a varchar column based on number of occurrences of numbers in the column values
Ex:
Data
abc123bcsAny
edef2323sdfhsk3212
shdfks
Here if I try to fetch it in ascending order, it should give me the result as
shdfks
abc123bcsAny
edef2323sdfhsk3212
Can you help in writing this query?
You can try this:
with test as(
select 'abc123bcsAny' string from dual union all
select 'edef2323sdfhsk3212' from dual union all
select 'shdfks' from dual
)
select *
from test
order by regexp_count(string, '[0-9]')
To order by the count of the number sub-strings within the string:
SELECT *
FROM table_name
ORDER BY REGEXP_COUNT( data, '\d+' ) ASC

Regular expression on Dates in Oracle

I have date formats in all the possible permutations. MM/DD/YYYY, M/D/YYYY, MM/D/YYYY, M/DD/YYYY
Now I need to write a regular expression in Oracle DB to fetch different date formats from 1 column as is
Try this one:
with t(date_col) as (
select '01/01/2014' from dual
union all
select '1/2/2014' from dual
union all
select '01/3/2014' from dual
union all
select '1/04/2014' from dual
union all
select '11/1/14' from dual)
select date_col,
case
when regexp_instr(date_col, '^\d/\d/\d{4}$') = 1 then
'd/m/yyyy'
when regexp_instr(date_col, '^\d{2}/\d/\d{4}$') = 1 then
'dd/m/yyyy'
when regexp_instr(date_col, '^\d/\d{2}/\d{4}$') = 1 then
'd/mm/yyyy'
when regexp_instr(date_col, '^\d{2}/\d{2}/\d{4}$') = 1 then
'dd/mm/yyyy'
else
'Unknown format'
end date_format
from t;
DATE_COL DATE_FORMAT
---------- --------------
01/01/2014 dd/mm/yyyy
1/2/2014 d/m/yyyy
01/3/2014 dd/m/yyyy
1/04/2014 d/mm/yyyy
11/1/14 Unknown format
I am not sure what your goal is, but since months are always first, followed by day, you can use the following expression to get a date regardless of the input format:
select to_date( column, 'mm/dd/yyyy') from ...
You can select all records for which the following is true:
where [column_value] != to_char(to_date([column_value],'MM/DD/YYYY'),'MM/DD/YYYY')