BigQuery Arrays - check if Array contains specific values

BigQuery Arrays - check if Array contains specific values - google-cloud-platform

I'm trying to see if a certain set of items exist within a BigQuery Array.
Below query works (Checking if a 1 item exists within an array):
WITH sequences AS
(
SELECT 1 AS id, [10,20,30,40] AS some_numbers
UNION ALL
SELECT 2 AS id, [20,30,40,50] AS some_numbers
UNION ALL
SELECT 3 AS id, [40,50,60,70] AS some_numbers
)
SELECT id, some_numbers
FROM sequences
WHERE 20 IN UNNEST(some_numbers)
What I'm not able to do is below (Checking if a more than 1 item exists within an array):
(This query errors)
WITH sequences AS
(
SELECT 1 AS id, [10,20,30,40] AS some_numbers
UNION ALL
SELECT 2 AS id, [20,30,40,50] AS some_numbers
UNION ALL
SELECT 3 AS id, [40,50,60,70] AS some_numbers
)
SELECT id, some_numbers
FROM sequences
WHERE (20,30) IN UNNEST(some_numbers)
I managed to find below workaround, but I feel there's a better way to do this:
WITH sequences AS
(
SELECT 1 AS id, [10,20,30,40] AS some_numbers
UNION ALL
SELECT 2 AS id, [20,30,40,50] AS some_numbers
UNION ALL
SELECT 3 AS id, [40,50,60,70] AS some_numbers
)
SELECT id, some_numbers
FROM sequences
WHERE (
(
SELECT COUNT(1)
FROM UNNEST(some_numbers) s
WHERE s in (20,30)
) > 1
)
Any suggestions are appreciated.

Not much to suggest... Official docs suggest to use exists:
WHERE EXISTS (SELECT *
FROM UNNEST(some_numbers) AS s
WHERE s in (20,30));

Assuming you are looking for rows where ALL elements in match array [20, 30] are found in target array (some_numbers). Also assuming no duplicate numbers in both (match and target) arrays
select id, some_numbers
from sequences a,
unnest([struct([20, 30] as match)]) b
where (
select count(1) = array_length(match)
from a.some_numbers num
join b.match num
using(num)
)

Related

Unnesting the columns in BigQuery which can have null elements

with a1 as (
select 1 as num,
[1,2] as nested,
2 as cost
),
a2 as (
select 2 as num,
[4,7] as nested,
2 as cost
),
a3 as (
select 3 as num,
[9,8] as nested,
2 as cost
),
a4 as (
select 4 as num,
[4, 6, 8] as nested,
2 as cost
),
a5 as (
select 5 as num,
[19, 11] as nested,
2 as cost
),
a6 as (
select 6 as num,
[] as nested,
2 as cost
),
table as(
select * from a1
union all
select * from a2
union all
select * from a3
union all
select * from a4
union all
select * from a5
union all
select * from a6
)
select * except(nested)
from table, unnest(nested) as unnested
Result generated by the Query
select num, count(unnested) as count, max(cost) as cost_incurred
from table, unnest(nested) as unnested
group by num
Result generated by the Query
Problem with this result is: for num=6, I have no nested columns,
so when I unnest the nested column, it removes the row with num=6 completely, this will create discrepancy when I want to calculate the total cost incurred.
Any help is Appreciated!

Use left join instead of , (which is a shorthand for cross join):
select * except(nested)
from table left join unnest(nested) as unnested
select num, count(unnested) as count, max(cost) as cost_incurred
from table left join unnest(nested) as unnested
group by num

How can I select the value of the nth row in a column in Powerbi?

I am fairly new to PowerBi,
I am trying to select the top 3 values from a table but only use specific column values. My instinct is to create a measure for each row. Here is a sample table of the data.
I've tried this but don't know enough of DAX to know how to go any further. I am able to select the top 1 value, but if I change N to 3 it doesnt work. Even if I can choose the second value instead of just the first one would help. Some sort of row index or number in an array fashion.
row[1][name]
LowestSpenders =
"The lowest spenders for the day are "
&
CALCULATE(
VALUES(Top3Low[Name]),
TOPN(1, Top3Low, Top3Low[Spent], DESC)
)
I have also tried this
LowestSpenders =
"The lowest spenders for the day are "
&
CONCATENATEX(
Top3Lost,
VALUES(Top3Lost[ClientName]),
",",
Top3Lost[LastYear],
DESC
)
I want to select the names of the people based on their positions in the table and include them in a dynamic text measure.
Something like this.
"The lowest spenders for the day are: Bob, John and Mark"

How about something like this? Rank all the names and then pick out whatever ranks you want.
LowestSpenders =
VAR Summary =
SUMMARIZE (
Top3Low,
Top3Low[Name],
"Rank", RANK.EQ ( Top3Low[Spent], Top3Low[Spent], 1 )
)
RETURN
CONCATENATEX (
FILTER ( Summary, [Rank] IN { 1, 2, 3 } ),
[Name],
", "
)
Instead of [Rank] IN { 1, 2, 3 }, you can substitute whatever criterion you want, for example, [Rank] = 2 or [Rank] > 2 && [Rank] < 5.

You're nearly there.
Use TOPN to identify the lowest n spenders, and use CONCATENATEX to iterate over this table and concatenate the names:
LowestSpenders =
CONCATENATEX (
TOPN (
3,
MyTable,
MyTable[Spent],
ASC
),
MyTable[Name],
", "
)

Replacing everything but a specific pattern in a string (Oracle)

I want to replace everything in a string with '' except for a given pattern using Oracle's regexp_replace.
In my case the pattern refers to German licence plates. The patterns is contained in the usage column (verwendungszweck_bez) of a revenue table (of a bank). The pattern can be matched by ([a-z]{1,3})[- ]([a-z]{1,2}) ?([0-9]{1,4}). Now I'd like to reverse the matching pattern in order to match everything except for the pattern.
The usage column looks like this:
ALLIANZ VERSICHERUNGS-AG VERTRAG AS-9028000568 KFZ-VERSICHERUNG KFZ-VERS. XX-Y 427 01.01.19 - 31.12.19
XX-Y 427 would be the pattern I'm interested in. The string can contain more than one license plate:
AXA VERSICHERUNG AG 40301089910 KFZ HAFTPFLICHT ABC-RM10 37,35 + 40330601383 KFZ HAFTPFLIVHT ABC-LX 283 21,19
In this case I need ABC-RM10 and ABC-LX 283.
So far I just replace everything from the string with regexp_replace:
regexp_replace(lower(a.verwendungszweck_bez),'^(.*?)kfz','')
because there's always 'kfz' in the string and the licence plate information follows (not necessarily direct) after that.
upper(regexp_replace(regexp_substr(regexp_replace(lower(a.verwendungszweck_bez),'(^(.*?)kfz',''),'([a-z]{1,3})[- ]([a-z]{1,2}) ?([0-9]{1,4})',1,1),'([a-z]{1,3})[- ]([a-z]{1,2}) ?([0-9]{1,4})','\1-\2 \3'))
This works but I'm sure there's a better solution.
The result should be a list of customers, licence plates and count of cars like this:
Customer|licence plates |count
1234567 |XX-Y 427| 1
1255599 |ABC-RM 10 + ABC-LX 283| 2

You can use a recursive sub-query to find the items. Also, you can use UPPER and TRANSLATE to normalise the data to remove the optional separators in the number plates and convert it into a single case:
Test Data:
CREATE TABLE test_data ( value ) AS
SELECT 'ALLIANZ VERSICHERUNGS-AG VERTRAG AS-9028000568 KFZ-VERSICHERUNG KFZ-VERS. XX-Y 427 01.01.19 - 31.12.19' FROM DUAL UNION ALL
-- UNG AG 4030 should not match
SELECT 'AXA VERSICHERUNG AG 40301089910 KFZ HAFTPFLICHT ABC-RM10 37,35 + 40330601383 KFZ HAFTPFLIVHT ABC-LX 283 21,19' FROM DUAL UNION ALL
-- Multiple matches adjacent to each other
SELECT 'AA-A1BB-BB222CC C3333' FROM DUAL UNION ALL
-- Duplicate values with different separators and cases
SELECT 'AA-A1 AA-A 1 aa a1' FROM DUAL
Query:
WITH items ( value, item, next_pos ) AS (
SELECT value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 1, 'i', 2 ) - 1
FROM test_data
UNION ALL
SELECT value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 1, 'i', 2 ) - 1
FROM items
WHERE next_pos > 0
)
SELECT item,
COUNT(*)
FROM items
WHERE item IS NOT NULL AND NEXT_POS > 0
GROUP BY item
Output:
ITEM | COUNT(*)
:------- | -------:
CCC3333 | 1
AAA1 | 4
XXY427 | 1
ABCRM10 | 1
ABCLX283 | 1
BBBB222 | 1
db<>fiddle here
The result should be a list of customers ...
You haven't given any information on how customers relate to this; that part is left as an exercise to the reader (who hopefully has the client values somewhere and can correlate them to the input).
Update:
If you want the count of unique number plates per row then:
WITH items ( rid, value, item, next_pos ) AS (
SELECT ROWID,
value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 1, 'i', 2 ) - 1
FROM test_data
UNION ALL
SELECT rid,
value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 1, 'i', 2 ) - 1
FROM items
WHERE next_pos > 0
)
SELECT LISTAGG( item, ' + ' ) WITHIN GROUP ( ORDER BY item ) AS items,
COUNT(*)
FROM (
SELECT DISTINCT
rid,
item
FROM items
WHERE item IS NOT NULL AND NEXT_POS > 0
)
GROUP BY rid;
Which outputs:
ITEMS | COUNT(*)
:----------------------- | -------:
XXY427 | 1
ABCLX283 + ABCRM10 | 2
AAA1 + BBBB222 + CCC3333 | 3
AAA1 | 1
db<>fiddle here

Using REGEXP_EXTRACT extract 7 digit numbers

I have one requirement.I want to extract 7 digit numbers from one of the column and lookup with another table get another column for each 7 digit number and concatenate with "|".
Column Data: In this need to extract 7 digit numbers.
";2435034;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605"
Output(Extract 7 digit number):
2435034,1483528,1010203
Another table:
account name
2435034 D1
1483528 D2
1010203 D3
Final output is(after joining with another table):
account_nbr account_name
2435034|1483528|1010203 D1|D2|D3
I tried with the following command to extract 7 digit number. I was getting only first number, remaining number are not coming.
REGEXP_EXTRACT(REGEXP_REPLACE(";2435034;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605", r'[^\d]+', ','),r'[0-9]+')
This might be simple but not able to figure it out. Tried with GROUP_CONCAT and SPLIT function also getting following error.
Exactly one capturing group must be specified
Please let me know if you have any suggestions.
Thanks in advance.

Below is for BigQuery Standard SQL
#standardSQL
SELECT
STRING_AGG(nbr, '|' ORDER BY pos) account_nbr,
STRING_AGG(name, '|' ORDER BY pos) account_name,
data
FROM `project.dataset.yourTable` t,
UNNEST(REGEXP_EXTRACT_ALL(REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
GROUP BY data
You can test / play with it using dummy data from your question:
#standardSQL
WITH `project.dataset.yourTable` AS (
SELECT ";2435034;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605" data
), `project.dataset.anotherTable` AS (
SELECT 2435034 account, 'D1' name UNION ALL
SELECT 1483528, 'D2' UNION ALL
SELECT 1010203, 'D3'
)
SELECT
STRING_AGG(nbr, '|' ORDER BY pos) account_nbr,
STRING_AGG(name, '|' ORDER BY pos) account_name,
data
FROM `project.dataset.yourTable` t,
UNNEST(REGEXP_EXTRACT_ALL(REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
GROUP BY data
Update for new question in comments: records are getting filtered if t.data is null. Is there a way i can get the records even t.data is null? In my table some of the records doesnt have value for t.data
#standardSQL
WITH `project.dataset.yourTable` AS (
SELECT 1 id, ";2435031;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605" data UNION ALL
SELECT 2, NULL
), `project.dataset.anotherTable` AS (
SELECT 2435034 account, 'D1' name UNION ALL
SELECT 1483528, 'D2' UNION ALL
SELECT 1010203, 'D3'
)
SELECT
id,
(SELECT STRING_AGG(nbr, '|' ORDER BY pos)
FROM UNNEST(REGEXP_EXTRACT_ALL(
REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
) a_nbr,
(SELECT STRING_AGG(name, '|' ORDER BY pos)
FROM UNNEST(REGEXP_EXTRACT_ALL(
REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
) a_name,
data
FROM `project.dataset.yourTable` t
GROUP BY id, data

Regular expression to remove duplicates from comma separated string

I have following string:
'C,2,1,2,3,1'
I need a regular expression to remove duplicates and the result string should be like this:
'C,2,1,3'

If your input data is more than one string, I assume there is some kind of id column you can use to distinguish the strings from each other. If no such column exists, it can be created in the first factored subquery, for example by using rownum.
with
inputs ( id, str ) as (
select 1, 'C,2,1,2,3,1' from dual union all
select 2, 'A,ZZ,3,A,3,ZZ' from dual
),
unwrapped ( id, str, lvl, token ) as (
select id, str, level, regexp_substr(str, '[^,]+', 1, level)
from inputs
connect by level <= 1 + regexp_count(str, ',')
and prior id = id
and prior sys_guid() is not null
),
with_rn ( id, str, lvl, token, rn ) as (
select id, str, lvl, token, row_number() over (partition by id, token order by lvl)
from unwrapped
)
select id, str, listagg(token, ',') within group (order by lvl) as new_str
from with_rn
where rn = 1
group by id, str
order by id
;
ID STR NEW_STR
---- ------------------ --------------------
1 C,2,1,2,3,1 C,2,1,3
2 A,ZZ,3,A,3,ZZ A,ZZ,3

Try this:
with
-- your input data
t_in as (select 'C,2,1,2,3,1' as s from dual),
-- your string splitted into a table, a row per list item
t_split as (
select (regexp_substr(s,'(\w+)(,|$)',1,rownum,'c',1)) s,
level n
from t_in
connect by level <= regexp_count(s,'(\w+)(,|$)') + 1
),
-- this table grouped to obtain distinct values with
-- minimum levels for sorting
t_grouped as (
select s, min(n) n from t_split group by s
)
select listagg(s, ',') within group (order by n)
from t_grouped;
Depending on your Oracle version you might have to replace listagg with wm_concat (it's googlable)

Here another shorter solution:
select listagg(val, ',') within group(order by min(id))
from (select rownum as id,
trim(regexp_substr(str, '[^,]+', 1, level)) as val
from (select 'C,2,1,2,3,1' as str from dual)
connect by regexp_substr(str, '[^,]+', 1, level) is not null)
group by val;

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

BigQuery Arrays - check if Array contains specific values - google-cloud-platform

Not much to suggest... Official docs suggest to use exists: WHERE EXISTS (SELECT * FROM UNNEST(some_numbers) AS s WHERE s in (20,30));

Related

Unnesting the columns in BigQuery which can have null elements

How can I select the value of the nth row in a column in Powerbi?

Replacing everything but a specific pattern in a string (Oracle)

Using REGEXP_EXTRACT extract 7 digit numbers

Regular expression to remove duplicates from comma separated string

Categories

Resources