Using REGEXP_EXTRACT to extract 7-digit numbers - regex

I have one requirement: I want to extract the 7-digit numbers from one column, look each number up in another table to get a second column, and concatenate the results with "|".
Column data (the 7-digit numbers need to be extracted from this):
";2435034;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605"
Output (extracted 7-digit numbers):
2435034,1483528,1010203
Another table:
account name
2435034 D1
1483528 D2
1010203 D3
Final output (after joining with the other table):
account_nbr account_name
2435034|1483528|1010203 D1|D2|D3
I tried the following expression to extract the 7-digit numbers, but I get only the first number; the remaining numbers are not returned.
REGEXP_EXTRACT(REGEXP_REPLACE(";2435034;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605", r'[^\d]+', ','),r'[0-9]+')
This might be simple, but I am not able to figure it out. I also tried the GROUP_CONCAT and SPLIT functions, but I get the following error:
Exactly one capturing group must be specified
Please let me know if you have any suggestions.
Thanks in advance.

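Below is for BigQuery Standard SQL. REGEXP_EXTRACT returns only the first match; REGEXP_EXTRACT_ALL returns an array of all matches, which can then be unnested and joined to the other table.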
#standardSQL
SELECT
STRING_AGG(nbr, '|' ORDER BY pos) account_nbr,
STRING_AGG(name, '|' ORDER BY pos) account_name,
data
FROM `project.dataset.yourTable` t,
UNNEST(REGEXP_EXTRACT_ALL(REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
GROUP BY data
You can test / play with it using dummy data from your question:
#standardSQL
WITH `project.dataset.yourTable` AS (
SELECT ";2435034;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605" data
), `project.dataset.anotherTable` AS (
SELECT 2435034 account, 'D1' name UNION ALL
SELECT 1483528, 'D2' UNION ALL
SELECT 1010203, 'D3'
)
SELECT
STRING_AGG(nbr, '|' ORDER BY pos) account_nbr,
STRING_AGG(name, '|' ORDER BY pos) account_name,
data
FROM `project.dataset.yourTable` t,
UNNEST(REGEXP_EXTRACT_ALL(REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
GROUP BY data
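If you want to check the logic outside BigQuery, here is a minimal Python sketch of the same three steps: extract every 7-digit run, look each one up, and join the results with "|". The `accounts` dictionary and the function name `extract_and_lookup` are made up for this illustration. The pattern also uses digit lookarounds, which is my own assumption and slightly stricter than the query above: it rejects 7-digit runs that sit inside a longer number.

```python
import re

# Hypothetical stand-in for the lookup table (account -> name).
accounts = {"2435034": "D1", "1483528": "D2", "1010203": "D3"}

def extract_and_lookup(data, lookup):
    """Return (account_nbr, account_name), each joined with '|'."""
    # Exactly seven digits, not part of a longer run of digits.
    numbers = re.findall(r"(?<!\d)\d{7}(?!\d)", data)
    # Keep only numbers present in the lookup, as the inner JOIN does.
    matched = [n for n in numbers if n in lookup]
    return "|".join(matched), "|".join(lookup[n] for n in matched)

data = (";2435034;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;"
        "ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605")
nbr, name = extract_and_lookup(data, accounts)
print(nbr)   # 2435034|1483528|1010203
print(name)  # D1|D2|D3
```

The order of the output follows the order of appearance in the string, which is what WITH OFFSET plus ORDER BY pos achieves in the query.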
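Update for a new question in the comments: "Records are getting filtered out if t.data is NULL. Is there a way I can get the records even when t.data is NULL? In my table, some of the records don't have a value for t.data." The comma join with UNNEST drops rows whose array is empty, so the version below moves the UNNEST into correlated subqueries, which keeps every row of the table: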
#standardSQL
WITH `project.dataset.yourTable` AS (
SELECT 1 id, ";2435031;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605" data UNION ALL
SELECT 2, NULL
), `project.dataset.anotherTable` AS (
SELECT 2435034 account, 'D1' name UNION ALL
SELECT 1483528, 'D2' UNION ALL
SELECT 1010203, 'D3'
)
SELECT
id,
(SELECT STRING_AGG(nbr, '|' ORDER BY pos)
FROM UNNEST(REGEXP_EXTRACT_ALL(
REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
) a_nbr,
(SELECT STRING_AGG(name, '|' ORDER BY pos)
FROM UNNEST(REGEXP_EXTRACT_ALL(
REGEXP_REPLACE(t.data, r'[^\d]+', ','),r'[0-9]{7}')) nbr WITH OFFSET pos
JOIN `project.dataset.anotherTable` x
ON CAST(x.account AS STRING) = nbr
) a_name,
data
FROM `project.dataset.yourTable` t
GROUP BY id, data
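To see what the update query should return for the two sample rows, here is a rough Python equivalent. It is only a sketch: the `accounts` dictionary and the `aggregate` helper are hypothetical names, and `None` stands in for SQL NULL. The regular expressions are the same two used in the query.

```python
import re

# Hypothetical stand-in for the lookup table (account -> name).
accounts = {"2435034": "D1", "1483528": "D2", "1010203": "D3"}

rows = [
    (1, ";2435031;1;5.98;;eVar36=bopis|ev2=2605,;1483528;1;17.97;;"
        "ev6=bopis|evar52=2605,;1010203;1;7.98;;ev6=bopis|ev2=2605"),
    (2, None),  # row with no data; it must still appear in the result
]

def aggregate(data):
    """Return (a_nbr, a_name) for one row, or (None, None) if nothing matches."""
    # Same two steps as the query: non-digits -> ',', then all 7-digit runs.
    numbers = re.findall(r"[0-9]{7}", re.sub(r"[^\d]+", ",", data or ""))
    matched = [n for n in numbers if n in accounts]
    if not matched:
        return None, None
    return "|".join(matched), "|".join(accounts[n] for n in matched)

result = [(row_id, *aggregate(data)) for row_id, data in rows]
for line in result:
    print(line)
# (1, '1483528|1010203', 'D2|D3')
# (2, None, None)
```

Note that 2435031 in row 1 has no entry in the lookup, so it is left out of both columns, while row 2 is kept with empty values.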


How to get values in PowerBI between two dates in two different columns

Hi, I have an issue I can't fix in Power BI; I don't understand DAX that well.
I have a partial solution in DAX and an example of what I have tried.
I have the solution in SQL.
WANTED RESULT
I need the sum of "time" for the rows
whose dates fall between the two selected values.
IMPORTANT: if a row starts before the selected start value AND ends after the selected end value, the operation was active during that time and should be included.
@sdate = '2020'
@edate = '2021'
Select *
From @temp
where (datepart(year,startdate) <= @sdate and datepart(year,enddate) >= @edate)
or (startdate between @sdate and @edate)
or (enddate between @sdate and @edate)
If I run it in SSMS I get the right rows,
but in Power BI I have some issues.
I need to be able to choose two years:
Year from startdate
Year from enddate
This part works, but it is not the full solution.
I get just rows 3 and 4, as I should with this partial solution.
UPDATED
NOW I DON'T EVEN GET THIS PART RIGHT
I want to include all the green rows and exclude the red ones.
Antal (under året) =
var SelectedYearStart = CONVERT(SELECTEDVALUE(TEST[startdate].[Year]), INTEGER)
var SelectedYearEnd = CONVERT(SELECTEDVALUE(TEST[enddate].[Year]), INTEGER)
return CALCULATE(SUM(TEST[time]),ALLCROSSFILTERED(TEST),year(TEST[startdate])<=SelectedYearStart , year(TEST[enddate])>=SelectedYearEnd)
My guess was this
Antal (under året) =
var SelectedYearStart = CONVERT(SELECTEDVALUE(TEST[startdate].[Year]), INTEGER)
var SelectedYearEnd = CONVERT(SELECTEDVALUE(TEST[enddate].[Year]), INTEGER)
return CALCULATE(SUM(TEST[time]),ALLCROSSFILTERED(TEST),year(TEST[startdate])<=SelectedYearStart , year(TEST[enddate])>=SelectedYearEnd || DATESBETWEEN(TEST[startdate],SelectedYearStart,SelectedYearEnd || DATESBETWEEN(TEST[enddate],SelectedYearStart,SelectedYearEnd)))
But then i get this error
TEST DATA
declare @sdate nvarchar(4)
declare @edate nvarchar(4)
set @sdate = '2020'
set @edate = '2021'
select @sdate sdate
select @edate edate
declare @temp table (time decimal(18,2) , startdate date, enddate date)
INSERT INTO @temp
SELECT 5.0,'2019-01-01','2020-12-01' union all --
SELECT 5.0,'2021-01-01','2022-12-01' union all --
select 5.0,'2020-01-01','2021-12-01' union all
select 5.0,'2019-01-01','2022-12-01' union all --
select 5.0,'2019-01-01','2019-12-01' union all --
select 5.0,'2022-01-01','2022-12-01' union all
select 5.0,'2020-01-01','2020-12-01' union all
select 5.0,'2021-01-01','2021-12-01'
--select 5.0,'2020-01-01','3000-01-01' --EXTRA
SELECT *
into TEST
FROM @temp
--ORDER BY startdate,enddate
You need something like this:
var SelectedYearStart = SELECTEDVALUE(TEST[startdate].[Year])
var SelectedYearEnd = SELECTEDVALUE(TEST[enddate].[Year])
return
    CALCULATE(
        SUM(TEST[time]),
        ALL(TEST[startdate].[Year]),
        ALL(TEST[enddate].[Year]),
        KEEPFILTERS(
            (Year(TEST[startdate]) <= SelectedYearStart && Year(TEST[enddate]) >= SelectedYearEnd)
            || (Year(TEST[startdate]) >= SelectedYearStart && Year(TEST[startdate]) <= SelectedYearEnd)
            || (Year(TEST[enddate]) >= SelectedYearStart && Year(TEST[enddate]) <= SelectedYearEnd)
        )
    )
First, you need to clear the filters that the two slicers (startdate.Year and enddate.Year) put on the table.
Then you pass in your compound filter condition. Note that I'm using the Year(..) function because you can't reference columns from different tables in the filter arguments of CALCULATE (and startdate.Year and enddate.Year come from two separate date tables created automatically by Power BI).
Finally, you wrap that condition in KEEPFILTERS so that it is added to the existing filter context instead of overwriting it.
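As a sanity check on the filter logic itself (not on the DAX), here is a small Python sketch that runs the three-clause condition over the TEST DATA rows from the question for the years 2020 and 2021. The function names are made up for the example. It also shows that, assuming startdate <= enddate and the first year <= the last year, the three clauses reduce to a plain interval-overlap test.

```python
from datetime import date

# Rows from the TEST DATA in the question: (time, startdate, enddate).
rows = [
    (5.0, date(2019, 1, 1), date(2020, 12, 1)),
    (5.0, date(2021, 1, 1), date(2022, 12, 1)),
    (5.0, date(2020, 1, 1), date(2021, 12, 1)),
    (5.0, date(2019, 1, 1), date(2022, 12, 1)),
    (5.0, date(2019, 1, 1), date(2019, 12, 1)),
    (5.0, date(2022, 1, 1), date(2022, 12, 1)),
    (5.0, date(2020, 1, 1), date(2020, 12, 1)),
    (5.0, date(2021, 1, 1), date(2021, 12, 1)),
]

def three_clause(start, end, first, last):
    """The filter as written in the measure above, on year numbers."""
    return ((start.year <= first and end.year >= last)
            or first <= start.year <= last
            or first <= end.year <= last)

def overlaps(start, end, first, last):
    """Shorter equivalent, assuming start <= end and first <= last."""
    return start.year <= last and end.year >= first

total = sum(t for t, s, e in rows if three_clause(s, e, 2020, 2021))
short = sum(t for t, s, e in rows if overlaps(s, e, 2020, 2021))
print(total, short)  # 30.0 30.0
```

Six of the eight rows qualify; only the 2019-only and 2022-only rows are excluded.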

BigQuery Arrays - check if Array contains specific values

I'm trying to see whether a certain set of items exists within a BigQuery array.
The query below works (checking whether one item exists within an array):
WITH sequences AS
(
SELECT 1 AS id, [10,20,30,40] AS some_numbers
UNION ALL
SELECT 2 AS id, [20,30,40,50] AS some_numbers
UNION ALL
SELECT 3 AS id, [40,50,60,70] AS some_numbers
)
SELECT id, some_numbers
FROM sequences
WHERE 20 IN UNNEST(some_numbers)
What I'm not able to do is the following (checking whether more than one item exists within an array).
This query errors:
WITH sequences AS
(
SELECT 1 AS id, [10,20,30,40] AS some_numbers
UNION ALL
SELECT 2 AS id, [20,30,40,50] AS some_numbers
UNION ALL
SELECT 3 AS id, [40,50,60,70] AS some_numbers
)
SELECT id, some_numbers
FROM sequences
WHERE (20,30) IN UNNEST(some_numbers)
I managed to find the workaround below, but I feel there's a better way to do this:
WITH sequences AS
(
SELECT 1 AS id, [10,20,30,40] AS some_numbers
UNION ALL
SELECT 2 AS id, [20,30,40,50] AS some_numbers
UNION ALL
SELECT 3 AS id, [40,50,60,70] AS some_numbers
)
SELECT id, some_numbers
FROM sequences
WHERE (
(
SELECT COUNT(1)
FROM UNNEST(some_numbers) s
WHERE s in (20,30)
) > 1
)
Any suggestions are appreciated.
Not much to suggest... The official docs suggest using EXISTS (note that this matches rows containing any of the listed values, not necessarily all of them):
WHERE EXISTS (SELECT *
FROM UNNEST(some_numbers) AS s
WHERE s in (20,30));
Assuming you are looking for rows where ALL elements of the match array [20, 30] are found in the target array (some_numbers), and also assuming there are no duplicate numbers in either array (match or target):
select id, some_numbers
from sequences a,
unnest([struct([20, 30] as match)]) b
where (
select count(1) = array_length(match)
from a.some_numbers num
join b.match num
using(num)
)
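The difference between "contains any of" and "contains all of" is easy to miss with the sample data, because [20, 30] gives the same rows either way. A short Python sketch with sets makes it visible; the helper names `ids_with_all` and `ids_with_any` are made up for this illustration.

```python
# Same sample data as the `sequences` CTE in the question.
sequences = {
    1: [10, 20, 30, 40],
    2: [20, 30, 40, 50],
    3: [40, 50, 60, 70],
}

def ids_with_all(match):
    """Ids whose array contains every value in match (subset test)."""
    wanted = set(match)
    return [i for i, nums in sequences.items() if wanted <= set(nums)]

def ids_with_any(match):
    """Ids whose array contains at least one value in match (like EXISTS)."""
    wanted = set(match)
    return [i for i, nums in sequences.items() if wanted & set(nums)]

print(ids_with_all([20, 30]))  # [1, 2]
print(ids_with_any([20, 30]))  # [1, 2]
print(ids_with_all([10, 20]))  # [1]
print(ids_with_any([10, 20]))  # [1, 2]
```

Using sets also sidesteps the duplicate problem: a counting approach such as COUNT(1) > 1 can be fooled by an array like [20, 20], while a subset test cannot.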

How can I select the value of the nth row in a column in Powerbi?

I am fairly new to Power BI.
I am trying to select the top 3 values from a table, but only use specific column values. My instinct is to create a measure for each row. Here is a sample table of the data.
I've tried this, but I don't know enough DAX to go any further. I am able to select the top 1 value, but if I change N to 3 it doesn't work. Even being able to choose the second value instead of just the first would help: some sort of row index or position, as in an array.
row[1][name]
LowestSpenders =
"The lowest spenders for the day are "
&
CALCULATE(
VALUES(Top3Low[Name]),
TOPN(1, Top3Low, Top3Low[Spent], DESC)
)
I have also tried this
LowestSpenders =
"The lowest spenders for the day are "
&
CONCATENATEX(
Top3Lost,
VALUES(Top3Lost[ClientName]),
",",
Top3Lost[LastYear],
DESC
)
I want to select the names of the people based on their positions in the table and include them in a dynamic text measure.
Something like this.
"The lowest spenders for the day are: Bob, John and Mark"
How about something like this? Rank all the names and then pick out whatever ranks you want.
LowestSpenders =
VAR Summary =
SUMMARIZE (
Top3Low,
Top3Low[Name],
"Rank", RANK.EQ ( Top3Low[Spent], Top3Low[Spent], 1 )
)
RETURN
CONCATENATEX (
FILTER ( Summary, [Rank] IN { 1, 2, 3 } ),
[Name],
", "
)
Instead of [Rank] IN { 1, 2, 3 }, you can substitute whatever criterion you want, for example, [Rank] = 2 or [Rank] > 2 && [Rank] < 5.
You're nearly there.
Use TOPN to identify the lowest n spenders, and use CONCATENATEX to iterate over this table and concatenate the names:
LowestSpenders =
CONCATENATEX (
TOPN (
3,
MyTable,
MyTable[Spent],
ASC
),
MyTable[Name],
", "
)
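For comparison, here is the same idea in plain Python: sort ascending, take the first n rows, then build the sentence. The `spent` data is invented for the example (the table in the question was an image), and the function names are hypothetical.

```python
# Hypothetical sample data; the table in the question was an image.
spent = {"Bob": 5, "John": 8, "Mark": 12, "Alice": 40, "Eve": 55}

def lowest_spenders(table, n):
    """Names of the n lowest spenders, lowest first (like TOPN ... ASC)."""
    ordered = sorted(table.items(), key=lambda kv: kv[1])
    return [name for name, _ in ordered[:n]]

def sentence(names):
    """Join names as 'A, B and C' and prepend the fixed text."""
    if len(names) > 1:
        listed = ", ".join(names[:-1]) + " and " + names[-1]
    else:
        listed = "".join(names)
    return "The lowest spenders for the day are: " + listed

names = lowest_spenders(spent, 3)
print(sentence(names))  # The lowest spenders for the day are: Bob, John and Mark
print(names[1])         # John (the second-lowest spender)
```

One difference worth knowing: the Python list is ordered, so indexing by position works, whereas TOPN does not guarantee the order of the rows it returns. CONCATENATEX accepts optional order-by arguments after the delimiter if the order of the names matters.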

TopN, Grouping, Show Others at the bottom POWERBI-DAX

I have the following formula, which creates the table in the screenshot below on the left (the names of the actual tables are different, and it also combines 2 separate tables in one):
Top 11 Jun =
IF (
[Type Rank Jun] <= 11,
[Total Jun],
IF (
HASONEVALUE ( Partners[partner_group] ),
IF (
VALUES ( Partners[partner_group] ) = "Others",
SUMX (
FILTER ( ALL ( Partners[partner_group] ), [Type Rank Jun] > 11 ),
[Total Jun]
)
)
)
)
Now I'm stuck on how to combine "Null" and "Others" under "Others" and put "Others" at the bottom. I believe I can combine "Null" and "Others" at each table level; I'm just not sure how.
The DAX solution:
To get the Other and blank rows together (at least that is how I read your null), the easiest way is to create a new column on the table.
newProducts = IF(fruits[product] = BLANK(); "Other";fruits[product])
A better solution is to replace your blanks (or NULLs) in the query editor:
Go to Edit Queries.
Select your table and the product column, then click "Replace Values" on the toolbar.
Do the replacement, then save and close the editor.
Last step
The order of the rows in the table is not relevant, because you can control the order in the visual itself.
Below example:
As you can see, I filtered Other out; this is not needed when you want to count it in your top N.
If you want to show all four, we need to make a new table:
Tabel =
var Top3 = TOPN(3;FILTER(fruits;fruits[product] <> "Other") ;fruits[July Avail])
var prioTop3 = ADDCOLUMNS(Top3;"Order"; CALCULATE(COUNTROWS(fruits);FILTER(Top3; fruits[July Avail] <= EARLIER(fruits[July Avail]))))
var Other = ADDCOLUMNS(GROUPBY(FILTER(fruits;fruits[product] = "Other");fruits[product];"June Avail"; SUMX(CURRENTGROUP();fruits[June Avail]); "July Avail"; SUMX(CURRENTGROUP();fruits[July Avail]));"Order";0)
return UNION(prioTop3; Other)
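To make the steps of the DAX table above easier to follow, here is a rough Python sketch of the same flow: map blanks to "Other", take the top n of the named products, and append a single summed "Other" row at the end. The rows and the function name are invented for illustration, with `None` standing for a blank product.

```python
# Hypothetical rows (product, july_avail); None stands for a blank product.
rows = [("apple", 50), ("pear", 30), (None, 7), ("kiwi", 20),
        ("plum", 10), ("Other", 5)]

def top_n_with_other(rows, n):
    # Step 1: treat blank products as "Other" (the newProducts column above).
    cleaned = [(p if p is not None else "Other", v) for p, v in rows]
    # Step 2: rank the named products, highest value first, and keep n.
    named = sorted((r for r in cleaned if r[0] != "Other"),
                   key=lambda r: r[1], reverse=True)
    top = named[:n]
    # Step 3: sum the "Other" rows into one row and append it last.
    other_total = sum(v for p, v in cleaned if p == "Other")
    return top + [("Other", other_total)]

print(top_n_with_other(rows, 3))
# [('apple', 50), ('pear', 30), ('kiwi', 20), ('Other', 12)]
```

As in the DAX version, products that fall outside the top n (plum here) are dropped rather than folded into "Other"; if you want them included, add their values to `other_total`.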

Regular expression to remove duplicates from comma separated string

I have following string:
'C,2,1,2,3,1'
I need a regular expression to remove duplicates and the result string should be like this:
'C,2,1,3'
If your input data is more than one string, I assume there is some kind of id column you can use to distinguish the strings from each other. If no such column exists, it can be created in the first factored subquery, for example by using rownum.
with
inputs ( id, str ) as (
select 1, 'C,2,1,2,3,1' from dual union all
select 2, 'A,ZZ,3,A,3,ZZ' from dual
),
unwrapped ( id, str, lvl, token ) as (
select id, str, level, regexp_substr(str, '[^,]+', 1, level)
from inputs
connect by level <= 1 + regexp_count(str, ',')
and prior id = id
and prior sys_guid() is not null
),
with_rn ( id, str, lvl, token, rn ) as (
select id, str, lvl, token, row_number() over (partition by id, token order by lvl)
from unwrapped
)
select id, str, listagg(token, ',') within group (order by lvl) as new_str
from with_rn
where rn = 1
group by id, str
order by id
;
ID  STR            NEW_STR
--  -------------  -------
 1  C,2,1,2,3,1    C,2,1,3
 2  A,ZZ,3,A,3,ZZ  A,ZZ,3
Try this:
with
-- your input data
t_in as (select 'C,2,1,2,3,1' as s from dual),
-- your string split into a table, one row per list item
t_split as (
select (regexp_substr(s,'(\w+)(,|$)',1,rownum,'c',1)) s,
level n
from t_in
connect by level <= regexp_count(s,'(\w+)(,|$)') + 1
),
-- this table grouped to obtain distinct values with
-- minimum levels for sorting
t_grouped as (
select s, min(n) n from t_split group by s
)
select listagg(s, ',') within group (order by n)
from t_grouped;
Depending on your Oracle version (LISTAGG requires 11g Release 2 or later), you might have to replace listagg with wm_concat (you can look it up).
Here is another, shorter solution:
select listagg(val, ',') within group(order by min(id))
from (select rownum as id,
trim(regexp_substr(str, '[^,]+', 1, level)) as val
from (select 'C,2,1,2,3,1' as str from dual)
connect by regexp_substr(str, '[^,]+', 1, level) is not null)
group by val;
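All three queries follow the same recipe: split on commas, keep the first occurrence of each token, and join the tokens back in their original order. For reference, here is that recipe as a minimal Python sketch; the function name `dedupe_csv` is made up for the example.

```python
def dedupe_csv(s):
    """Remove duplicate tokens from a comma-separated string, keeping order."""
    tokens = [t.strip() for t in s.split(",")]
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so only the first occurrence of every token survives.
    return ",".join(dict.fromkeys(tokens))

print(dedupe_csv("C,2,1,2,3,1"))    # C,2,1,3
print(dedupe_csv("A,ZZ,3,A,3,ZZ"))  # A,ZZ,3
```

This is handy for checking the expected output of the SQL against a few test strings before running it on a real table.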