Power BI: Compare two tables and get the values that do not match the criteria

I have 2 tables, and I would like to check whether each pair in table 1 (CCSClassCode_Type, Type_Sorting) has a matching pair in table 2 (_CCS Class Type, _Type Sorting).
For example, you can see I got the wrong value in table 1 (CCSClassCode_Type): the right value is XLB, as you can see in table 2 (_CCS Class Type), not ULM.
The idea of table 2 is to check whether people typed the right values. Please note that table 2 (_CCS Class Type) has duplicate values.
Thank you in advance :)

You can calculate it like this:
Table 2 =
VAR trt =
    SELECTCOLUMNS (
        Table_2,
        "XX", COMBINEVALUES ( ",", Table_2[_CCS Class Type], Table_2[_Type Sorting] )
    )
RETURN
    SUMMARIZECOLUMNS (
        Table_1[Column1],
        Table_1[CCSClassCode_Type],
        Table_1[Type_Sorting],
        FILTER (
            ALL ( Table_1[CCSClassCode_Type], Table_1[Type_Sorting] ),
            NOT ( COMBINEVALUES ( ",", Table_1[CCSClassCode_Type], Table_1[Type_Sorting] ) IN trt )
        )
    )
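If you would rather flag the mistyped rows directly on Table_1, a minimal sketch (assuming the same column pairing as above) is a calculated column using CONTAINS:
Mismatch =
NOT CONTAINS (
    Table_2,
    Table_2[_CCS Class Type], Table_1[CCSClassCode_Type],
    Table_2[_Type Sorting], Table_1[Type_Sorting]
)
Rows where Mismatch is TRUE are the ones whose typed pair has no matching pair in Table_2.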

Related

Filter data using IF Statement in Tableau

I have a data source in Tableau that looks something like this:
SKU Backup_Storage
A 5
A 1
B 2
B 3
C 1
D 0
I'd like to create a calculated field in Tableau that performs a SUM calculation if the SKU column contains the string 'A' or 'D', and an AVERAGE calculation if the SKU column contains the letters 'B' or 'C'.
This is what I am doing:
IF CONTAINS(ATTR([SKU]),'A') or
CONTAINS(ATTR([SKU]),'D')
THEN SUM([Backup_Storage])
ELSEIF CONTAINS(ATTR([SKU]),'B') or
CONTAINS(ATTR([SKU]),'C')
THEN AVG([Backup_Storage])
END
UPDATE - desired output would be:
SKU BACKUP
A, D 6 (This is the SUM OF A and D)
B, C 2 (This is the AVG of B and C)
The calculation above validates, but I see NULLs in my data source table.
Any suggestion is appreciated.
I have named the calculated field:
SKU_FILTER_CALCULATION
Basically, an IF THEN ELSE condition works when you test something that is either TRUE or FALSE for each row. Your specified condition is not a proper use case for IF THEN ELSE because SKU can take any of the values. See it like this:
your data
SKU Backup_Storage
A 5
A 1
B 2
B 3
C 1
D 0
Let's name your calculated field CF. CF will take the value A in the first row and output SUM(5) = 5. For the second row it will output SUM(1) = 1, and for the third row onward it will output AVG(2) = 2, AVG(3) = 3, AVG(1) = 1 and SUM(0) = 0 respectively. All of these values just equal [Backup_Storage] itself, and I'm sure that is not what you're trying to get.
If instead you are trying to get SUM(5, 1, 0) + AVG(2, 3, 1) (obviously I have assumed + here), which equals 8, i.e. one single value for the whole dataset, use this calculated field:
SUM(IF CONTAINS([SKU], 'A') OR CONTAINS([SKU], 'D')
THEN [Backup_Storage] END)
+
AVG(IF CONTAINS([SKU], 'B') OR CONTAINS([SKU], 'C')
THEN [Backup_Storage] END)
This will return 8 when put in the view.
Needless to say, if you want any other operator instead of +, you have to change that in CF accordingly.
As per your edited post, I suggest a different methodology: create different groups for the SKUs on which you want to perform different aggregations.
Step 1: Create a group on the SKU field. I have named this group SKUG.
Step 2: Create a calculated field CF as:
SUM(ZN(IF CONTAINS([SKU], 'A') OR CONTAINS([SKU], 'D')
THEN [Backup_Storage] END))
+
AVG(ZN(IF CONTAINS([SKU], 'B') OR CONTAINS([SKU], 'C')
THEN [Backup_Storage] END))
Step 3: Build your desired view.
Good Luck

BigQuery analytic function not giving expected results

I am trying to write a SQL query in BigQuery, and I have a requirement to filter records based on a group-by column and another column in the table.
What I mean is: if the group-by column (column name: mnt) has more than one row for a value, then I have to check the col2 (column name: zel) value and apply a filter saying col2 = 'X', passing only that record; otherwise, i.e. if col1 has only one distinct value per group, don't filter the records.
So I have written a SQL query to do this. I have used row_number as well as rank and dense_rank, but I noticed that the rank, dense_rank and row_number functions return the same value for a group.
Please see the code below:
#standardSQL
WITH t1 AS (
  SELECT
    mnt,
    lif,  -- needed for the join on t1.lif below
    CASE WHEN RANK() OVER (PARTITION BY LTRIM(RTRIM(mnt))
                           ORDER BY LTRIM(RTRIM(mnt)) ASC) > 1
         THEN 'Y' ELSE 'N' END AS flag,
    RANK() OVER (PARTITION BY mnt ORDER BY mnt) AS rn,
    DENSE_RANK() OVER (PARTITION BY mnt ORDER BY mnt) AS drn
  FROM projectname.datasetname.tablename1),
t2 AS (
  SELECT mnt, rel, lif, lts, lokez
  FROM projectname.datasetname.tablename2
  WHERE lts <> "" AND _PARTITIONTIME = TIMESTAMP(CURRENT_DATE())),
t3 AS (
  SELECT lif, lifn, lts, par
  FROM `projectname.datasetname.tablename3`),
t4 AS (
  SELECT rcv
  FROM `projectname.datasetname.tablename4`
  WHERE mes = 'PRO')
SELECT * FROM (
  SELECT
    t1.mnt AS mnt,
    t1.flag,
    t1.rn,
    t1.drn,
    t2.rel AS zel,
    t2.lokez AS ZLOEKZ,
    t4.rcv AS Zrcv
  FROM t1
  LEFT JOIN t2
    ON REPLACE(t1.mnt, '00000000', '') = REPLACE(t2.mnt, '00000000', '')
   AND t1.lif = t2.lif
   AND t2.lts <> ""
   AND CASE WHEN t1.flag = 'Y' AND t2.rel = 'X' THEN 1
            WHEN (t1.flag = 'N' AND t2.rel = t2.rel)
              OR (t1.flag = 'N' AND t2.rel IS NULL) THEN 1
            WHEN t1.flag = 'Y' AND t2.rel <> 'X' THEN 2
            ELSE 3
       END = 1
  LEFT JOIN t3
    ON t1.lif = t3.lif AND t2.lts = t3.lts AND t3.par = 'BA'
  LEFT JOIN t4
    ON t4.rcv = t3.lifn AND t2.lokez IS NULL)
WHERE ZLOEKZ IS NULL
ORDER BY mnt
As you can see, I am using a CASE statement, and even that does not seem to be working. I am pasting the CASE condition below again:
CASE WHEN t1.flag = 'Y' AND t2.rel = 'X' THEN 1
     WHEN (t1.flag = 'N' AND t2.rel = t2.rel)
       OR (t1.flag = 'N' AND t2.rel IS NULL) THEN 1
     WHEN t1.flag = 'Y' AND t2.rel <> 'X' THEN 2
     ELSE 3
END = 1
But the expected record count did not match, so I added the following lines to the SQL to see whether my analytic functions were giving me the result I wanted:
RANK() OVER (PARTITION BY mnt ORDER BY mnt) AS rn,
DENSE_RANK() OVER (PARTITION BY mnt ORDER BY mnt) AS drn
Strangely, for the same mnt number the rank, dense_rank and row_number functions all assign the same value. What am I doing wrong here?
mnt flag rn drn rel lokez rcv
100 N 1 1 X abc 123
100 N 1 1 null xyz 123
100 N 1 1 null def 234
This is my output
I mean, as per my code, for the same mnt number I am seeing the flag set to N instead of Y, and rank and dense_rank give me the same number: for all 3 mnt rows they generate 1 instead of 1, 2, 3 (for the rank function I understand why), but dense_rank should not do that.
I tried to convey the issue as clearly as I could; please let me know if there are any clarifications I can provide.
Any help is appreciated, thanks.
SELECT * EXCEPT(ct) FROM (
  SELECT *, COUNT(*) OVER(PARTITION BY mnt) AS ct
  FROM your_table  -- put your table or subquery here
) WHERE ct = 1 OR zel = 'X'
This is the code snippet for the problem you mentioned; use it in your code according to your logic. (The reason your ranking functions all return 1 is that RANK and DENSE_RANK give tied rows the same value, and because you ORDER BY the same column you PARTITION BY, every row in a partition is a tie. A COUNT(*) window is what tells you whether an mnt group has more than one row.)
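For illustration, here is a self-contained sketch of that snippet run against rows shaped like your sample output (the inline table and its values are made up purely to show the effect):
-- illustrative sample data only
WITH sample AS (
  SELECT '100' AS mnt, 'X' AS zel UNION ALL
  SELECT '100', NULL UNION ALL
  SELECT '100', NULL UNION ALL
  SELECT '200', 'B'
)
SELECT * EXCEPT(ct)
FROM (
  SELECT *, COUNT(*) OVER (PARTITION BY mnt) AS ct
  FROM sample
)
WHERE ct = 1 OR zel = 'X'
-- mnt 200 is kept because it is the only row in its group;
-- of the three mnt 100 rows, only the one with zel = 'X' is kept.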

How to remove null rows from MDX query results

How can I remove the null row from my MDX query results?
Here is the query I'm currently working with
select
non empty
{
[Measures].[Average Trips Per Day]
,[Measures].[Calories Burned]
,[Measures].[Carbon Offset]
,[Measures].[Median Distance]
,[Measures].[Median Duration]
,[Measures].[Rider Trips]
,[Measures].[Rides Per Bike Per Day]
,[Measures].[Total Distance]
,[Measures].[Total Riders]
,[Measures].[Total Trip Duration in Minutes]
,[Measures].[Total Members]
} on columns
,
non empty
{
(
[Promotion].[Promotion Code Name].children
)
} on rows
from [BCycle]
where ([Program].[Program Name].&[Madison B-cycle])
;
(screenshot of the query results, showing a row that appears as null)
That is not actually a null value; it is one of the children of [Promotion].[Promotion Code Name].Children.
You can exclude that particular member from the children using the Except function of MDX.
Example query:
//This query shows the number of orders for all products,
//with the exception of Components, which are not
//sold.
SELECT
[Date].[Month of Year].Children ON COLUMNS,
Except
([Product].[Product Categories].[All].Children ,
{[Product].[Product Categories].[Components]}
) ON ROWS
FROM
[Adventure Works]
WHERE
([Measures].[Order Quantity])
Reference -> https://learn.microsoft.com/en-us/sql/mdx/except-mdx-function?view=sql-server-2017
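Applied to your query, a sketch would look like the following (the member key to exclude is hypothetical; use the key of the member that shows up as the seemingly null row):
select
non empty
{
[Measures].[Rider Trips] // plus the other measures from your query
} on columns
,
non empty
Except
(
[Promotion].[Promotion Code Name].children,
{[Promotion].[Promotion Code Name].&[Unknown]} // hypothetical member key for the blank row
) on rows
from [BCycle]
where ([Program].[Program Name].&[Madison B-cycle])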

Access the previous record to compare the value in DAX POWER BI

I need to access the previous record of the DTH_REFER_PEDID column to make the IF comparison (DTH_REFER_PEDID-1 <> "A").
That is, while reading index X, I need to compare it with index X-1.
Addition_Stats = VAR Atendido_OV = PR_HIST_MOVIM_PEDID[OVITEM_Hist]
VAR linha_anterior2 = CALCULATE(values(PR_HIST_MOVIM_PEDID[STA_ITEM_PEDCL]);filter(PR_HIST_MOVIM_PEDID;EARLIER(PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID])))
Return
if(PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID].[Month]<PR_HIST_MOVIM_PEDID[DAT_MAIOR_PLANE].[Month];"Atraso mês ant";
if(PR_HIST_MOVIM_PEDID[STA_ITEM_PEDCL] = "A" && PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID].[Day]<=PR_HIST_MOVIM_PEDID[DAT_MAIOR_PLANE].[Day];"Atendido no Prazo";
if((PR_HIST_MOVIM_PEDID[STA_ITEM_PEDCL]="P"||PR_HIST_MOVIM_PEDID[STA_ITEM_PEDCL]="L") && PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID].[Day]<= PR_HIST_MOVIM_PEDID[DAT_MAIOR_PLANE].[Day];"Planejado no prazo";
if(PR_HIST_MOVIM_PEDID[STA_ITEM_PEDCL]<>"A" && PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID].[Day]>PR_HIST_MOVIM_PEDID[DAT_MAIOR_PLANE].[Day];"Em atraso";
if(PR_HIST_MOVIM_PEDID[STA_ITEM_PEDCL] = "A"
&& linha_anterior2 <>"A"
&& PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID].[Day]>PR_HIST_MOVIM_PEDID[DAT_MAIOR_PLANE].[Day];"Atend fora Prazo"
;IF((PR_HIST_MOVIM_PEDID[OVITEM_Hist]=Atendido_OV)&&(PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID]>FIRSTDATE(PR_HIST_MOVIM_PEDID[DTH_REFER_PEDID].[Date]));"A retido";"NA")
)
)
)
)
)
//)
The error displayed is: A circular dependency has been detected: PR_HIST_MOVIM_PEDID [Addition_Stats].
How do I compare DTH_REFER_PEDID-1 <> "A"?
An easy way to work with previous or next records is:
Make sure your data is in a table with a primary key (= ID).
Make a query with all the fields of your table and add one column with ID+1 (or ID-1).
Make another query with the table and the query mentioned above, and make a join between ID and ID+1 (or ID-1). Place all the fields of the table and of the first query, and you end up with all the values in one record. This way you can work with the previous or next values (a sketch is given below).
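In Power BI specifically, a minimal DAX sketch of the same idea (assuming you add an Index column in Power Query sorted by DTH_REFER_PEDID; the Index column name is illustrative, and commas are used as argument separators here, so swap them for ';' in your locale) is a calculated column that looks up the previous row's status:
Prev_STA_ITEM_PEDCL =
LOOKUPVALUE (
    PR_HIST_MOVIM_PEDID[STA_ITEM_PEDCL],    // value to bring back from the previous row
    PR_HIST_MOVIM_PEDID[Index],             // hypothetical Index column added in Power Query
    PR_HIST_MOVIM_PEDID[Index] - 1          // the row immediately before the current one
)
Addition_Stats can then test Prev_STA_ITEM_PEDCL <> "A" instead of the CALCULATE/EARLIER expression, which should avoid the circular dependency.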

Split Pandas Column by values that are in a list

I have three lists that look like this:
age = ['51+', '21-30', '41-50', '31-40', '<21']
cluster = ['notarget', 'cluster3', 'allclusters', 'cluster1', 'cluster2']
device = ['htc_one_2gb','iphone_6/6+_at&t','iphone_6/6+_vzn','iphone_6/6+_all_other_devices','htc_one_2gb_limited_time_offer','nokia_lumia_v3','iphone5s','htc_one_1gb','nokia_lumia_v3_more_everything']
I also have a column in a df that looks like this:
campaign_name
0 notarget_<21_nokia_lumia_v3
1 htc_one_1gb_21-30_notarget
2 41-50_htc_one_2gb_cluster3
3 <21_htc_one_2gb_limited_time_offer_notarget
4 51+_cluster3_iphone_6/6+_all_other_devices
I want to split the column into three separate columns based on the values in the above lists. Like so:
age cluster device
0 <21 notarget nokia_lumia_v3
1 21-30 notarget htc_one_1gb
2 41-50 cluster3 htc_one_2gb
3 <21 notarget htc_one_2gb_limited_time_offer
4 51+ cluster3 iphone_6/6+_all_other_devices
My first thought was to do a simple test like this:
ages_list = []
for i in age:
    if i in df['campaign_name'][0]:
        ages_list.append(i)
print(ages_list)
# ['<21']
I was then going to convert ages_list to a Series and combine it with the remaining two to get the end result above, but I assume there is a more Pythonic way of doing it?
The idea behind this is that you'll create a regular expression based on the values you already have. For example, if you want to build a regular expression that captures any value from your age list, you can do something like '|'.join(age), and likewise for the values you already have in cluster and device.
The device list is a special case because it contains a + sign that conflicts with the regex (+ means "one or more" in a regex), so we can fix this by replacing every + with \+, which means "match a literal +".
import re
import pandas as pd

df = pd.DataFrame({'campaign_name': ['notarget_<21_nokia_lumia_v3', 'htc_one_1gb_21-30_notarget', '41-50_htc_one_2gb_cluster3', '<21_htc_one_2gb_limited_time_offer_notarget', '51+_cluster3_iphone_6/6+_all_other_devices']})

def split_df(row):
    campaign_name = row['campaign_name']
    row['age'] = re.findall('|'.join(age), campaign_name)[0]
    row['cluster'] = re.findall('|'.join(cluster), campaign_name)[0]
    row['device'] = re.findall('|'.join([x.replace('+', r'\+') for x in device]), campaign_name)[0]
    return row

df.apply(split_df, axis=1)
If you want to drop the original column, you can do this:
df.apply(split_df, axis=1).drop('campaign_name', axis=1)
Here I'm assuming that every value is matched by the regex; if that is not the case you can add your own checks (see the sketch below), but you get the idea.
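For that case, here is a minimal sketch of one way to do those checks (helper names like pattern_for, safe_extract and split_df_safe are just illustrative): it escapes every value with re.escape (which also covers the '+' in '51+'), prefers longer device names over their prefixes, and returns None instead of raising an IndexError when nothing matches.
import re

def pattern_for(values):
    # Try longer values first so 'htc_one_2gb_limited_time_offer' wins over its
    # prefix 'htc_one_2gb'; re.escape handles '+', '/', '&' and other special chars.
    return '|'.join(re.escape(x) for x in sorted(values, key=len, reverse=True))

age_pattern = pattern_for(age)
cluster_pattern = pattern_for(cluster)
device_pattern = pattern_for(device)

def safe_extract(pattern, text):
    # Return the first match, or None when the pattern does not match at all.
    matches = re.findall(pattern, text)
    return matches[0] if matches else None

def split_df_safe(row):
    name = row['campaign_name']
    row['age'] = safe_extract(age_pattern, name)
    row['cluster'] = safe_extract(cluster_pattern, name)
    row['device'] = safe_extract(device_pattern, name)
    return row

df = df.apply(split_df_safe, axis=1)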