Let's assume I have access to this data in QuickSight:
Id Amount Date
1 10 15-01-2019
2 0 16-01-2020
3 100 21-12-2019
4 34 15-01-2020
5 5 20-02-2020
1 50 13-09-2020
4 0 01-01-2020
I would like to create a boolean calculated field, named "Amount_in_2020", whose value is True when the Id has a strictly positive total Amount in 2020, and False otherwise.
With Python I would have done the following:
import pandas as pd

# Sample data
df = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 1, 4],
                   'Amount': [10, 0, 100, 34, 5, 50, 0],
                   'Date': ['15-01-2019', '16-01-2020', '21-12-2019', '15-01-2020', '20-02-2020', '13-09-2020', '01-01-2020']})
# Dates are day-first (DD-MM-YYYY), so parse them with an explicit format
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
# Keep only the 2020 rows and compute the total amount per Id
df_gb = df[df['Date'].dt.year == 2020].groupby('Id')['Amount'].sum()
# Create the wanted column: True when the Id has a strictly positive 2020 total
df['Amount_in_2020'] = df['Id'].isin(df_gb[df_gb > 0].index)
But I can't find a way to create such a calculated field in QuickSight. Could you please help me?
Expected output:
Id Amount Date Amount_in_2020
1 10 2019-01-15 True
2 0 2020-01-16 False
3 100 2019-12-21 False
4 34 2020-01-15 True
5 5 2020-02-20 True
1 50 2020-09-13 True
4 0 2020-01-01 True
I finally found a solution:
ifelse(sumOver(max(ifelse(extract("YYYY",{Date})=2020,{Amount},0)), [{Id}])>0,1,0)
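For comparison, the same per-row logic can be sketched in pandas with a grouped transform, which plays roughly the role of sumOver partitioned by Id. This is only an illustration on the sample data from the question; the intermediate variable names are assumptions and have no counterpart in QuickSight.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 1, 4],
    "Amount": [10, 0, 100, 34, 5, 50, 0],
    "Date": ["15-01-2019", "16-01-2020", "21-12-2019", "15-01-2020",
             "20-02-2020", "13-09-2020", "01-01-2020"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Keep the amount only for rows dated in 2020; count every other row as 0
amount_2020 = df["Amount"].where(df["Date"].dt.year == 2020, 0)

# Sum those amounts per Id and broadcast the total back to every row of that Id
total_2020_per_id = amount_2020.groupby(df["Id"]).transform("sum")

# The flag is True on every row of an Id whose 2020 total is strictly positive
df["Amount_in_2020"] = total_2020_per_id > 0
print(df)
```

Because the total is broadcast to all rows of the Id, the 2019 row of Id 1 is also flagged, as in the expected output.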
I have a table with the following columns: Staff ID, Project ID, and one column per month (Jan,20, Feb,20, and so on).
Each row shows the allocation of a staff member to a project. The month columns can extend to Dec,20 and continue from Jan,21 in the same pattern.
One Staff can be tagged to any number of projects in a given month.
Now I want to prepare a Power BI report on this in the format as below -
Staff ID, Project ID and End Date are the slicers to be present.
For the End Date slicer we can select options in the format MMM,YY (e.g. Jan,23). Based on this slicer I want to show the preceding 6 months of data, as portrayed by the above sample image.
I have tried using parameters, but those have to be specified for each combination, so they are not usable here because the data is going to grow over time.
Is there any way to do this, or am I missing something simple?
Any help on this will be highly appreciated.
Adding the sample data below:
Staff ID  Project ID  Jan,20  Feb,20  Mar,20  Apr,20  May,20  Jun,20  Jul,20
1         20          0       0       0       100     80      10      0
1         30          0       0       0       0       20      90      100
2         20          100     100     100     0       0       0       0
3         50          80      100     0       0       0       0       0
3         60          15      0       0       0       20      0       0
3         70          5       0       100     100     80      0       0
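One way to reason about the requirement is to unpivot the month columns into rows, so that the selected end month becomes an ordinary date filter. The pandas sketch below only illustrates that idea on the sample data; it is not a Power BI solution. The function name `last_six_months` is hypothetical, and it assumes the six-month window ends with, and includes, the selected month.

```python
import pandas as pd

# Sample data from the question, in its original wide layout
wide = pd.DataFrame({
    "Staff ID": [1, 1, 2, 3, 3, 3],
    "Project ID": [20, 30, 20, 50, 60, 70],
    "Jan,20": [0, 0, 100, 80, 15, 5],
    "Feb,20": [0, 0, 100, 100, 0, 0],
    "Mar,20": [0, 0, 100, 0, 0, 100],
    "Apr,20": [100, 0, 0, 0, 0, 100],
    "May,20": [80, 20, 0, 0, 20, 80],
    "Jun,20": [10, 90, 0, 0, 0, 0],
    "Jul,20": [0, 100, 0, 0, 0, 0],
})

# Unpivot: one row per (staff, project, month), with the month as a real date
long = wide.melt(id_vars=["Staff ID", "Project ID"],
                 var_name="Month", value_name="Allocation")
long["Month"] = pd.to_datetime(long["Month"], format="%b,%y")


def last_six_months(data, end_label):
    """Return the six months ending with end_label (assumed inclusive)."""
    end = pd.to_datetime(end_label, format="%b,%y")
    start = end - pd.DateOffset(months=5)
    window = data[(data["Month"] >= start) & (data["Month"] <= end)]
    # Pivot back to one column per month for display
    return window.pivot_table(index=["Staff ID", "Project ID"],
                              columns="Month", values="Allocation",
                              aggfunc="sum")


result = last_six_months(long, "Jul,20")
print(result)
```

With the data in long form, new months arrive as additional rows rather than new columns, so the filter does not need to be redefined as the data grows.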
I have a table (SalesTable) with a list of customers and the dates of the orders I received from them. I also created a table named 'Calendar' with the CALENDARAUTO function.
What I would like to do is add a measure that assigns the value 1 to all orders that were placed before a specific date by customers who did not place ANY other order after that date.
Measure =
IF(
    SELECTEDVALUE('SalesTable'[SalesDate]) < MIN(Calendar[Date]) ||
    SELECTEDVALUE('SalesTable'[SalesDate]) > MAX(Calendar[Date]),
    1, 0
)
However, this only flags orders that were placed before MIN(Calendar[Date]); it does not exclude customers who placed another order after MIN(Calendar[Date]).
MIN(Calendar[Date]) is controlled by a slicer.
Could anyone help me modify this measure?
Here is my sample data:
Customers Order no. Dates of Orders Expected Results
Customer A 1 01.01.2023 1
Customer A 2 02.01.2023 1
Customer E 3 03.01.2023 1
Customer E 4 04.01.2023 1
Customer E 5 05.01.2023 1
Customer C 6 06.01.2023 0
Customer C 7 07.01.2023 0
Customer C 8 08.01.2023 0
Customer B 9 09.01.2023 0
Customer B 10 10.01.2023 0
Customer B 11 11.01.2023 0
Customer D 12 12.01.2023 0
Customer C 13 13.01.2023 0
Customer D 14 14.01.2023 0
Customer C 15 15.01.2023 0
Here is basically what my Power BI page looks like, as an example; the slicer above should control what is shown in the matrix below it.
Let's take 09.01.2023 as a reference. I would like to assign value = 1 to customers A and E, because they bought something before 09.01.2023 but did not buy anything after 09.01.2023, and value = 0 to the remaining customers, since they did buy something after 09.01.2023.
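The intended rule can be checked against the sample data with a small pandas sketch: a row gets 1 only when the customer's latest order is before the reference date. This is an illustration of the logic, not DAX; the column names and the helper `flag_lapsed` are hypothetical, and treating an order placed exactly on the reference date as "not before" is an assumption (the sample data gives the same result either way).

```python
import pandas as pd

# Sample data from the question: 15 orders on consecutive days in January 2023
orders = pd.DataFrame({
    "Customer": list("AAEEECCCBBBDCDC"),
    "OrderNo": range(1, 16),
    "OrderDate": pd.date_range("2023-01-01", periods=15, freq="D"),
})


def flag_lapsed(data, reference):
    """Return 1 for rows of customers whose last order is before reference."""
    # Latest order date per customer, repeated on every row of that customer
    last_order = data.groupby("Customer")["OrderDate"].transform("max")
    # If the latest order is before the reference, every order of that customer is too
    return (last_order < reference).astype(int)


orders["Expected"] = flag_lapsed(orders, pd.Timestamp("2023-01-09"))
print(orders)
```

The key point is that the condition is evaluated per customer rather than per order, which is what the original measure is missing.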
I have randomly missing categories in a Stata dataset that look like the following
omb_control_number  agency   hours
1                   HHS-ACF
1                            10
2
2
2                   HHS-CDC  2
3
3                   HHS-ACF  3
3
4                   HHS-ACF  10
4
4
4
The omb_control_number variable is present throughout the data and is never missing. I am trying to impute the missing values so that all observations with the same omb_control_number have the same agency and hours. I tried the following:
by omb_control_number, sort : replace agency = agency[_n-1] if missing(agency)
But it only carried forward previous values, so observations that come before the first non-missing value in a group stay missing. Is there a way to fill those in as well? For reference, the final dataset should look like the following:
omb_control_number agency hours
1 HHS-ACF 10
1 HHS-ACF 10
2 HHS-CDC 2
2 HHS-CDC 2
2 HHS-CDC 2
3 HHS-ACF 3
3 HHS-ACF 3
3 HHS-ACF 3
4 HHS-ACF 10
4 HHS-ACF 10
4 HHS-ACF 10
4 HHS-ACF 10
If you do not care about maintaining original sort order, then you can do this:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte omb_control_number str7 agency byte hours
1 "HHS-ACF" .
1 "" 10
2 "" .
2 "" .
2 "HHS-CDC" 2
3 "" .
3 "HHS-ACF" 3
3 "" .
4 "HHS-ACF" 10
4 "" .
4 "" .
4 "" .
end
gsort omb_control_number -agency
bys omb_control_number : replace agency = agency[_n-1] if missing(agency)
sort omb_control_number hours
bys omb_control_number : replace hours = hours[_n-1] if missing(hours)
If agency is a string variable, then
bysort omb (agency) : replace agency = agency[_N]
will copy the last value after sorting to all observations for the same group.
If agency is a numeric variable with value labels, keep reading.
As hours is presumably a numeric variable, it is the same idea with a twist:
bysort omb (hours) : replace hours = hours[1]
In neither case is there any check for two or more non-missing values for the same identifier.
For a numeric variable, whether with or without value labels, a check would be
bysort omb (hours) : gen byte OK = (hours == hours[1]) | missing(hours)
You would then look for any observations with OK equal to 0; 1 means "OK".
String variables can be checked in the same way, except that you compare with the last observation, indexed by _N, rather than the first, indexed by 1.
This will get you the desired results:
bysort omb_control_number: gen nonmissing = sum(!missing(agency)) if !missing(agency)
bysort omb_control_number: gen nonmissing2 = sum(!missing(hours)) if !missing(hours)
bysort omb_control_number (nonmissing) : replace agency = agency[1]
bysort omb_control_number (nonmissing2) : replace hours = hours[1]
drop nonmissing*
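For readers who want to cross-check the result outside Stata, the same within-group fill can be sketched in pandas. This is only an illustration on the example data; it relies on the grouped "first" aggregation skipping missing values, and the `conflicts` check is a hypothetical counterpart of the OK variable discussed above.

```python
import numpy as np
import pandas as pd

# Example data from the question; None / NaN mark the missing cells
df = pd.DataFrame({
    "omb_control_number": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "agency": ["HHS-ACF", None, None, None, "HHS-CDC", None,
               "HHS-ACF", None, "HHS-ACF", None, None, None],
    "hours": [np.nan, 10, np.nan, np.nan, 2, np.nan,
              3, np.nan, 10, np.nan, np.nan, np.nan],
})

groups = df.groupby("omb_control_number")[["agency", "hours"]]

# Count distinct non-missing values per group; more than one signals a conflict
conflicts = (groups.nunique() > 1).any(axis=1)

# Copy the first non-missing value of each column to every row of the group
filled = df.copy()
filled[["agency", "hours"]] = groups.transform("first")
print(filled)
```

As with the Stata versions, the fill itself does not resolve groups that contain two different non-missing values, so the conflict check is worth running first.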
I am new to Power BI, and I don't know how to group rows.
I have this kind of data:
api user 01/07/21 02/07/21 03/07/21 ...
a 25 null 3 4
b 25 1 null 2
c 25 1 4 5
a 30 4 3 5
b 30 3 2 2
c 30 1 1 3
I would like to have the sum of the values per user only, not per api and user:
user 01/07/21 02/07/21 03/07/21 ...
25 2 7 11
30 8 6 10
Do you know how to do this, please?
I created a table with your sample data (make sure your values are treated as numbers):
Then create a Matrix visual, with "user" in Rows and your desired columns in the Values section:
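The aggregation the Matrix visual performs here is a sum grouped by user. As a way to verify the expected numbers, the same grouping can be sketched in pandas; this is an illustration on the sample data only, with null values entered as NaN so that they are skipped by the sum.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN stands for the null cells
usage = pd.DataFrame({
    "api": ["a", "b", "c", "a", "b", "c"],
    "user": [25, 25, 25, 30, 30, 30],
    "01/07/21": [np.nan, 1, 1, 4, 3, 1],
    "02/07/21": [3, np.nan, 4, 3, 2, 1],
    "03/07/21": [4, 2, 5, 5, 2, 3],
})

# Drop the api column, then sum every date column per user (NaN is ignored)
per_user = usage.drop(columns="api").groupby("user").sum()
print(per_user)
```

The totals match the expected table in the question, which also shows why the values must be numeric: text values cannot be summed.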
I have a data frame that looks like this:
id age sallary
1 16 500
2 21 1000
3 25 3000
4 30 6000
5 40 25000
and a list of ids that I would like to ignore: [1,3,5].
How can I get a data frame that contains all the remaining rows (ids 2 and 4)?
Many thanks to everyone.
Call isin and negate the result using ~:
In [42]:
ignore_ids=[1,3,5]
df[~df.id.isin(ignore_ids)]
Out[42]:
id age sallary
1 2 21 1000
3 4 30 6000
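A self-contained version of the same idea, with the intermediate boolean mask made explicit, might look like this. The names `mask` and `remaining` are illustrative only.

```python
import pandas as pd

# Data frame from the question (column names kept as given)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "age": [16, 21, 25, 30, 40],
    "sallary": [500, 1000, 3000, 6000, 25000],
})
ignore_ids = [1, 3, 5]

# True for rows whose id is in the ignore list
mask = df["id"].isin(ignore_ids)

# ~ inverts the mask, so only rows NOT in the ignore list are kept
remaining = df[~mask]
print(remaining)
```

The original index labels (1 and 3) are preserved; calling `reset_index(drop=True)` on the result would renumber the rows if that is preferred.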