How to plot % Running Difference in Google Data Studio? - google-cloud-platform

I'm creating charts in Google Data Studio but am unable to perform row-wise aggregations. How would I plot bar/line charts of the "% diff" of a specific column/value?
% diff = ((Today - Previous day) / (Today)) X 100
For example, I have the table shown below with two columns, DATE and VALUE, and I want to plot the date-wise % diff of VALUE: for 02-01-2020 I want to plot (900 - 400) / 900 * 100 ≈ 55%, for 03-01-2020 I want to plot (200 - 900) / 200 * 100 = -350%, and similarly for all the other dates.
DATE VALUE
01-01-2020 400
02-01-2020 900
03-01-2020 200
04-01-2020 700
05-01-2020 400
06-01-2020 800
07-01-2020 900
08-01-2020 500
09-01-2020 800
10-01-2020 900
11-01-2020 600
12-01-2020 400
13-01-2020 400
14-01-2020 200
15-01-2020 300
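Independently of how Data Studio is configured, the arithmetic itself can be sanity-checked outside the tool. Here is a minimal pandas sketch of the same % diff formula, using the table above (pandas is my own illustration here, not part of the Data Studio setup):

```python
import pandas as pd

df = pd.DataFrame({
    "DATE": pd.date_range("2020-01-01", periods=15, freq="D"),
    "VALUE": [400, 900, 200, 700, 400, 800, 900, 500, 800, 900,
              600, 400, 400, 200, 300],
})

# % diff = (today - previous day) / today * 100
df["PCT_DIFF"] = (df["VALUE"] - df["VALUE"].shift(1)) / df["VALUE"] * 100
```

For 02-01-2020 this yields (900 - 400) / 900 * 100 ≈ 55.6%, and for 03-01-2020 it yields -350%, matching the worked examples above.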

Related

DAX formula providing the previous value of a variable, sorted by a different variable

I have two tables, Table1 and Table2, containing the following information:
Table1: Sales
Date        Firm A  Firm B  Firm C
30-05-2022     100     200     300
29-05-2022
28-05-2022
27-05-2022     130     230     330
26-05-2022     140     240     340
25-05-2022     150     250     350
and
Table2: Dates
Relative day  Date
1             30-05-2022
2             27-05-2022
3             26-05-2022
4             25-05-2022
My Power BI (PBI) desktop file contains a slicer that lets the user select from a range of Relative days (i.e. the number of business days back from today's date).
What I want is to create a new measure, Sales lag, that contains the lagged value of sales for the individual firm, and for which the lagged value is based on the Relative day variable e.g.:
For the slicer set according to Relative day=1
Sales Sales lag
Firm A 100 130
Firm B 200 230
Firm C 300 330
Please note that I (think I) need the measure to be based on the relative day variable, as the Date variable does not take into account business days.
I previously used a measure that I think was similar to:
Sales lag = CALCULATE(SUM(Table1[Sales]), DATEADD('Table2'[Date], -1, DAY))
While this measure provided the correct results most of the time, it did not in the presence of weekends.
Thank you
I used some sample data structured slightly differently than you provide and took a stab at providing you with a solution. Find the sample data I used below and the Sales lag measure in the Solution Measure section.
Table1
Date        Firm  Sales
30-05-2022  A     100
29-05-2022  A     110
28-05-2022  A     120
27-05-2022  A     130
26-05-2022  A     140
25-05-2022  A     150
30-05-2022  B     200
29-05-2022  B     210
28-05-2022  B     220
27-05-2022  B     230
26-05-2022  B     240
25-05-2022  B     250
30-05-2022  C     300
29-05-2022  C     310
28-05-2022  C     320
27-05-2022  C     330
26-05-2022  C     340
25-05-2022  C     350
Table2
Relative day  Date
1             30-05-2022
2             27-05-2022
3             26-05-2022
4             25-05-2022
Solution Measure
Sales lag =
VAR sel = SELECTEDVALUE(Table2[Relative day])
VAR dat = CALCULATE(MAX(Table2[Date]), Table2[Relative day] = sel + 1)
RETURN
    IF(
        ISBLANK(sel),
        "",
        CALCULATE(
            MAX(Table1[Sales]),
            ALL(Table2[Relative day]),
            Table1[Date] = dat
        )
    )
Sample Result
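To make the measure's logic easier to follow, here is the same lookup sketched in pandas, using the long-format sample data above (weekend rows omitted, as in Table2). This is only an illustration of the logic, not part of the Power BI solution:

```python
import pandas as pd

# Long-format sales: one row per (Date, Firm), matching the sample data
table1 = pd.DataFrame({
    "Date": ["30-05-2022", "27-05-2022", "26-05-2022", "25-05-2022"] * 3,
    "Firm": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "Sales": [100, 130, 140, 150, 200, 230, 240, 250, 300, 330, 340, 350],
})

# Business-day calendar: relative day -> date (weekends skipped)
table2 = pd.DataFrame({
    "Relative day": [1, 2, 3, 4],
    "Date": ["30-05-2022", "27-05-2022", "26-05-2022", "25-05-2022"],
})

def sales_lag(selected_relative_day: int) -> pd.DataFrame:
    """Sales on the selected relative day, plus the previous business day's sales."""
    cur_date = table2.loc[table2["Relative day"] == selected_relative_day, "Date"].iloc[0]
    lag_date = table2.loc[table2["Relative day"] == selected_relative_day + 1, "Date"].iloc[0]
    cur = table1[table1["Date"] == cur_date][["Firm", "Sales"]]
    lag = (table1[table1["Date"] == lag_date][["Firm", "Sales"]]
           .rename(columns={"Sales": "Sales lag"}))
    return cur.merge(lag, on="Firm")
```

Calling `sales_lag(1)` reproduces the expected output table: Firm A 100/130, B 200/230, C 300/330. The key point, as in the DAX measure, is that the lag is looked up via the relative-day calendar rather than by calendar-date arithmetic.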

Populate df row value based on column header

I'd appreciate any help. Basically, I have a poorly structured data set and am trying to make it more useful.
Below is a representation
import pandas as pd

df = pd.DataFrame({'State': ("Texas", "California", "Florida"),
                   'Q1 Computer Sales': (100, 200, 300),
                   'Q1 Phone Sales': (400, 500, 600),
                   'Q1 Backpack Sales': (700, 800, 900),
                   'Q2 Computer Sales': (200, 200, 300),
                   'Q2 Phone Sales': (500, 500, 600),
                   'Q2 Backpack Sales': (800, 800, 900)})
I would like to have a df that creates separate columns for the Quarters and Sales for the respective state.
I think perhaps regex, str.contains, and loops perhaps?
snapshot below
IIUC, you can use:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(zip(*df_a.columns.str.split(' ', n=1)))
df_a.stack(0).reset_index()
Output:
State level_1 Backpack Sales Computer Sales Phone Sales
0 Texas Q1 700 100 400
1 Texas Q2 800 200 500
2 California Q1 800 200 500
3 California Q2 800 200 500
4 Florida Q1 900 300 600
5 Florida Q2 900 300 600
Or we can go further:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(zip(*df_a.columns.str.split(' ', n=1)), names=['Quarters','Items'])
df_a = df_a.stack(0).reset_index()
df_a['Quarters'] = df_a['Quarters'].str.extract(r'(\d+)')
print(df_a)
Output:
Items State Quarters Backpack Sales Computer Sales Phone Sales
0 Texas 1 700 100 400
1 Texas 2 800 200 500
2 California 1 800 200 500
3 California 2 800 200 500
4 Florida 1 900 300 600
5 Florida 2 900 300 600

Django ORM query: adjacent row sum (running balance) with SQLite

In my database I'm storing data as below:
id amt
-- -------
1 100
2 -50
3 100
4 -100
5 200
I want to get output like below
id amt balance
-- ----- -------
1 100 100
2 -50 50
3 100 150
4 -100 50
5 200 250
How can I do this with the Django ORM?
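With SQLite ≥ 3.25, a window function gives the running balance directly; in the Django ORM the equivalent annotation is roughly `Window(expression=Sum("amt"), order_by=F("id").asc())`. Since a full Django project can't be shown inline, here is the underlying SQL demonstrated with Python's built-in sqlite3 module (the table name `ledger` is made up for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, amt INTEGER)")
con.executemany("INSERT INTO ledger VALUES (?, ?)",
                [(1, 100), (2, -50), (3, 100), (4, -100), (5, 200)])

# Running balance: cumulative SUM ordered by id
rows = con.execute(
    "SELECT id, amt, SUM(amt) OVER (ORDER BY id) AS balance "
    "FROM ledger ORDER BY id"
).fetchall()
```

With the sample data above, `rows` reproduces the desired output (balances 100, 50, 150, 50, 250). Note that with an `ORDER BY` and no explicit frame clause, SQLite defaults to a cumulative frame, which is exactly the running sum wanted here.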

PowerBI - Average and Variance Calculation with conditions

I am trying to calculate Variance and Average in PowerBI. I am running into Circular dependency errors.
This is my Data,
Month Year Item Count
1 2017 Chair 100
1 2017 Chair 200
1 2017 Chair 300
1 2017 Bench 110
1 2017 Bench 140
1 2017 Bench 150
2 2017 Chair 180
2 2017 Chair 190
2 2017 Chair 250
2 2017 Bench 270
2 2017 Bench 370
3 2017 Chair 120
3 2017 Chair 150
3 2017 Bench 180
3 2017 Bench 190
4 2017 Chair 200
4 2017 Chair 210
4 2017 Bench 220
4 2017 Bench 230
.
.
.
Average = Sum of Counts for the previous 3 months / 3
Variance = (Average - Sum(CurrentMonth)) / Sum(CurrentMonth)
Since the average isn't meaningful for the first three months, I'm not worried about those rows.
Expected Output,
Month Year Item Sum(CurrentMonth) Average Variance
1
1
2
2
3
3
4 2017 Chair 410 497 0.21
4 2017 Bench x y z
Lets Say for Chair,
Sum of Current Month = 200 + 210 = 410
Average of Last 3 Months = (100 + 200 + 300 + 180 + 190 + 250 + 120 + 150 )/ 3 = 1490 / 3 = 497
Variance = (497 - 410) / 410 = 87 / 410 = 0.21
Kindly share your thoughts.
I started with this as Table1 (I added a couple months data to yours):
I loaded it into Power BI and added a column called "YearMonth" using this code:
YearMonth = Table1[Year] & FORMAT(Table1[Month], "00")
Then I added another column called "Sum(CurrentMonth)" using this code:
Sum(CurrentMonth) =
SUMX(
    FILTER(
        FILTER(Table1, Table1[Item] = EARLIER(Table1[Item])),
        VALUE(Table1[YearMonth]) = VALUE(EARLIER(Table1[YearMonth]))
    ),
    Table1[Count]
)
Then I added another column called "Average" using this code:
Average =
SUMX(
    FILTER(
        FILTER(
            FILTER(Table1, Table1[Item] = EARLIER(Table1[Item])),
            VALUE(Table1[YearMonth]) <= VALUE(EARLIER(Table1[YearMonth])) - 1
        ),
        VALUE(Table1[YearMonth]) >= VALUE(EARLIER(Table1[YearMonth])) - 3
    ),
    Table1[Count]
) / 3
Lastly, I added a column called "Variance" using this code:
Variance = (Table1[Average] - Table1[Sum(CurrentMonth)]) / Table1[Sum(CurrentMonth)]
I hope this helps you.
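For readers more comfortable outside DAX, the same rolling logic can be sketched in pandas as a cross-check of the arithmetic (this mirrors, rather than replaces, the Power BI answer):

```python
import pandas as pd

data = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    "Year": [2017] * 19,
    "Item": ["Chair", "Chair", "Chair", "Bench", "Bench", "Bench",
             "Chair", "Chair", "Chair", "Bench", "Bench",
             "Chair", "Chair", "Bench", "Bench",
             "Chair", "Chair", "Bench", "Bench"],
    "Count": [100, 200, 300, 110, 140, 150, 180, 190, 250, 270, 370,
              120, 150, 180, 190, 200, 210, 220, 230],
})

# Sum per item and month, then average the three preceding monthly sums
monthly = (data.groupby(["Item", "Year", "Month"], as_index=False)["Count"]
               .sum().rename(columns={"Count": "SumCurrentMonth"}))
monthly["Average"] = (monthly.groupby("Item")["SumCurrentMonth"]
                             .transform(lambda s: s.shift(1).rolling(3).mean()))
monthly["Variance"] = (monthly["Average"] - monthly["SumCurrentMonth"]) / monthly["SumCurrentMonth"]
```

For Chair in month 4 this gives Sum(CurrentMonth) = 410, Average = 1490 / 3 ≈ 497, and Variance ≈ 0.21, matching the worked example in the question.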

SAS - group by two different columns

I've the below dataset as input
ID category cust.nr cust.name income
1 a 100 Crosbie 5000
2 a 200 Heier 5500
2 a 300 Pick 5500
3 a 400 Sandridge 5100
4 b 500 Groesbeck 10000
4 b 600 Hayton 11000
4 b 700 Razor 12000
5 c 800 Lamere 90000
I need a report (e.g. using proc tabulate) as follows.
In the data, cust.nr is unique, but all customers belonging to one family are given the same ID, and customers are categorized based on their income:
<10000 as a
10000 to 15000 as b
>15000 as c
I need a report with the count of unique IDs (families) grouped by category, with the rest of the columns also shown in the report.
so, it should look like
count_ID category cust.nr cust.name income
-------- ------ 100 Crosbie 5000
-------- ------ 200 Heier 5500
3 a 300 Pick 5500
-------- ------ 400 Sandridge 5100
-------- ------ 500 Groesbeck 10000
1 b 600 Hayton 11000
-------- -------- 700 Razor 12000
1 c 800 Lamere 90000
Any suggestions, please?
You can do this easily with proc sql:
proc sql noprint;
    create table results as
    select category,
           count(distinct id) as count_id
    from mytable
    group by category;
quit;
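The proc sql step above produces the per-category counts. To see the same grouped distinct count attached to every detail row, here is an equivalent sketch in pandas (column names adjusted to valid identifiers; this mirrors, rather than replaces, the SAS approach):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 2, 3, 4, 4, 4, 5],
    "category": ["a", "a", "a", "a", "b", "b", "b", "c"],
    "cust_nr": [100, 200, 300, 400, 500, 600, 700, 800],
    "cust_name": ["Crosbie", "Heier", "Pick", "Sandridge",
                  "Groesbeck", "Hayton", "Razor", "Lamere"],
    "income": [5000, 5500, 5500, 5100, 10000, 11000, 12000, 90000],
})

# Count distinct family IDs per category and attach to every row
df["count_ID"] = df.groupby("category")["ID"].transform("nunique")
```

This yields count_ID = 3 for category a, 1 for b, and 1 for c, matching the counts in the desired report layout.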