Annual moving average by quarter (Power BI)

I want to calculate the trailing annual average of sales by quarter in Power BI.
I usually solve this in Excel with an AVERAGEIF between dates (the last date of a quarter and the same date one year earlier).
Below is a sample of my data:
Date Quarter Sale
1/12/2016 2016-Q1 12.5
2/25/2016 2016-Q1 65.1
4/7/2016 2016-Q2 95.5
6/22/2016 2016-Q2 74.5
7/10/2016 2016-Q3 7.3
8/30/2016 2016-Q3 87.6
9/5/2016 2016-Q3 88.4
10/27/2016 2016-Q4 18
11/12/2016 2016-Q4 64.2
12/29/2016 2016-Q4 37.2
1/28/2017 2017-Q1 17.8
3/8/2017 2017-Q1 59.6
4/16/2017 2017-Q2 68.7
6/15/2017 2017-Q2 68.5
7/20/2017 2017-Q3 61.8
8/7/2017 2017-Q3 10.8
9/23/2017 2017-Q3 26.5
10/7/2017 2017-Q4 49.8
11/26/2017 2017-Q4 79.7
12/3/2017 2017-Q4 80.5
1/18/2018 2018-Q1 12.5
3/19/2018 2018-Q1 54.7
4/12/2018 2018-Q2 64.0
6/19/2018 2018-Q2 58.9
7/29/2018 2018-Q3 59.9
8/9/2018 2018-Q3 4.1
9/13/2018 2018-Q3 20.2
The wanted result is the table below:
Quarter 1_Yr_Trailing Avg Sale
2017-Q1 52.3
2017-Q2 56.7
2017-Q3 43.3
2017-Q4 52.4
2018-Q1 51.3
2018-Q2 49.9
2018-Q3 48.4
Just to clarify, 2018-Q3 is the average of any sale that was recorded between Sept 30, 2017 and Sept 30, 2018:
10/7/2017 2017-Q4 49.8
11/26/2017 2017-Q4 79.7
12/3/2017 2017-Q4 80.5
1/18/2018 2018-Q1 12.5
3/19/2018 2018-Q1 54.7
4/12/2018 2018-Q2 64.0
6/19/2018 2018-Q2 58.9
7/29/2018 2018-Q3 59.9
8/9/2018 2018-Q3 4.1
9/13/2018 2018-Q3 20.2
Thanks for your help.
regards,
Simon
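For reference, the window logic described above can be sketched outside Power BI. This is a minimal pandas sketch, not a DAX solution: the helper names `quarter_end` and `trailing_year_avg` are made up for illustration, and it assumes the window excludes the start date and includes the quarter's last day (no sample date falls on a boundary, so the sample data cannot confirm that choice). With the sample rows it gives about 48.43 for 2018-Q3, matching the clarification in the question.

```python
import pandas as pd

# Sample rows from the question (Date, Sale); the Quarter column is not needed here.
rows = [
    ("1/12/2016", 12.5), ("2/25/2016", 65.1), ("4/7/2016", 95.5),
    ("6/22/2016", 74.5), ("7/10/2016", 7.3), ("8/30/2016", 87.6),
    ("9/5/2016", 88.4), ("10/27/2016", 18.0), ("11/12/2016", 64.2),
    ("12/29/2016", 37.2), ("1/28/2017", 17.8), ("3/8/2017", 59.6),
    ("4/16/2017", 68.7), ("6/15/2017", 68.5), ("7/20/2017", 61.8),
    ("8/7/2017", 10.8), ("9/23/2017", 26.5), ("10/7/2017", 49.8),
    ("11/26/2017", 79.7), ("12/3/2017", 80.5), ("1/18/2018", 12.5),
    ("3/19/2018", 54.7), ("4/12/2018", 64.0), ("6/19/2018", 58.9),
    ("7/29/2018", 59.9), ("8/9/2018", 4.1), ("9/13/2018", 20.2),
]
sales = pd.DataFrame(rows, columns=["Date", "Sale"])
sales["Date"] = pd.to_datetime(sales["Date"], format="%m/%d/%Y")


def quarter_end(label):
    """Turn a label such as '2018-Q3' into the last day of that quarter."""
    year, quarter = label.split("-Q")
    first_of_last_month = pd.Timestamp(year=int(year), month=int(quarter) * 3, day=1)
    return first_of_last_month + pd.offsets.MonthEnd(1)


def trailing_year_avg(df, label):
    """Average of all sales in the year ending on the quarter's last day."""
    end = quarter_end(label)
    start = end - pd.DateOffset(years=1)
    # Assumption: start date excluded, end date included.
    in_window = (df["Date"] > start) & (df["Date"] <= end)
    return df.loc[in_window, "Sale"].mean()


for label in ["2017-Q3", "2017-Q4", "2018-Q1", "2018-Q2", "2018-Q3"]:
    print(label, round(trailing_year_avg(sales, label), 2))
```

The same start and end dates are what a DAX measure would need to filter on; only the syntax differs.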

Related

Rank a column according to the Filters selected by the user

I have data consisting of the customers' route details and their store scores.
Raw data with the overall ranking for all customers:
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
5371 ABC Chicago CG 1200 5 1
2098 HGT Kansas KK 6500 4.8 2
7680 POE Arizona QW 3300 4.2 3
3476 POE Arizona CV 3300 4 4
6272 KUN Florida ANF 7800 3.9 5
3220 ABC Chicago AF 1200 3.6 6
7266 IOR Califor LU 4500 3.2 7
3789 POE Arizona TR 3300 3 8
9383 KAR Newyork IO 5600 3 9
1583 KUN Florida BOT 7800 2.8 10
8219 ABC Chicago Bb 1200 2.5 11
3734 ABC Chicago AA 1200 2 12
6900 POE Arizona HAL 3300 1.8 13
8454 KUN Florida UYO 7800 1.5 14
Filters
Distname ALL
State ALL
Routecode ALL
This is the overall ranking for all customers when no filter is selected. When I select a filter (Dist name, route code, store score), I want the rank to be recomputed for the rows that match the selected filter. E.g.:
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
7680 POE Arizona QW 3300 4.2 1
3476 POE Arizona CV 3300 4 2
3789 POE Arizona TR 3300 3 3
6900 POE Arizona HAL 3300 1.8 4
Filter
Distname POE
State Arizona
Routecode 3300
The store score is based on a parameter that I calculated in a model using Python.
Currently it is a string column in Power BI. I tried some DAX but it was not successful.
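To make the wanted behaviour concrete, here is a rough pandas sketch of "filter first, then rank what is left". It is not DAX, and the function name `rank_filtered` is invented for this example. It assumes the score arrives as text (as in the question) and that ties are broken by row order, which is what the overall ranking above appears to do for the two stores with a score of 3.

```python
import pandas as pd

# Rows from the question; Store_score is kept as text, as it is in the report.
rows = [
    ("ABC", "Chicago", "CG", 1200, "5"), ("HGT", "Kansas", "KK", 6500, "4.8"),
    ("POE", "Arizona", "QW", 3300, "4.2"), ("POE", "Arizona", "CV", 3300, "4"),
    ("KUN", "Florida", "ANF", 7800, "3.9"), ("ABC", "Chicago", "AF", 1200, "3.6"),
    ("IOR", "Califor", "LU", 4500, "3.2"), ("POE", "Arizona", "TR", 3300, "3"),
    ("KAR", "Newyork", "IO", 5600, "3"), ("KUN", "Florida", "BOT", 7800, "2.8"),
    ("ABC", "Chicago", "Bb", 1200, "2.5"), ("ABC", "Chicago", "AA", 1200, "2"),
    ("POE", "Arizona", "HAL", 3300, "1.8"), ("KUN", "Florida", "UYO", 7800, "1.5"),
]
stores = pd.DataFrame(
    rows, columns=["Dist_Name", "State", "Store_name", "Route_code", "Store_score"]
)


def rank_filtered(df, **filters):
    """Keep rows matching every column=value filter, then rank by score."""
    mask = pd.Series(True, index=df.index)
    for column, value in filters.items():
        mask &= df[column] == value
    out = df[mask].copy()
    # The score is a string column, so convert it before ranking.
    out["Store_score"] = pd.to_numeric(out["Store_score"])
    # method="first": ties keep their original row order (an assumption).
    out["Rank"] = out["Store_score"].rank(method="first", ascending=False).astype(int)
    return out.sort_values("Rank")


print(rank_filtered(stores, Dist_Name="POE", State="Arizona", Route_code=3300))
```

The point to carry over to Power BI is that the rank has to be evaluated over the rows left after filtering, and on a numeric score rather than the string column.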

REGEX all invoice item descriptions

I'm trying to use a regex to extract all items from an invoice (name, unit price, total, VAT, etc.). I managed to get all the numeric fields, but the biggest problem is extracting the item descriptions, because a description is sometimes split across two separate lines. This is what I need to parse:
1 Agrafe metalice Eco, rotunjite, 33 mm, 50 buc/cutie buc. 30.00 0,76 22,80 4,33
(SOBO604)
2 Banda corectoare DONAU Mouse, 5 mm x 8 m, orizontala, buc. 5.00 4,83 24,15 4,59
blister (7635001PL-99)
3 Biblioraft plastifiat OFFICE Products, 5 cm, colturi buc. 75.00 5,08 381,00 72,39
metalice, albastru (21011121-01)
4 Burete magnetic DONAU, 110 x 57 x 25 mm, galben buc. 10.00 5,53 55,30 10,51
(7638001PL-99)
5 Calculator de birou Canon WS-1610T, solar, 16 cifre, buc. 1.00 71,11 71,11 13,51
afisaz inclinat, format mare (WS1610T)
6 Capse zincate OFFICE Products 24/6, 1000 buc/cutie buc. 5.00 1,12 5,60 1,06
(18072419-19)
7 Creion grafic Eco, ascutit, cu radiera, corp verde buc. 20.00 0,40 8,00 1,52
(SOIS432)
8 Creion mecanic BIC Matic, 0.7 mm (601021) buc. 4.00 1,88 7,52 1,43
9 Dosar din plastic cu sina si doua perforatii OFFICE buc. 250.00 0,35 87,50 16,63
Products, albastru (21104211-01)
10 Dosar din plastic cu sina si doua perforatii OFFICE buc. 100.00 0,35 35,00 6,65
Products, roz (21104211-13)
pagina 1 / 3
797638
11 Folie protectie OFFICE Products, A4, coaja portocala, 40 buc. 5.00 6,53 32,65 6,20
microni, 100 file/set (21141215-90)
12 Folie protectie OFFICE Products, A4, coaja portocala, 40 buc. 20.00 6,51 130,20 24,74
microni, 100 file/set (21141215-90)
13 Marker whiteboard Eco, varf rotund, albastru (SOIS535A) buc. 104.00 1,33 138,32 26,28
14 Marker whiteboard Eco, varf rotund, negru (SOIS535N) buc. 2.00 1,33 2,66 0,51
15 Marker whiteboard Eco, varf rotund, rosu (SOIS535R) buc. 2.00 1,33 2,66 0,51
16 Notite adezive OFFICE Products, 51 x 76 mm, galben pal, buc. 5.00 1,65 8,25 1,57
100 file (14047511-06)
17 Organizator de birou DONAU Clasic VII, 6 compartimente, buc. 2.00 30,67 61,34 11,65
155 x 105 x 101 mm, transparent (7476001-99)
18 Panou din pluta Bi-Office, 60 x 90 cm, rama lemn buc. 1.00 32,96 32,96 6,26
(GMC070012010)
19 Pioneze color Eco, tinte pentru pluta , 40 buc/cutie buc. 1.00 2,16 2,16 0,41
(SOBO612)
20 Pix fara mecanism Eco, varf de 1 mm, albastru (SOIS405A) buc. 110.00 0,33 36,30 6,90
21 Plic C4 (229 x 324 mm), alb, siliconic, 10/set buc. 2.00 2,15 4,30 0,82
(15223619-14)
22 Tus pentru stampila Pelikan, cu picurator, 28 ml, negru buc. 1.00 6,93 6,93 1,32
(351197)
Notice that the item description sometimes continues after the total price. The problem is that the spacing between items isn't even, it's variable: e.g. positions 8 and 9 are almost linked, whereas positions 20 and 21 have a lot of space between them.
Somebody helped me and got only the first line using
\d{1,2}(.*)(\d+\.\d+\s+)(\d+\,\d+\s{0,1}){3}
This is where I got stuck because of the uneven syntax.
It only gets the first line. E.g.:
'''
16 Notite adezive OFFICE Products, 51 x 76 mm, galben pal, buc. 5.00 1,65 8,25 1,57
100 file (14047511-06)
'''
it gets only 16 Notite adezive OFFICE Products, 51 x 76 mm, galben pal, buc. 5.00 1,65 8,25 1,57 but not 100 file (14047511-06). The complete item description is Notite adezive OFFICE Products, 51 x 76 mm, galben pal, 100 file (14047511-06); the split is how the text comes out when the PDF is converted to text.
I also need to extract the last part and merge it with the first to get the full item description.
Thank you
Try this regex:
\d{1,2}(.*)(\d+\.\d+\s+)(\d+\,\d+\s?){3}([\n ]+[^(\n]*\([^)]+\)(?=\n))?
Test on regex101
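If a single regex gets hard to maintain, a line-by-line pass is another option. The sketch below is a different technique from the regex above, written in Python as an illustration; `ITEM` and `parse_items` are names invented here. It assumes every complete description ends with the product code in parentheses (true for the sample), so a line ending in ")" that follows an unfinished item is treated as its continuation, and anything else (such as "pagina 1 / 3") is skipped.

```python
import re

# position, description, unit, quantity (dot decimal), then three comma-decimal amounts
ITEM = re.compile(
    r"^(\d{1,2})\s+(.*?)\s+(\S+)\s+(\d+\.\d+)\s+(\d+,\d+)\s+(\d+,\d+)\s+(\d+,\d+)\s*$"
)


def parse_items(text):
    """Return one dict per invoice item, with continuation lines merged in."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        match = ITEM.match(line)
        if match:
            pos, desc, unit, qty, unit_price, total, vat = match.groups()
            items.append({"pos": int(pos), "description": desc, "unit": unit,
                          "qty": qty, "unit_price": unit_price,
                          "total": total, "vat": vat})
        elif (items and line.endswith(")")
              and not items[-1]["description"].endswith(")")):
            # Assumption: a description is complete once it ends with "(code)".
            items[-1]["description"] += " " + line
        # Any other line (page footer, stray number) is ignored.
    return items


sample = """
7 Creion grafic Eco, ascutit, cu radiera, corp verde buc. 20.00 0,40 8,00 1,52
(SOIS432)
8 Creion mecanic BIC Matic, 0.7 mm (601021) buc. 4.00 1,88 7,52 1,43
10 Dosar din plastic cu sina si doua perforatii OFFICE buc. 100.00 0,35 35,00 6,65
Products, roz (21104211-13)
pagina 1 / 3
797638
16 Notite adezive OFFICE Products, 51 x 76 mm, galben pal, buc. 5.00 1,65 8,25 1,57
100 file (14047511-06)
"""

for item in parse_items(sample):
    print(item["pos"], "|", item["description"], "|", item["total"])
```

The amounts are left as strings on purpose; converting "8,25" to a number depends on how the comma decimal should be handled downstream.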

Able to transform observation to 0 but had issue with their total

I have a raw data set (linked from the original post) in which some observations are recorded as "T".
I tried to transform the observations having a "T" to 0,
and then read in the data set and print it out. Just this.
However, with my code, just looking at the first observation (line 5 of the file) shows that something is off.
For instance, the first observation for "Nov" should not be 0.
I could not figure out what went wrong, and I wonder if anyone could give me some advice on what to do next. Thank you very much! Highly appreciated.
My code is as below:
INFILE "&DIRLSB.Pr1Snowfall1.csv" DSD FIRSTOBS=5;
DROP i;
INPUT Season $#;
INPUT Year 1-4 Season 1-7 Sep Oct Nov Dec Jan Feb Mar Apr May Total;
ARRAY Months (*) Sep -- May;
DO i = 1 TO dim(Months);
IF Months(i)=. Then Months(i)=0;
END;
RUN;
I'm guessing you have a MISSING T; statement somewhere, which makes SAS read T(race) as the special missing value .T, and .T does not equal . (ordinary missing), so IF Months(i)=. never matches it.
I would use the COALESCE function. There is really no need to change missing T to 0, is there?
missing t;
data snow;
infile cards firstobs=2;
input Season:$7. Sep Oct Nov Dec Jan Feb Mar Apr May Total;
array mth[*] Sep--May;
do i = 1 to dim(mth);
mth[i] = coalesce(mth[i],0);
end;
t = sum(of mth[*]);
drop i;
cards;
Season Sep Oct Nov Dec Jan Feb Mar Apr May Total
1884-85 0 T 1 27.1 22.2 17 3.5 19.5 T 90.3
1885-86 0 1.7 8.2 8.4 16.9 16 6.5 7 0 64.7
1886-87 0 T 22.2 12.5 12 18.4 6.3 1.2 0 72.6
1893-94 0 0.5 6.1 27.6 20 29.5 5.4 13.3 0 102.4
1894-95 0 T 11.1 22.1 26.5 23.6 9.5 0.6 0 93.4
1895-96 0 1.5 5.9 8.7 22.5 39.1 45.1 1 0 123.8
1896-97 0 T 5.5 13.9 20.1 13.7 8.1 5.2 0 66.5
1897-98 0 0 10.1 18.4 32.1 26.8 1.2 2.4 0 91
1898-99 0 T 10.6 27 16.6 16.3 21.2 4.3 T 96
1899-00 T T 1.3 21.5 24.7 28.5 54 1.3 0 131.3
1906-07 0 5 5.7 18.7 11.7 15.7 3.1 2.5 1.3 63.7
1907-08 0 0 2.2 11.6 16.5 19.8 7.9 6.3 3 67.3
1908-09 0 0.5 4.6 10 22.5 6.1 9.7 9.8 3.3 66.5
1909-10 0 T 1.7 14.6 22 42.7 3.4 0.5 0 84.9
1910-11 0 2.2 15.7 29.8 9.5 30 13.5 4.7 2 107.4
1911-12 0 0 6.5 7.5 21.5 10.8 8.8 6.9 T 62
;;;;
run;
proc print;
run;
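As a cross-check outside SAS, the same idea (count a trace "T" as 0, then total the months) can be sketched in pandas. This is only an illustration on three of the rows above, not a replacement for the data step; the column `t` mirrors the variable of the same name in the SAS code.

```python
import io
import pandas as pd

# Three rows copied from the cards data above.
raw = """Season Sep Oct Nov Dec Jan Feb Mar Apr May Total
1884-85 0 T 1 27.1 22.2 17 3.5 19.5 T 90.3
1885-86 0 1.7 8.2 8.4 16.9 16 6.5 7 0 64.7
1899-00 T T 1.3 21.5 24.7 28.5 54 1.3 0 131.3
"""

# Read everything as text first so that "T" survives the import.
snow = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype=str)

months = ["Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May"]
# Treat a trace as zero, then convert the month columns to numbers.
snow[months] = snow[months].replace("T", "0").astype(float)
snow["Total"] = snow["Total"].astype(float)

# Recomputed total, to compare with the Total column from the file.
snow["t"] = snow[months].sum(axis=1)
print(snow[["Season", "Nov", "Total", "t"]])
```

For these rows the recomputed `t` agrees with the published Total, which is a quick way to confirm that the trace values were handled as intended.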

pandas selection from a specific year

I am trying to select the following data using pandas for Python 2.7 from the web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm), from the year 1991 to 2000. Could somebody please help me write the code? Thanks!
datum m_ta m_tax m_taxd m_tan m_tand
------- ----- ----- ---------- ----- ----------
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
1901-09 15.9 19.9 1901-09-01 11.8 1901-09-09
1901-10 12.6 17.9 1901-10-04 8.3 1901-10-31
1901-11 4.7 11.1 1901-11-14 -0.2 1901-11-26
1901-12 4.2 8.4 1901-12-22 -1.4 1901-12-07
1902-01 3.4 7.5 1902-01-25 -2.2 1902-01-15
1902-02 2.8 6.6 1902-02-09 -2.8 1902-02-06
1902-03 5.3 13.3 1902-03-22 -3.5 1902-03-13
1902-04 10.5 15.8 1902-04-21 6.1 1902-04-08
1902-05 12.5 20.6 1902-05-31 8.5 1902-05-10
1902-06 18.5 23.8 1902-06-30 14.4 1902-06-19
....
You can use dt.year with boolean indexing to select data by the column datum:
#convert column datum to period
df['datum'] = pd.to_datetime(df['datum']).dt.to_period('M')
#convert columns to datetime
df['m_taxd'] = pd.to_datetime(df['m_taxd'])
df['m_tand'] = pd.to_datetime(df['m_tand'])
print df.datum.dt.year
0 1901
1 1901
2 1901
3 1901
4 1901
5 1901
6 1901
7 1901
8 1901
9 1901
10 1901
11 1901
12 1902
13 1902
14 1902
15 1902
16 1902
17 1902
Name: datum, dtype: int64
#change 1901 to 2000
print df[df.datum.dt.year <= 1901]
datum m_ta m_tax m_taxd m_tan m_tand
0 1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1 1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
2 1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
3 1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
4 1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
5 1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
6 1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
7 1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
8 1901-09 15.9 19.9 1901-09-01 11.8 1901-09-09
9 1901-10 12.6 17.9 1901-10-04 8.3 1901-10-31
10 1901-11 4.7 11.1 1901-11-14 -0.2 1901-11-26
11 1901-12 4.2 8.4 1901-12-22 -1.4 1901-12-07
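Since the question asks for a range (1991 to 2000), both ends need a bound. A small sketch of that, written for Python 3 and using only a few of the 1901-1902 rows shown above because the later years are not reproduced here; `select_years` is a helper name made up for this example.

```python
import io
import pandas as pd

# A few rows from the sample above; the real table runs on to later years.
raw = """datum m_ta m_tax m_taxd m_tan m_tand
1901-11 4.7 11.1 1901-11-14 -0.2 1901-11-26
1901-12 4.2 8.4 1901-12-22 -1.4 1901-12-07
1902-01 3.4 7.5 1902-01-25 -2.2 1902-01-15
1902-02 2.8 6.6 1902-02-09 -2.8 1902-02-06
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["datum"] = pd.to_datetime(df["datum"], format="%Y-%m").dt.to_period("M")


def select_years(frame, first, last):
    """Rows whose datum year lies in [first, last], both ends included."""
    years = frame["datum"].dt.year
    return frame[years.between(first, last)]


print(select_years(df, 1902, 1902))
```

With the full table loaded, the call for the question would be `select_years(df, 1991, 2000)`.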

percentage python code for data analysis using pandas

I want to write Python code to analyze the percentage of m_tax and m_tan for Python 2.7 from the web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm). I already have the dataframe code, but I couldn't write the percentage code. Could somebody please help me write the code? Thanks!
datum m_ta m_tax m_taxd m_tan m_tand
------- ----- ----- ---------- ----- ----------
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
You can call div and pass the sum of the columns to add % columns:
In [66]:
df['m_tax%'],df['m_tan%'] = df['m_tax'].div(df['m_tax'].sum()) * 100, df['m_tan'].div(df['m_tan'].sum()) * 100
df
Out[66]:
datum m_ta m_tax m_taxd m_tan m_tand m_tax% m_tan%
0 1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10 3.551136 -26.349892
1 1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15 2.485795 -17.062635
2 1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01 9.588068 1.295896
3 1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23 12.926136 15.982721
4 1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05 15.980114 26.349892
5 1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17 17.613636 31.533477
6 1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04 19.460227 36.501080
7 1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29 18.394886 31.749460