Converting daily data into weekly in Pandas - python-2.7

I have a dataframe as given below:
Index Date Country Occurence
0 2013-12-30 US 1
1 2013-12-30 India 3
2 2014-01-10 US 1
3 2014-01-15 India 1
4 2014-02-05 UK 5
I want to convert the daily data into weekly data, grouped by Country and aggregated with sum.
I tried resampling, but the output was a MultiIndex DataFrame from which I was not able to access the "Country" and "Date" columns (please refer to the table above).
The desired output is given below:
Date Country Occurence
Week1 India 4
Week2
Week1 US 2
Week2
Week5 Germany 5

You can groupby on country and resample on week
In [63]: df
Out[63]:
Date Country Occurence
0 2013-12-30 US 1
1 2013-12-30 India 3
2 2014-01-10 US 1
3 2014-01-15 India 1
4 2014-02-05 UK 5
In [64]: df.set_index('Date').groupby('Country').resample('W', how='sum')
Out[64]:
Occurence
Country Date
India 2014-01-05 3
2014-01-12 NaN
2014-01-19 1
UK 2014-02-09 5
US 2014-01-05 1
2014-01-12 1
And you can use reset_index() to turn the "Country" and "Date" index levels back into regular columns:
In [65]: df.set_index('Date').groupby('Country').resample('W', how='sum').reset_index()
Out[65]:
Country Date Occurence
0 India 2014-01-05 3
1 India 2014-01-12 NaN
2 India 2014-01-19 1
3 UK 2014-02-09 5
4 US 2014-01-05 1
5 US 2014-01-12 1
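On current pandas releases the `how=` argument to `resample` is no longer accepted. A minimal sketch of the same weekly sum, assuming a recent pandas version and the sample frame from the question (the variable name `weekly` is only illustrative):

```python
import pandas as pd

# Sample data from the question; Date must be a real datetime column.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2013-12-30", "2013-12-30", "2014-01-10", "2014-01-15", "2014-02-05",
    ]),
    "Country": ["US", "India", "US", "India", "UK"],
    "Occurence": [1, 3, 1, 1, 5],
})

# Group by country and by week (weeks end on Sunday for freq="W"),
# sum the occurrences, then flatten the MultiIndex into columns.
weekly = (
    df.groupby(["Country", pd.Grouper(key="Date", freq="W")])["Occurence"]
      .sum()
      .reset_index()
)
print(weekly)
```

Unlike the output above, this variant returns only the weeks that actually contain data, so there is no NaN row for India in the week ending 2014-01-12. `Country` and `Date` come back as ordinary columns.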

Related

Google Sheets formula for summing/averaging with specific conditions

I am hoping for a formula to take hours from the name columns and sum/average them by week, into a separate table like the 2nd one below. The formulas need to update upon changing the start and end week cells.
Body Part  Start Week  End Week  Arnold (hours)  Usain (hours)  Bob (hours)
Arms       1           3         6               3              0
Legs       1           6         12              36             20
Chest      2           4         6               2              2
Booty      4           6         9               12             3
Core       1           5         10              5              5
Formula Needed:
Hours   Arnold  Usain  Bob
Week 1  6       8      4.33
Week 2  8       8.67   5
Week 3  8       8.67   5
Week 4  9       11.67  6
Week 5  7       11     5.33
Week 6  5       10     4.33
Bonus if there is a way to also quickly average hours by body parts if for example there are multiple Arms rows.
try:
=ARRAYFORMULA(LAMBDA(a, b, QUERY(SPLIT(FLATTEN(BYCOL(D1:F1, LAMBDA(xx, FLATTEN(IF(
IF(a>=SEQUENCE(1, MAX(a)), "Week "&TEXT(SEQUENCE(1, MAX(a))+b, "00"), )="",,
REGEXEXTRACT(OFFSET(xx,,,1), "(.+) \(")&"×"&
IF(a>=SEQUENCE(1, MAX(a)), "Week "&TEXT(SEQUENCE(1, MAX(a))+b, "00"), )&"×"&
QUERY({REGEXEXTRACT(OFFSET(xx,,,1), "(.+) \("); OFFSET(xx,1,,9^9)/(a)}, "offset 1", )))))), "×"),
"select Col2,sum(Col3) where Col3>0 group by Col2 pivot Col1"))
(C2:INDEX(C:C, MAX(ROW(C:C)*(C:C<>"")))-B2:INDEX(B:B, MAX(ROW(B:B)*(B:B<>"")))+1,
B2:INDEX(B:B, MAX(ROW(B:B)*(B:B<>"")))-1))

cumulative average powerbi by month

I have below dataset.
Math Literature Biology date student
4 2 5 2019-08-25 A
4 5 4 2019-08-08 A
5 4 5 2019-08-23 A
5 5 5 2019-08-15 A
5 5 5 2019-07-19 A
5 5 5 2019-07-15 A
5 5 5 2019-07-03 A
5 5 5 2019-06-26 A
1 1 2 2019-06-18 A
2 3 3 2019-06-14 A
5 5 5 2019-05-01 A
2 1 3 2019-04-26 A
I need to develop a solution in Power BI whose output is the cumulative average per subject per month.
For example
April May June July August
Math | 2 3.5 3 3.75 4
Literature | 1 3 3 3.75 3.83
Biology | 3 4 3.6 4.125 4.33
Can you help?
You can use a matrix visualization for this.
Create a month-year variable and use it in the columns.
Use the Average of Math, Literature and Biology in the values.
Under the format pane --> Values --> Show on rows --> Select this
This should give the view you are looking for. You can edit the value headers to your requirement.
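For reference, the expected table in the question is a running average: each month's figure averages all rows up to and including that month, not just that month's rows. A small pandas sketch of that arithmetic, useful for checking whatever the report shows (this is not Power BI code; the names `monthly_sum`, `monthly_count` and `cumulative_avg` are illustrative, and the `student` column is left out because the sample has only student A):

```python
import pandas as pd

# Rows from the question: Math, Literature, Biology, date.
data = [
    (4, 2, 5, "2019-08-25"), (4, 5, 4, "2019-08-08"),
    (5, 4, 5, "2019-08-23"), (5, 5, 5, "2019-08-15"),
    (5, 5, 5, "2019-07-19"), (5, 5, 5, "2019-07-15"),
    (5, 5, 5, "2019-07-03"), (5, 5, 5, "2019-06-26"),
    (1, 1, 2, "2019-06-18"), (2, 3, 3, "2019-06-14"),
    (5, 5, 5, "2019-05-01"), (2, 1, 3, "2019-04-26"),
]
subjects = ["Math", "Literature", "Biology"]
df = pd.DataFrame(data, columns=subjects + ["date"])
df["date"] = pd.to_datetime(df["date"])

# Per-month totals and row counts, ordered by month.
month = df["date"].dt.to_period("M")
monthly_sum = df.groupby(month)[subjects].sum()
monthly_count = df.groupby(month)[subjects].count()

# Running total divided by running count = cumulative average.
cumulative_avg = (monthly_sum.cumsum() / monthly_count.cumsum()).T
print(cumulative_avg.round(3))
```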

Sum 5 rows at a time in an ordered SAS table with no unique identifier using proc sql

I'm working with a SAS table where I have ordered data that I need to sum in intervals of 5. I don't have a unique ID I can use for the group by statement and I'm struggling to find a solution.
Say I have this table
Number Name X Y
1 Susan 2 1
2 Susan 3 3
3 Susan 3 3
4 Susan 4 1
5 Susan 1 2
6 Susan 1 1
7 Susan 1 1
8 Susan 2 4
9 Susan 1 5
10 Susan 4 2
1 Steve 2 4
2 Steve 2 3
3 Steve 1 2
4 Steve 3 5
5 Steve 1 1
6 Steve 1 3
7 Steve 2 3
8 Steve 2 4
9 Steve 1 1
10 Steve 1 1
I'd want the output to look like
Number Name X Y
1-5 Susan 13 10
6-10 Susan 9 13
1-5 Steve 9 15
6-10 Steve 7 12
Is there an easy way to get output like this using proc sql? Thanks!
Try this:
proc sql;
select ceil(Number/5) as Grouping, Name, sum(X), sum(Y)
from have
group by Name, Grouping;
quit;

Date periods based on first occurrence

I have a pandas data frame of orders:
OrderID OrderDate Value CustomerID
1 2017-11-01 12.56 23
2 2017-11-06 1.56 23
3 2017-11-08 2.67 23
4 2017-11-12 5.67 99
5 2017-11-13 7.88 23
6 2017-11-19 3.78 99
Let's look at the customer with ID 23.
His first order was placed on 2017-11-01. This date is the start of his first week, so all his orders between 2017-11-01 and 2017-11-07 are assigned to his week number 1 (it is NOT a calendar week such as Monday to Sunday).
For the customer with ID 99 the first week starts on 2017-11-12, as that is the date of his first order (OrderID 4).
I need to assign every order in the table to the respective index of a common table Periods. Periods[0] will contain the orders from every customer's week number 1, Periods[1] the orders from every customer's week number 2, etc.
OrderID 1 and OrderID 4 will be at the same index of the Periods table, as both orders were created in the first week of their customers.
The Periods table containing the order IDs has to look like this:
Periods=[[1,2,4],[3,5,6]]
Is this what you want?
df['New']=df.groupby('CustomerID').OrderDate.apply(lambda x : (x-x.iloc[0]).dt.days//7)
df.groupby('New').OrderID.apply(list)
Out[1079]:
New
0 [1, 2, 4]
1 [3, 5, 6]
Name: OrderID, dtype: object
To get your period table
df.groupby('New').OrderID.apply(list).tolist()
Out[1080]: [[1, 2, 4], [3, 5, 6]]
More info
df
Out[1081]:
OrderID OrderDate Value CustomerID New
0 1 2017-11-01 12.56 23 0
1 2 2017-11-06 1.56 23 0
2 3 2017-11-08 2.67 23 1
3 4 2017-11-12 5.67 99 0
4 5 2017-11-13 7.88 23 1
5 6 2017-11-19 3.78 99 1
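On recent pandas versions the result of `groupby(...).apply(...)` may come back indexed by the group key as well, which makes the direct column assignment above unreliable. A hedged alternative sketch using `transform`, assuming `OrderDate` is a datetime column (the names `first_order`, `Period` and `periods` are illustrative):

```python
import pandas as pd

# Sample orders from the question.
df = pd.DataFrame({
    "OrderID": [1, 2, 3, 4, 5, 6],
    "OrderDate": pd.to_datetime([
        "2017-11-01", "2017-11-06", "2017-11-08",
        "2017-11-12", "2017-11-13", "2017-11-19",
    ]),
    "Value": [12.56, 1.56, 2.67, 5.67, 7.88, 3.78],
    "CustomerID": [23, 23, 23, 99, 23, 99],
})

# Date of each customer's first order, aligned row by row with df.
first_order = df.groupby("CustomerID")["OrderDate"].transform("min")

# Zero-based 7-day period counted from that first order.
df["Period"] = (df["OrderDate"] - first_order).dt.days // 7

# Collect the order IDs of each period into a list of lists.
periods = df.groupby("Period")["OrderID"].apply(list).tolist()
print(periods)
```

Using `min` instead of `x.iloc[0]` means the frame does not have to be sorted by date within each customer.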

Dataset transpose ideas using AppEngine python without pandas

I need to convert this set of data [bigquery response]:
country metric quarter
Argentina 34174 1
Argentina 83961 2
Argentina 96373 3
Argentina 103782 4
Chile 7636 1
Chile 23434 2
Chile 19103 3
Chile 21729 4
to this:
Quarter Argentina Chile
1 34174 7636
2 83961 23434
3 96373 19103
4 103782 21729
I use AppEngine [python]. My idea is to use numpy 1.6.1, but I'm open to other ideas.
Edit query used:
SELECT country, activity, sum(metric), quarter, year
From *table* where country IN (*countries-parameters*)...
group by 1,2,4,5
order by 1,4 ASC
One solution is to do the pivot directly in the BigQuery query:
Select
quarter,
SUM(IF(country=*country*,metric,0)) AS *country*,
...
From *table*
Where quarter IN (1,2,3,4) and country in (*countries-parameters*)
group by 1
order by 1 ASC;
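If the pivot has to happen in Python after the query returns, plain dictionaries are enough and neither pandas nor numpy is needed. A minimal sketch, assuming the response has already been read into (country, metric, quarter) tuples; `rows` and `pivot_by_quarter` are hypothetical names, not part of any BigQuery API:

```python
from collections import defaultdict

# Rows as (country, metric, quarter), copied from the question.
rows = [
    ("Argentina", 34174, 1), ("Argentina", 83961, 2),
    ("Argentina", 96373, 3), ("Argentina", 103782, 4),
    ("Chile", 7636, 1), ("Chile", 23434, 2),
    ("Chile", 19103, 3), ("Chile", 21729, 4),
]


def pivot_by_quarter(rows):
    """Return a list of lists: a header row, then one row per quarter."""
    countries = sorted(set(country for country, _, _ in rows))
    table = defaultdict(dict)  # quarter -> {country: summed metric}
    for country, metric, quarter in rows:
        table[quarter][country] = table[quarter].get(country, 0) + metric
    header = ["Quarter"] + countries
    # Missing (quarter, country) combinations are filled with 0.
    body = [[q] + [table[q].get(c, 0) for c in countries]
            for q in sorted(table)]
    return [header] + body


for line in pivot_by_quarter(rows):
    print(line)
```

Summing into the dictionary mirrors the `SUM(IF(...))` approach in the query: several rows for the same country and quarter are added together rather than overwritten.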