Convert Daily Returns to Monthly Returns in Stata - stata

I am using Stata and I have 6 years of daily returns for stocks that individuals hold in their portfolios. I would like to aggregate the daily returns to monthly portfolio returns. In some instances, the individual may hold more than one stock in the portfolio. I am struggling with writing the code to do this.
For a visual, my data looks like this:
I would like the results to look like this:
Where individual 2's portfolio return for the month of December 1996 is calculated as: 0.3 * 0.0031 + 0.7 * 0.0076 = 0.00625.
I have tried the collapse command such as
collapse Return, by (ID Year Month)
but this does not provide the same return that I calculated out in Excel.
I am able to make a weighted portfolio return for all the days using
bysort ID year month: egen wt_return = stock_weight * monthly_return
But this gives me daily returns. My trouble is then aggregating them into one return for the corresponding month.
As for the specifics, I would like to calculate the monthly portfolio return as the product of 1 + the weighted daily returns. As a last resort, the mean return for the month could work.

You don't show monthly portfolio return for person 2 in 1991. Your initial example data doesn't show stock weights but the desired example
data does. Your variable Monthly Return is not reproducible. You should take time to verify your question is clear when posting.
It's supposed be clear to the public who will read it, not only to you.
I didn't bother checking if your computations are correct but below is what I
understand you want. The procedure is simply to compute a weighted return and then
add them up by person year month groups. (I assume the stock weights apply to stocks on a daily basis, which is what your example data implies.)
clear all
set more off
input ///
perid year month day str3 stockid return stockw
1 1991 1 1 "ABC" .01 1
1 1991 1 2 "ABC" .02 1
1 1991 1 3 "ABC" -.01 1
1 1991 1 31 "ABC" .004 1
1 1996 12 31 "ABC" .002 1
2 1991 1 1 "ABC" .01 .3
2 1991 1 2 "ABC" .02 .3
2 1996 12 31 "ABC" .004 .3
2 1991 1 1 "XYZ" .001 .7
2 1991 1 2 "XYZ" .004 .7
2 1996 12 31 "XYZ" .021 .7
end
* create weighted return
gen returnw = return * stockw
sort perid year month day
list, sepby(perid year month day)
* sum weighted returns by person, year, month
collapse (sum) returnw, by (perid year month)
list, sepby(perid)
If you want collapse to sum, then you must indicate it with the (sum) (although I'm not clear if this is what you want). By default, it computes the mean. Read help collapse thouroughly.

Related

DAX equation to average data with different timespans

I have data for different companies. The data stops at day 10 for one of the companies (Company 1), day 6 for the others. If Company 1 is selected with other companies, I want to show the average so that the data runs until day 10, but using day 7, 8, 9, 10 values for Company 1 and day 6 values for others.
I'd want to just fill down days 8-10 for other companies with the day 6 value, but that would look misleading on the graph. So I need a DAX equation with some magic in it.
As an example, I have companies:
Company 1
Company 2
Company 3
etc. as a filter
And a table like:
Company
Date
Day of Month
Count
Company 1
1.11.2022
1
10
Company 1
2.11.2022
2
20
Company 1
3.11.2022
3
21
Company 1
4.11.2022
4
30
Company 1
5.11.2022
5
40
Company 1
6.11.2022
6
50
Company 1
7.11.2022
7
55
Company 1
8.11.2022
8
60
Company 1
9.11.2022
9
62
Company 1
10.11.2022
10
70
Company 1
11.11.2022
11
NULL
Company 2
1.11.2022
1
15
Company 2
2.11.2022
2
25
Company 2
3.11.2022
3
30
Company 2
4.11.2022
4
34
Company 2
5.11.2022
5
45
Company 2
6.11.2022
6
100
Company 2
7.11.2022
7
NULL
Every date has a row, but for days over 6/10 the count is NULL. If Company 1 or Company 2 is chosen separately, I'd like to show the count as is. If they are chosen together, I'd like the average of the two so that:
Day 5: AVG(40,45)
Day 6: AVG(50,100)
Day 7: AVG(55,100)
Day 8: AVG(60,100)
Day 9: AVG(62,100)
Day 10: AVG(70,100)
Any ideas?
You want something like this?
Create a Matriz using your:
company_table_dim (M)
calendar_Days_Table(N)
So you will have a new table of MXN Rows
Go to PowerQuery Order DATA and FillDown your QTY column
(= Table.FillDown(#"Se expandió Fact_Table",{"QTY"}))
So your last known QTY will de filled til the end of Time_Table for any company filters
Cons: Consider your new Matriz MXN it could be millions of rows to calculate
Greetings
enter image description here

Average of percent of column totals in DAX

I have a fact table named meetings containing the following:
- staff
- minutes
- type
I then created a summarized table with the following:
TableA =
SUMMARIZECOLUMNS (
'meetings'[staff]
, 'meetings'[type]
, "SumMinutesByStaffAndType", SUM( 'meetings'[minutes] )
)
This makes a pivot table with staff as rows and columns as types.
For this pivottable I need to calculate each cell as a percent of the column total. For each staff I need the average of their percents. There are only 5 meeting types so I need the sum of these percents divided by 5.
I don't know how to divide one number grouped by two columns by another number grouped by one column. I'm coming from the SQL world so my DAX is terrible and I'm desperate for advice.
I tried creating another summarized table to get the sum of minutes for each type.
TableB =
SUMMARIZECOLUMNS (
'meetings'[type]
, "SumMinutesByType", SUM( 'meetings'[minutes] )
)
From there I want 'TableA'[SumMinutesByStaffAndType] / 'TableB'[SumMinutesByType].
TableC =
SUMMARIZECOLUMNS (
'TableA'[staff],
'TableB'[type],
DIVIDE ( 'TableA'[SumMinutesByType], 'TableB'[SumMinutesByType]
)
"A single value for column 'Minutes' in table 'Min by Staff-Contact' cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result."
I keep arriving at this error which leads me to believe I'm not going about this the "Power BI way".
I have tried making measures and creating matrices on the reports view. I've tried using the group by feature in the Query Editor. I even tried both measures and aggregate tables. I'm likely overcomplicating it and way off the mark so any help is greatly appreciated.
Here's an example of what I'm trying to do.
## Input/First table
staff minutes type
--------- --------- -----------
Bill 5 TELEPHONE
Bill 10 FACE2FACE
Bill 5 INDIRECT
Bill 5 EMAIL
Bill 10 OTHER
Gary 10 TELEPHONE
Gary 5 EMAIL
Gary 5 OTHER
Madison 20 FACE2FACE
Madison 5 INDIRECT
Madison 15 EMAIL
Rob 5 FACE2FACE
Rob 5 INDIRECT
Rob 20 TELEPHONE
Rob 45 FACE2FACE
## Second table with SUM of minutes, Grand Total is column total.
Row Labels EMAIL FACE2FACE INDIRECT OTHER TELEPHONE
------------- ------- ----------- ---------- ------- -----------
Bill 5 10 5 10 5
Gary 5 5 10
Madison 15 20 5
Rob 50 5 20
Grand Total 25 80 15 15 35
## Third table where each of the above cells is divided by its column total.
Row Labels EMAIL FACE2FACE INDIRECT OTHER TELEPHONE
------------- ------- ----------- ------------- ------------- -------------
Bill 0.2 0.125 0.333333333 0.666666667 0.142857143
Gary 0.2 0 0 0.333333333 0.285714286
Madison 0.6 0.25 0.333333333 0 0
Rob 0 0.625 0.333333333 0 0.571428571
Grand Total 25 80 15 15 35
## Final table with the sum of the rows in the third table divided by 5.
staff AVERAGE
--------- -------------
Bill 29.35714286
Gary 16.38095238
Madison 23.66666667
Rob 30.5952381
Please let me know if I can clarify an aspect.
You can make use of the built in functions like %Row total in Power BI, Please find the snapshot below
If this is not what you are looking for, kindly let me know (I have used your Input table)

Adding column based on ID in another data

data1 is data from 1990 and it looks like
Panelkey Region income
1 9 30
2 1 20
4 2 40
data2 is data from 2000 and it looks like
Panelkey Region income
3 2 40
2 1 30
1 1 20
I want to add a column of where each person lived in 1990.
Panelkey Region income Region1990
3 2 40 .
2 1 30 1
1 1 20 9
How can I do this on Stata?
The following code will deal with panels that live in multiple regions in the same year by choosing the region with larger income. This would make sense if income was proportional to fraction of the year spent in a region. Same income ties will be broken arbitrarily using the highest region's value. Other types of aggregation might make sense (take a look at the -collapse- command).
Note that I tweaked your data by inserting second rows for the last observation in each year:
clear
input Panelkey Region income
1 9 30
2 1 20
4 2 40
4 10 80
end
rename (Region income) =1990
bysort Panelkey (income Region): keep if _n==_N
isid Panelkey
save "data1990.dta", replace
clear
input Panelkey Region income
3 2 40
2 1 30
1 1 20
1 9 20
end
bysort Panelkey (income Region): keep if _n==_N
isid Panelkey
merge 1:1 Panelkey using "data1990.dta", keep(match master) nogen
list, clean noobs

Adding across years

Quick question. I'm working with code that produces a spreadsheet that contains the information like the following:
year business sales profit
2001 a 5 3
2002 a 6 4
2003 a 4 2
2001 b 2 1
2002 b 6 3
2003 b 7 5
How can I get Stata to total sales and profits across years?
Thanks
Try
collapse (sum) sales profit, by(year)
or, if you want to retain your original data,
bysort year: egen tot_sales = total(sales)
egen stands for extended generate, a very useful command.

How to write the best code for data aggregation?

I have the following dataset (individual level data):
pid year state income
1 2000 il 100
2 2000 ms 200
3 2000 al 30
4 2000 dc 400
5 2000 ri 205
1 2001 il 120
2 2001 ms 230
3 2001 al 50
4 2001 dc 400
5 2001 ri 235
.........etc.......
I need to estimate average income for each state in each year and create a new dataset that would look like this:
state year average_income
ar 2000 150
ar 2001 200
ar 2002 250
il 2000 150
il 2001 160
il 2002 160
...........etc...............
I already have a code that runs perfectly fine (I have two loops). However, I would like to know is there any better way in Stata like sql style query?
This is shorter code than any suggested so far:
collapse average_income=income, by(state year)
This shouldn't need 2 loops, or any for that matter. There are in fact more efficient ways to do this. When you are repeating an operation on many groups, the bysort command is useful:
bysort year state: egen average_income = mean(income)
You also don't have to create a new dataset, you can just prune this one and save it. Start by only keeping the variables you want (state, year and average_income) and get rid of duplicates:
keep state year average_income
duplicates drop
save "mynewdataset.dta"
You have the SQL tag on the question. This is a basic aggregation query in SQL:
select state, year, avg(income) as average_income
from t
group by state, year;
To put this in a table, depends on your database. One of the following typically works:
create table NewTable as
select state, year, avg(income) as average_income
from t
group by state, year;
Or:
select state, year, avg(income) as average_income
into NewTable
from t
group by state, year;