There are four variables in my dataset: Company, Date, Return, and Weight. Company is the company's name, Return is the return of that company on day Date, and Weight is the company's weight in the market.
I want to keep all variables in the original file and create an additional variable which is the market return (excluding the company itself). The market return for stock a is the weighted sum of all other stocks' returns on the same Date. For example, if there are three stocks in the market, a, b and c, the market return for stock a is Return(b) * [Weight(b)/(Weight(b)+Weight(c))] + Return(c) * [Weight(c)/(Weight(b)+Weight(c))]. Similarly, the market return for stock b is Return(a) * [Weight(a)/(Weight(a)+Weight(c))] + Return(c) * [Weight(c)/(Weight(a)+Weight(c))].
I tried to use PROC SUMMARY, but it cannot exclude stock a when calculating the market return for stock a.
PROC SUMMARY NWAY DATA = have ; /* 'have' is a placeholder for the input dataset name */
CLASS Date ;
VAR Return / WEIGHT = weight;
OUTPUT
OUT = output
MEAN (Return) = MarketReturn;
RUN;
Could anyone show me how to solve this, please? I am relatively new to this software, so I don't know whether I should use a loop or whether there is a better alternative.
This can be done with a bit of fancy algebra. It's not something that's built-in, though.
Basically:
Construct a "total" market return
Construct a stock by stock return (so just return of A)
Subtract out the portion that A contributes to total.
Thanks to the simple math behind weighted means, it's quite easy to do this.
Total weighted mean = ((mean of A * sum of A's weights) + (mean of the rest * sum of the rest's weights)) / (sum of A's weights + sum of the rest's weights)
So, solve that for the mean of the rest: multiply through, subtract A's contribution, and divide by the sum of the rest's weights.
Exclusive mean = ((mean of all * sum of all weights) - (mean of A * sum of A's weights)) / (sum of all weights - sum of A's weights)
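As a quick check with the sample data below (A = .50 with weight 1, B = .75 with weight 2, C = .33 with weight 1): the overall weighted mean is (.50*1 + .75*2 + .33*1)/4 = 0.5825, so the mean excluding A is (0.5825*4 - .50*1)/(4 - 1) = 0.61, which matches (.75*2 + .33*1)/3 computed directly.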
Something like this.
data returns;
input stock $ return weight;
datalines;
A .50 1
B .75 2
C .33 1
;;;;
run;
proc means data=returns;
class stock;
types () stock; *this is the default;
weight weight;
output out=means_out mean= sumwgt= /autoname;
run;
data returns_excl;
if _n_=1 then set means_out(where=(_type_=0) rename=(return_mean=tot_return return_sumwgt=tot_wgts));
set means_out(where=(_type_=1));
return_excl = (tot_return*tot_wgts-return_mean*return_sumwgt)/(tot_wgts-return_sumwgt);
run;
I'm relatively new to the world of Power BI. I've got two different types of diesel, each with different prices.
I've also calculated moving averages of both, and I need the average GAP between them, but only on days where both have a value; otherwise the average isn't valid. The tables and the expected result look roughly as follows:
TABLE DIESEL TYPE A

Date     Price DIESEL TYPE A
01-feb   1,2
05-may   1,3
06-ago   1,09
06-ago   1,1
07-sep   1,5
TABLE DIESEL TYPE B

Date     Price DIESEL TYPE B
01-feb   0,9
05-may   1,05
06-ago   0,8
06-ago   0,75
12-nov   0,7
Date     Average A   Average B
01-feb   1,2         0,9
05-may   1,3         1,05
06-ago   1,095       0,775
07-sep   1,5         -
12-nov   -           0,7
The expected GAP should be:

Date     GAP Average
01-feb   0,30
05-may   0,25
06-ago   0,32
07-sep   -
12-nov   -
On September 7th and November 12th I DON'T want these averages calculated or shown on my graph, i.e. in my measure.
In other words, I want the average of the difference between the two prices by date, under the condition that both types of diesel have a value on that date; otherwise the average should not be calculated. If, for instance, there's a value on 07-sep for Type A but not for Type B, or vice versa, that date should be excluded.
Use this measure:
GAP Average =
VAR avgA =
AVERAGE('DIESEL TYPE A'[Price DIESEL TYPE A])
VAR avgB =
AVERAGE('DIESEL TYPE B'[Price DIESEL TYPE B])
RETURN
IF(
OR(ISBLANK(avgA), ISBLANK(avgB)),
BLANK(),
avgA - avgB
)
I'm trying to create a correlation matrix that also includes the means and SDs of each variable.
** Set variables used in Summary and Correlation
local variables relationship commission anxiety enjoyment negotiation_efficacy similarity_values similarity_behaviors SPT_confidence own_SPT_effort
** Descriptive statistics
estpost summarize `variables'
matrix table = ( e(mean) \ e(sd) )
matrix rownames table = mean sd
matrix list table
** Correlation matrix
correlate `variables'
matrix C = r(C)
local k = colsof(C)
matrix C = C[1..`=`k'-1',.]
local corr : rownames C
matrix table = ( table \ C )
matrix list table
estadd matrix table = table
local cells table[count](fmt(0) label(Count)) table[mean](fmt(2) label(Mean)) table[sd](fmt(2) label(Standard Deviation))
local drop
foreach row of local corr {
local drop `drop' `row'
local cells `cells' table[`row'](fmt(4) drop(`drop'))
}
display "`cells'"
esttab using Report.rtf, ///
    replace ///
    noobs ///
    nonumbers ///
    compress ///
    cells("`cells'")
If it helps, this is what the correlation code looks like:
asdoc corr relationship commission anxiety enjoyment negotiation_efficacy similarity_values similarity_behaviors SPT_confidence own_SPT_effort ranger_SPT_effort cooperative_motivation competitive_motivation, nonum
This correlation matrix looks exactly as it should, but I'm essentially hoping to add means and SDs at the beginning.
*This is cross-posted here: https://www.statalist.org/forums/forum/general-stata-discussion/general/1549809-add-mean-and-sd-column-in-correlation-matrix-in-stata
It's not clear to me whether you want the table to include significance stars or not. If not, you can just use corr and a loop to obtain the SD and mean, then use frmttable. This seems shorter than your current approach. Here's an example:
bcuse wage2
global variables "wage hours educ exper"
corr $variables
matrix corr_t = r(C)
local rows = rowsof(corr_t)
di "`rows'"
matrix add = J(`rows',2,.)
matrix list add
local n = 1
foreach x of global variables {
sum `x'
mat add[`n',1] = r(sd)
mat add[`n',2] = r(mean)
local n = `n' + 1
}
matrix final = corr_t,add
matrix list final
frmttable, statmat(final) sdec(2) ctitle("","wage", "hours", "educ", "exper","sd","mean") rtitle("wage"\ "hours"\ "educ" \ "exper")
I have some data in the following format:
COMPNAME DATE CAP RETURN
I have found some code that will construct and calculate the value-weighted return based on the data.
This works great and is below:
PROC SUMMARY NWAY DATA = Data1 ; CLASS DATE ;
VAR RETURN / WEIGHT = CAP ;
OUTPUT
OUT = MKTRET
MEAN (RETURN) = MONTHLYRETURN ;
RUN;
The extension that I would like to make is in my head a little bit complicated.
I want to make the weights based on the market capitalization in June.
So this will be a buy-and-hold portfolio. The actual data has hundreds of companies, but here is a representative example with two companies, solely to explain how the weights will evolve...
Say for example I have two companies, A and B.
The CAP of A is £100m and B is £100m.
In July of one year, I would invest 50% in A and 50% in B.
The returns in July are 10% and -10%, so the holdings become £110m and £90m.
Therefore, in August I would be invested 55% and 45%.
It will go on like this until next June when I will re-balance again based on the market capitalisation...
10% monthly return is pretty speculative!
When the two companies' balances differ by more than 200, you will also need to sell one and buy the other to equalize them.
Presume the rates per month are simulated and stored in a data set. You can generate a simulated ledger as follows
add returns
compare balances
equalize by splitting 200 investment if balances are close enough
equalize by investing all 200 in one and selling and buying
Of course, a portfolio with more than two companies becomes a more complicated balancing act.
data simurate(label="Future expectation is not an indicator of past performance :)");
do month = 1 to 60;
do company = 1 to 2;
return = round (sin(company+month/4) / 12, 0.001); * simulated return rate for the month;
output;
end;
end;
run;
data want;
if 0 then set simurate;
declare hash lookup (dataset:'simurate');
lookup.defineKey ('company', 'month');
lookup.defineData('return');
lookup.defineDone();
month = 0;
bal1 = 0; bal2 = 0;
output;
do month = 1 to 60;
lookup.find(key:1, key:month); rate1 = return;
ret1 = round(bal1 * rate1, 0.0001);
lookup.find(key:2, key:month); rate2 = return;
ret2 = round(bal2 * rate2, 0.0001);
bal1 + ret1;
bal2 + ret2;
goal = mean(bal1,bal2) + 100;
sel1 = 0; buy1 = 0;
sel2 = 0; buy2 = 0;
if abs(bal1-bal2) <= 200 then do;
* difference between balances after returns is <= 200;
* balances can be equalized simple investment split;
inv1 = goal - bal1;
inv2 = goal - bal2;
end;
else if bal1 < bal2 then do;
* sell bal2 as needed to equalize;
inv1 = 200;
inv2 = 0;
buy1 = goal - 200 - bal1;
sel2 = bal2 - goal;
end;
else do;
inv2 = 200;
inv1 = 0;
buy2 = goal - 200 - bal2;
sel1 = bal1 - goal;
end;
bal1 + (buy1 - sel1 + inv1);
bal2 + (buy2 - sel2 + inv2);
output;
end;
stop;
drop company return ;
format bal: 10.4 rate: 5.3;
run;
I am trying to create a variable for each year in my data based on mathematical expressions of other variables (I have annual data and used "..." to avoid writing out each year). I am using the summarize command in Stata to extract the standard deviation, but Stata does not recognize the frac variable. I have tried egen, but that results in an unknown-function error, and gen results in an already-defined-variable error. I would appreciate any help with the following code, or a pointer to a link where this issue has been discussed.
foreach yr of numlist 1995...2012 {
local row = `yr' - 1994
local numerator = 100*(income - L1.income)
local denominator = ((abs(income) + abs(L1.income)) / 2)
local frac = (`numerator' / `denominator')
summarize frac
local sdfrac = r(sd)
matrix C[`row', 1] = `numerator'
matrix C[`row', 2] = `denominator'
matrix C[`row', 3] = `sdfrac'
}
If I am understanding your question right, maybe you don't need to use a loop until the very end, where you can post the results to a postfile:
This is just a thought:
tempname memhold
tempfile filename
postfile `memhold' year sdfrac using `filename'
gen row=year-1994
gen numerator=100*(income-L1.income)
gen denominator=((abs(income)+abs(L1.income))/2)
gen frac=numerator/denominator
foreach yr of numlist 1995...2012 {
summarize frac if year==`yr'
local sdfrac=r(sd)
post `memhold' (`yr') (`sdfrac')
}
postclose `memhold'
clear all
use `filename'
*View Results
list
This code should get you a data set with the name of the year and the standard deviation of the frac variable as variables.
In a comment, OP added a question about code similar to this (but ignored the request to post it in a more civilised form). Note that backticks or left quotation marks in Stata clash with SO mark-up codes in comments. Presumably some
tempname memhold
definition preceded this.
postfile `memhold' year sdfrac sex race using myresults
levelsof sex, local(s)
levelsof race, local (r)
foreach a of local s {
foreach b of local r {
forval yr = 1995/2012 {
summarize frac if year == `yr' & sex == `a' & race == `b'
post `memhold' (`yr') (`r(sd)') (`a') (`b')
}
}
}
Let's focus on what the problem is. You want the standard deviations of frac for all combinations of sex, race and year in a separate file. That's one line
collapse (sd) frac, by(year sex race)
If you want to see a table alongside the data, consider
egen group = group(sex race year), label
and then
tab group, su(frac)
or
tabstat frac, by(group) stat(sd)
This code modifies that by #Pcarlitz, mostly by simplifying it. I can't check with your data, which I don't have.
It's too long to fit into a comment.
I would not use a temporary file as you want to save these results, it seems.
tempname memhold
postfile `memhold' year sdfrac using myresults
gen frac = (100*(income - L1.income))/((abs(income) + abs(L1.income))/2)
forval yr = 1995/2012 {
summarize frac if year==`yr'
post `memhold' (`yr') (`r(sd)')
}
postclose `memhold'
use myresults, clear
list
UPDATE As in a later answer, consider collapse as a much simpler direct alternative here.
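For the simpler year-only case in this thread, that would be something like:
* SD of frac for each year, replacing the data in memory
collapse (sd) frac, by(year)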
Is the modified version of kappa proposed by Conger (1980) available in Stata? I have tried to Google it, to no avail.
This is an old question, but in case anyone is still looking--the SSC package kappaetc now calculates that, along with every other inter-rater statistic you could ever want.
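As a minimal sketch (the rater variable names here are hypothetical; kappaetc expects the data in wide form with one rating variable per rater):
ssc install kappaetc    // install the user-written package from SSC
kappaetc rater1 rater2 rater3 rater4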
Since no one has responded with a Stata solution, I developed some code to calculate Conger's kappa using the formulas provided in Gwet, K. L. (2012). Handbook of Inter-Rater Reliability (3rd ed.), Gaithersburg, MD: Advanced Analytics, LLC. See especially pp. 34-35.
My code is undoubtedly not as efficient as others could write, and I would welcome any improvements to the code or to the program format that others wish to make.
cap prog drop congerkappa
prog def congerkappa
* This program has only been tested with Stata 11.2, 12.1, and 13.0.
preserve
* Number of judges
scalar judgesnum = _N
* Subject IDs
quietly ds
local vlist `r(varlist)'
local removeit = word("`vlist'",1)
local targets: list vlist - removeit
* Sums of ratings by each judge
egen judgesum = rowtotal(`targets')
* Mean of each target's ratings
foreach i in `targets' {
quietly summarize `i', meanonly
scalar mean`i' = r(mean)
}
* % each target rating of all target ratings
foreach i in `targets' {
gen `i'2 = `i'/judgesum
}
* Variance of each target's % ratings
foreach i in `targets' {
quietly summarize `i'2
scalar s2`i'2 = r(Var)
}
* Mean variance of each target's % ratings
foreach i in `targets' {
quietly summarize `i'2, meanonly
scalar mean`i'2 = r(mean)
}
* Square of mean of each target's % ratings
foreach i in `targets' {
scalar mean`i'2sq = mean`i'2^2
}
* Sum of variances of each target's % ratings
scalar sumvar = 0
foreach i in `targets' {
scalar sumvar = sumvar + s2`i'2
}
* Sum of means of each target's % ratings
scalar summeans = 0
foreach i in `targets' {
scalar summeans = summeans + mean`i'2
}
* Sum of meansquares of each target's % ratings
scalar summeansqs = 0
foreach i in `targets' {
scalar summeansqs = summeansqs + mean`i'2sq
}
* Conger's kappa
scalar conkappa = summeansqs -(sumvar/judgesnum)
di _n "Conger's kappa = " conkappa
restore
end
The data structure required by the program is shown below. The variable names are not fixed, but the judge/rater variable must be in the first position in the data set. The data set should not include any variables other than the judge/rater and targets/ratings.
Judge S1 S2 S3 S4 S5 S6
Rater1 2 4 2 1 1 4
Rater2 2 3 2 2 2 3
Rater3 2 5 3 3 3 5
Rater4 3 3 2 3 2 3
If you would like to run this against a test data set, you can use the judges data set from StataCorp and reshape it as shown.
use http://www.stata-press.com/data/r12/judges.dta, clear
sort judge
list, sepby(judge)
reshape wide rating, i(judge) j(target)
rename rating* S*
list, noobs
* Run congerkappa program on demo data set in memory
congerkappa
I have run only a single validation test of this code against the data in Table 2.16 in Gwet (p. 35) and have replicated the Conger's kappa = .23343 as calculated by Gwet on p. 34. Please test this code on other data with known Conger's kappas before relying on it.
I don't know if Conger's kappa for multiple raters is available in Stata, but it is available in R via the irr package, using the kappam.fleiss function and specifying the exact option. For information on the irr package in R, see http://cran.r-project.org/web/packages/irr/irr.pdf#page.12 .
After installing and loading the irr package in R, you can view a demo data set and Conger's kappa calculation using the following code.
data(diagnoses)
print(diagnoses)
kappam.fleiss(diagnoses, exact=TRUE)
I hope someone else here can help with a Stata solution, as you requested, but this may at least provide a solution if you can't find it in Stata.
In response to Dimitriy's comment below, I believe Stata's native kappa command applies either to two unique raters or to more than two non-unique raters.
The original poster may also want to consider the icc command in Stata, which allows for multiple unique raters.
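As a rough sketch using the long-format judges demo data shown above (variables rating, target, and judge), the two-way call would be something like:
use http://www.stata-press.com/data/r12/judges.dta, clear
icc rating target judge    // two-way model: each target rated by the same set of judges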