manipulating the format of date on X-axis - stata

I have a weekly dataset. I use the code below to plot the causality between variables. Stata shows year-week labels on the X-axis. Is it possible to show only the year, or year-month, instead of year-week?
generate Date =wofd(D)
format Date %tw
tsset Date
tvgc Momentum supply, p(3) d(3) trend window(25) prefix(_) graph

The fact that you have weekly data is only a distraction here.
You should only use Stata's weekly date functions if your weeks satisfy Stata's rules:
Week 1 starts on 1 January, always.
Later weeks start 7 days later in turn, except that week 52 is always 8 or 9 days long.
Hence there is no week 53.
These rules are documented, and they do not match your data. You are lucky that your data include no 53-week years; otherwise you would get some bizarre results.
For much more detailed discussion, see the references turned up by search week, sj.
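For concreteness, the documented rule maps day-of-year to Stata week-of-year as below (a sketch in Python, not Stata, purely to illustrate the rule):

```python
import math

def stata_week(day_of_year):
    """Stata %tw rule: week 1 starts on 1 January; weeks are 7 days long,
    except that week 52 absorbs the last 8 or 9 days; there is no week 53."""
    return min(math.ceil(day_of_year / 7), 52)

print(stata_week(1), stata_week(8), stata_week(358), stata_week(365))
# 1 2 52 52
```

Any dataset whose weeks start on some other day of the year (here, weeks running Thursday to Wednesday) cannot match this scheme.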
The good news is that you just need to build on what you have and put labels and ticks on your x axis. It's a little bit of work, but no more than use of the standard, documented label and tick options. The main ideas are blindingly obvious once spelled out:
Labels Put informative labels in the middle of time intervals. Suppress the associated ticks. You can suppress a tick by setting its length to zero or its colour to invisible.
Ticks Put ticks at the ends (equivalently, the beginnings) of time intervals. Lengthen ticks as needed.
Grid lines Lines demarcating years could be worth adding. None are shown here, but the syntax is just an extension of that given.
Axis titles If the time (usually x) axis is adequately explained, that axis title is redundant and even dopey if it is some arbitrary variable name.
See especially https://www.stata-journal.com/article.html?article=gr0030 and https://www.stata-journal.com/article.html?article=gr0079
With your data, showing years is sensible but showing months too is likely to produce crowded detail that is hard to read and not much use. I compromised on quarters.
* Example generated by -dataex-. For more info, type help dataex
clear
input str10 D float(Momentum Supply)
"12/2/2010" -1.235124 4.760894
"12/9/2010" -1.537671 3.002344
"12/16/2010" -.679893 1.5665628
"12/23/2010" 1.964229 .5875537
"12/30/2010" -1.1872853 -1.1315695
"1/6/2011" .028031677 .065580264
"1/13/2011" .4438451 1.2316793
"1/20/2011" -.3865465 1.7899017
"1/27/2011" -.4547117 1.539866
"2/3/2011" 1.6675532 1.352376
"2/10/2011" -.016190516 3.72986
"2/17/2011" .5471755 2.0804555
"2/24/2011" .2695233 2.1094923
"3/3/2011" .5136591 -1.0686383
"3/10/2011" .606721 3.786967
"3/17/2011" .004175631 .4544936
"3/24/2011" 1.198901 -.3316304
"3/31/2011" .1973385 .5846249
"4/7/2011" 2.2470737 1.0026894
"4/14/2011" .3980386 -2.6676855
"4/21/2011" -1.530687 -7.214682
"4/28/2011" -.9735931 3.246654
"5/5/2011" .13312873 .9581707
"5/12/2011" -.8017629 -.468076
"5/19/2011" -.11491735 -4.354526
"5/26/2011" .3627179 -2.233418
"6/2/2011" .13805833 2.2697728
"6/9/2011" .27832976 .58203816
"6/16/2011" -1.9467738 -.2834298
"6/23/2011" -.9579238 -1.0356172
"6/30/2011" 1.1799787 1.1011268
"7/7/2011" -2.0982232 .5292908
"7/14/2011" -.2992591 -.4004747
"7/21/2011" .5904395 -2.5159726
"7/28/2011" -.21626104 1.936029
"8/4/2011" -.02421602 -.8160484
"8/11/2011" 1.5797064 -.6868965
"8/18/2011" 1.495294 -1.8621664
"8/25/2011" -1.2188485 -.8388996
"9/1/2011" .4991612 -1.6689343
"9/8/2011" 2.1691883 1.3244398
"9/15/2011" -1.2074957 .9707839
"9/22/2011" -.3399567 .6742781
"9/29/2011" 1.9860272 -3.331345
"10/6/2011" 1.935733 -.3882593
"10/13/2011" -1.278119 .6796986
"10/20/2011" -1.3209987 .2258049
"10/27/2011" 4.315368 .7879103
"11/3/2011" .58669937 -.5040554
"11/10/2011" 1.460597 -2.0426705
"11/17/2011" -1.338189 -.24199644
"11/24/2011" -1.6870773 -1.1143018
"12/1/2011" -.19232976 -1.2156726
"12/8/2011" -2.655519 -2.054406
"12/15/2011" 1.7161795 -.15301673
"12/22/2011" -1.43026 -3.138013
"12/29/2011" .03427247 -.28446484
"1/5/2012" -.15930523 -3.362428
"1/12/2012" .4222094 4.0962815
"1/19/2012" -.2413332 3.8277814
"1/26/2012" -2.850591 .067359865
"2/2/2012" -1.1785052 -.3558361
"2/9/2012" -1.0380571 .05134211
"2/16/2012" .8539951 -4.421839
"2/23/2012" .2636529 1.3424703
"3/1/2012" .022639304 2.734022
"3/8/2012" .1370547 .8043283
"3/15/2012" .1787796 -.56465846
"3/22/2012" -2.0645525 -2.9066684
"3/29/2012" 1.562931 -.4505192
"4/5/2012" 1.2587242 -.6908772
"4/12/2012" -1.5202224 .7883849
"4/19/2012" 1.0128288 -1.6764873
"4/26/2012" -.29182148 1.920932
"5/3/2012" -1.228097 -3.7068026
"5/10/2012" -.3124508 -3.034149
"5/17/2012" .7570716 -2.3398724
"5/24/2012" -1.0697783 -2.438565
"5/31/2012" 1.2796624 1.299344
"6/7/2012" -1.5482885 -1.228557
"6/14/2012" 1.396692 3.2158935
"6/21/2012" .3116726 8.035475
"6/28/2012" -.22332123 .7450229
"7/5/2012" .4655248 .04986914
"7/12/2012" .4769497 4.045938
"7/19/2012" .08743203 .25987592
"7/26/2012" -.402533 .3213503
"8/2/2012" -.1564897 1.5290447
"8/9/2012" -.0919008 .13955575
"8/16/2012" -1.3851573 1.0860283
"8/23/2012" .020250637 -.8858514
"8/30/2012" -.29458764 -1.6602173
"9/6/2012" -.39921495 -.8043483
"9/13/2012" 1.76396 4.2867813
"9/20/2012" -1.2335806 2.476225
"9/27/2012" .176066 -.5992883
"10/4/2012" .1075483 1.7167135
"10/11/2012" .06365488 1.1636261
"10/18/2012" -.2305842 -1.506699
"10/25/2012" -.1526354 -2.669866
"11/1/2012" -.06311637 -2.0813057
"11/8/2012" .55959195 .8805096
"11/15/2012" 1.5306772 -2.708766
"11/22/2012" -.5585792 .26319882
"11/29/2012" -.035690214 -1.6176193
"12/6/2012" -.7885767 1.1719254
"12/13/2012" .9131169 -1.1135346
"12/20/2012" -.6910864 -.4893669
"12/27/2012" .9836168 .4052487
"1/3/2013" -.8828759 .7161615
"1/10/2013" 1.505474 -.1768004
"1/17/2013" -1.3013282 -1.333739
"1/24/2013" -1.3670077 1.0568022
"1/31/2013" .05846912 -.7845241
"2/7/2013" .4923012 -1.202816
"2/14/2013" -.06551787 -.9198701
"2/21/2013" -1.8149366 -.1746187
"2/28/2013" .3370621 1.0104061
"3/7/2013" 1.2698976 1.273357
"3/14/2013" -.3884514 .7927139
"3/21/2013" -.1437847 1.7798674
"3/28/2013" -.2325031 .9336611
"4/4/2013" .03971701 .6680117
"4/11/2013" -.25990707 -3.0261614
"4/18/2013" .7046488 -.458615
"4/25/2013" -2.1198323 -.14664523
"5/2/2013" 1.591287 -.3687443
"5/9/2013" -1.1266721 -2.0973356
"5/16/2013" -.7595757 -1.1238302
"5/23/2013" 2.2590933 2.124479
"5/30/2013" -.7447268 .7387985
"6/6/2013" 1.3409324 -1.3744274
"6/13/2013" -.3844476 -.8341842
"6/20/2013" -.8135379 -1.7971268
"6/27/2013" -2.506065 -.4194731
"7/4/2013" -.4755843 -5.216218
"7/11/2013" -1.256806 1.8539237
"7/18/2013" -.13328764 -1.0578626
"7/25/2013" 1.2412375 1.7703875
"8/1/2013" 1.5033063 -2.2505422
"8/8/2013" -1.291876 -1.5896243
"8/15/2013" 1.0093634 -2.8861396
"8/22/2013" -.6952878 -.23103845
"8/29/2013" -.05459245 1.53916
"9/5/2013" 1.2413216 .749662
"9/12/2013" .19232245 2.81967
"9/19/2013" -2.6861706 -4.520664
"9/26/2013" .3105677 -5.274343
"10/3/2013" -.2184027 -3.251637
"10/10/2013" -1.233326 -5.031735
"10/17/2013" 1.9415965 -1.250861
"10/24/2013" -1.2008202 -1.5703772
"10/31/2013" -.6394427 -1.1347327
"11/7/2013" 2.715824 2.0324607
"11/14/2013" -1.5833142 2.5080755
"11/21/2013" .9940037 4.117931
"11/28/2013" -.8226601 3.752914
"12/5/2013" .09966203 1.865995
"12/12/2013" -.18744355 2.5426314
end
gen ddate = daily(D, "MDY")
gen year = year(ddate)
gen dow = dow(ddate)
tab year
tab dow
forval y = 2010/2013 {
    local Y = `y' + 1
    local yend `yend' `=mdy(1, 1, `Y')'
    if `y' > 2010 local ymid `ymid' `=mdy(7, 1, `y')' "`y'"
    * data for 2010 cover only December, so quarter labels start in 2011
    forval q = 1/4 {
        if `y' > 2010 {
            local qmid : word `q' of 2 5 8 11
            local qmids `qmids' `=mdy(`qmid', 15, `y')' "Q`q'"
            local qend : word `q' of 4 7 10 4
            local qends `qends' `=mdy(`qend', 1, `y')'
        }
    }
}
line M S ddate, xla(`ymid', tlength(*3) tlc(none)) xtic(`yend', tlength(*5)) xmla(`qmids', tlc(none) labsize(small) tlength(*.5)) xmti(`qends', tlength(*5)) xtitle("") scheme(s1color)
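As a cross-check of the positions the loop builds (purely illustrative, not Stata), the same mid-interval label dates and interval-boundary tick dates can be computed with Python's standard library:

```python
from datetime import date

year_ticks = []    # year boundaries: 1 January of each following year
year_labels = []   # (mid-year date, label) pairs
qtr_labels = []    # (mid-quarter date, label) pairs

for y in range(2010, 2014):
    year_ticks.append(date(y + 1, 1, 1))
    if y > 2010:  # 2010 contributes only December, so no labels for it
        year_labels.append((date(y, 7, 1), str(y)))
        for q, mid_month in enumerate((2, 5, 8, 11), start=1):
            qtr_labels.append((date(y, mid_month, 15), f"Q{q}"))

print(year_ticks[0])   # 2011-01-01
print(year_labels[0])  # (datetime.date(2011, 7, 1), '2011')
```

Labels sit in the middle of each year or quarter; ticks sit at the boundaries, exactly as the label/tick principles above prescribe.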

Related

Calculate aggregate value of last 12 months in a Measure Power BI

I'm using this measure to calculate an aggregate sum of a value over the last 12 months. The measure works well once 12 or more months of data are available, but before month 12 the value is not right.
For example, in the first month of the sample I would like to multiply that month's value by 12 (1st month + 11 projected months). In the second month, average the two months and multiply by 12. And so on.
Could you please help me?
SumRevenue =
VAR vSumNet12 =
    CALCULATE(
        Table[Trevenue],
        DATESINPERIOD(
            CalendarM[Data],
            MAX(CalendarM[Data]),
            -12,
            MONTH
        )
    )
RETURN
    vSumNet12
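The annualization rule described in the question (average the months available so far, then multiply by 12) can be sketched in plain Python; this is only an illustration of the arithmetic, not DAX, and the DAX equivalent would divide the running sum by the count of months with data before multiplying by 12:

```python
def annualized(monthly_values):
    """Annualize each month: mean of up to 12 trailing months, times 12."""
    out = []
    for i in range(len(monthly_values)):
        window = monthly_values[max(0, i - 11): i + 1]  # at most 12 months
        out.append(sum(window) / len(window) * 12)
    return out

print(annualized([100.0, 130.0]))  # [1200.0, 1380.0]
```

Once 12 months are available, the result coincides with the plain trailing-12-month sum.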
Example table:
Date Customer Net Trevenue SumRevenue ROA ROA I Want
09/30/20 A 237767115,6 327444,2478 327444,2478 0,14% 1,65%
10/31/20 A 245689276,3 251934,78 579379,0278 0,24% 1,41%
11/30/20 A 252916933,6 262294,89 841673,9178 0,33% 1,33%
12/31/20 A 241424127 509883,07 1351556,988 0,56% 1,68%
01/31/21 A 244721140,9 259250 1610806,988 0,66% 1,58%
02/28/21 A 250913741,4 246740,33 1857547,318 0,74% 1,48%
03/31/21 A 282215365,7 550897,35 2408444,668 0,85% 1,46%
04/30/21 A 312759343,1 544161,63 2952606,298 0,94% 1,42%
05/31/21 A 325535894 419360,97 3371967,268 1,04% 1,38%
06/30/21 A 371306315 390650,41 3762617,678 1,01% 1,22%
07/31/21 A 379780645,3 527254,43 4289872,108 1,13% 1,23%
08/31/21 A 415390274,9 409196,3 4699068,408 1,13% 1,13%
09/30/21 A 433837730,6 598924,02 4970548,18 1,15% 1,15%
10/31/21 A 482659906,7 254086,32 4972699,72 1,03% 1,03%
11/30/21 A 501568104,7 318924,53 5029329,36 1,00% 1,00%
12/31/21 A 507124350,5 754897,79 5274344,08 1,04% 1,04%
01/31/22 A 510220304,2 179153,11 5194247,19 1,02% 1,02%

Dropping entire subject if single observation meets criterion in panel data

I have some panel data of the form...
id | amount
-----------
1 | 10
1 | 10
1 | 100
2 | 10
2 | 15
2 | 10
3 | 100
What I'm looking to do seems like it should be fairly simple, but my experience with Stata is limited and I'm used to programming in languages similar to C/Java. Essentially, I want to drop an entire person (id) if any of their individual observations ever exceed a certain amount. So let's say I set this amount to 50, I want to drop all the observations from id 1 and id 3 such that the data will then only contain observations from id 2.
The pseudo-code in Java would be fairly straightforward...
for (int i = 0; i < dataset_length; i++) {
    if (dataset[i].amount > 50) {
        int drop_id = dataset[i].id;
        for (int j = 0; j < dataset_length; j++) {
            if (dataset[j].id == drop_id) {
                // delete observation j
            }
        }
    }
}
What would the Stata equivalent of something akin to this be? I'm surely missing something and making it more complicated than it ought to be, but I cannot figure it out.
If there are no missings on amount this is just
bysort id (amount) : drop if amount[_N] > 50
If there are missings, then
gen ismissing = -missing(amount)
bysort id (ismissing amount): drop if amount[_N] > 50 & amount[_N] < .
would be one kind of check, although it's hard to see how the missings could be interesting or useful.
The machinery here (for one introduction see here) in effect builds in a loop over identifiers, and over the observations for each identifier. Literal translation using models from mainstream programming languages could only result in lengthier and less efficient code.
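For intuition, the same drop-the-whole-group-if-any-value-exceeds-the-threshold logic can be sketched in plain Python (illustrative only; in Stata the one-line by: solution above is the idiomatic answer):

```python
def keep_ids(rows, threshold=50):
    """rows: list of (id, amount) pairs. Keep only rows whose id
    never has an amount above threshold (missing amounts ignored)."""
    too_big = {i for i, amt in rows if amt is not None and amt > threshold}
    return [(i, amt) for i, amt in rows if i not in too_big]

data = [(1, 10), (1, 10), (1, 100), (2, 10), (2, 15), (2, 10), (3, 100)]
print(keep_ids(data))  # [(2, 10), (2, 15), (2, 10)]
```

Note how the set of offending ids is built in one pass and applied in another, which is what the by: sort-and-check idiom achieves without any explicit loop.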

Rolling Standard Deviation

I use Stata to estimate a rolling standard deviation of ROA (using a 4-period window over the previous year). Now I would like to keep only those rolling standard deviations calculated from at least 3 (out of 4) non-missing ROA values. How can I do this in Stata?
ROA roa_sd
. .
. .
. .
.0108869 .
.0033411 .
.0032814 .0053356 (this value should be missing as it was calculated using only 2 valid value)
.0030827 .0043739
.0029793 .0038275
Your question is answered on the blog post I link to above in the comments. You can use rolling and then add an additional screen to discard sigma when the number of observations doesn't meet your threshold.
But for simple calculations like sigma and beta (i.e., standard deviation and univariate regression coefficient) you can do much better with a more manual approach. Compare the rolling solution with my manual solution.
/* generate panel by adapting the linked code */
clear
set obs 20000
gen date = _n
gen id = floor((_n - 1) / 20) + 1
gen roa = int((100) * runiform())
replace roa = . in 1/4
replace roa = . in 10/12
replace roa = . in 18/20
/* solution with rolling */
/* http://statadaily.wordpress.com/2014/03/31/rolling-standard-deviations-and-missing-observations/ */
timer on 1
xtset id date
rolling sd2 = r(sd), window(4) keep(date) saving(f2, replace): sum roa
merge 1:1 date using f2, nogenerate keepusing(sd2)
xtset id date
gen tag = missing(l3.roa) + missing(l2.roa) + missing(l1.roa) + missing(roa) > 1
gen sd = sd2 if (tag == 0)
timer off 1
/* my solution */
timer on 2
rolling_sd roa, window(4) minimum(3)
timer off 2
/* compare */
timer list
list in 1/50
The timer results show that the manual solution is much faster.
. /* compare */
. timer list
1: 132.38 / 1 = 132.3830
2: 0.10 / 1 = 0.0990
Save the following as rolling_sd.ado in your personal ado file directory (or in your current working directory). I'm sure that someone could further streamline this code. Note that this code has the additional advantage of meeting the minimum data requirements at the front edge of the window (i.e., calculates sigma with first three observations, rather than waiting for all four).
*! 0.2 Richard Herron 3/30/14
* added minimum data requirement
*! 0.1 Richard Herron 1/12/12
program rolling_sd
    version 11.2
    syntax varlist(numeric), window(int) minimum(int)
    * temporary variables for counts and cumulative sums
    tempvar n miss xs x2s nonmiss1 nonmiss2 sigma1 sigma2
    local w = `window'
    local m = `minimum'
    * generate cumulative sums and missing-value counts
    xtset
    bysort `r(panelvar)' (`r(timevar)'): generate `n' = _n
    by `r(panelvar)': generate `miss' = sum(missing(`varlist'))
    by `r(panelvar)': generate `xs' = sum(`varlist')
    by `r(panelvar)': generate `x2s' = sum(`varlist' * `varlist')
    * generate variance 1 (front edge of window)
    generate `nonmiss1' = `n' - `miss'
    generate `sigma1' = sqrt((`x2s' - `xs'*`xs'/`nonmiss1')/(`nonmiss1' - 1)) if inrange(`nonmiss1', `m', `w') & !missing(`nonmiss1')
    * generate variance 2 (full rolling window, main part)
    generate `nonmiss2' = `w' - s`w'.`miss'
    generate `sigma2' = sqrt((s`w'.`x2s' - s`w'.`xs'*s`w'.`xs'/`nonmiss2')/(`nonmiss2' - 1)) if inrange(`nonmiss2', `m', `w') & !missing(`nonmiss2')
    * return standard deviation
    egen sigma = rowfirst(`sigma2' `sigma1')
end
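If a cross-check outside Stata is useful: the same minimum-observations rule is built into pandas, where `min_periods` plays the role of the `minimum()` option above (illustrative sketch, not a replacement for the Stata program):

```python
import math
import pandas as pd

# toy ROA series with leading missings, as in the question
roa = pd.Series([math.nan, math.nan, 1.0, 2.0, 3.0, 4.0])

# 4-period rolling SD, reported only when >= 3 of the 4 values are non-missing
sd = roa.rolling(window=4, min_periods=3).std()
print(sd.tolist())
```

Like the rolling_sd program, `min_periods` also allows the front edge of the window to be computed from only 3 observations rather than waiting for all 4.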

Use carryforward with dynamic condition to limit carry forward time interval

I am using carryforward (ssc install carryforward) to fill in missing observations. Some of my data are annual and I want to use them for subsequent monthly observations, but only if the carried forward data are less than two years old. Can I achieve this logic with the dynamic_condition() option, particularly using #? I have to complete this for many variables, and would like to avoid a lot of variable generation and dropping (and really I'd like to know if it's possible).
The following "manual" solution works, but can I replicate it on the fly with dynamic_condition()? My attempts below fail.
/* generate data with observation every June */
clear
set obs 100
generate date_ym = ym(2001, 1) + (_n - 1)
format date_ym %tm
generate date_m = month(dofm(date_ym))
generate x = runiform() if (date_m == 6) & !inlist(_n, 30, 42)
/* carryforward (ssc install carryforward), "manual" solution */
egen date_m2 = group(date_ym) if !missing(x)
carryforward date_m2, replace
bysort date_m2 (date_ym): generate date_m3 = cond(_n > 24, ., date_m2)
carryforward x if !missing(date_m3), gen(x_cf)
tsset date_ym
list, sep(12)
/* can I replicate this with dynamic_condition() option? */
/* no time series operators with # */
/* carryforward x, gen(x_cf2) dynamic_condition(sum(d.# == 0) < 24) */
/* x_cf2: d.x_cf2 invalid name */
/* second # doesn't work */
/* carryforward x, gen(x_cf3) dynamic_condition(sum(# == #[_n - 1]) < 24) */
/* x_cf3: equation [_n-1] not found */
Disclosure: I don't use carryforward (SSC), but that's because I tend to think back to the principles as I understand them, as documented here.
To do this, you need to keep a record not only of previous non-missing values but also of the dates when a variable was last not missing. This arose previously: see this answer
The essence of a simpler approach is here:
clear
set seed 2803
set obs 100
generate date_ym = ym(2001, 1) + (_n - 1)
format date_ym %tm
generate x = runiform() if inlist(_n, 30, 42)
gen last = date_ym if !missing(x)
replace last = last[_n-1] if missing(last)
replace x = x[_n-1] if missing(x) & (date_ym - last) < 24
The generalisation to panels is using by: and the generalisation to multiple variables uses a foreach loop. If the dates of missing values can be different for different variables, that mostly just shifts the loop.
Schematically, suppose we are cycling over an arbitrary varlist and that the dates of missing values differ, but we use the rule of carrying forward the last value within 24 months.
gen last = .
quietly foreach v of varlist <varlist> {
    replace last = cond(!missing(`v'), date_ym, .)
    replace last = last[_n-1] if missing(last)
    replace `v' = `v'[_n-1] if missing(`v') & (date_ym - last) < 24
}
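The same limited carry-forward can be sketched in plain Python (illustrative only; rows stand for consecutive months, and a value is never carried 24 or more months past its origin, matching the `(date_ym - last) < 24` condition):

```python
def carryforward(values, limit=24):
    """Forward-fill a monthly series, but only while the gap back to the
    last observed value is strictly less than `limit` months."""
    out = list(values)
    last_idx = None  # row where the last non-missing value was observed
    for i, v in enumerate(out):
        if v is not None:
            last_idx = i
        elif last_idx is not None and (i - last_idx) < limit:
            out[i] = out[last_idx]
    return out

x = [None, 1.0] + [None] * 30
filled = carryforward(x, limit=24)
print(filled[2], filled[24], filled[25])  # 1.0 1.0 None
```

Tracking the index (date) of the last non-missing value is the whole trick, exactly as the `last` variable does in the Stata loop.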

Is Conger's kappa available in Stata?

Is the modified version of kappa proposed by Conger (1980) available in Stata? Tried to google it to no avail.
This is an old question, but in case anyone is still looking--the SSC package kappaetc now calculates that, along with every other inter-rater statistic you could ever want.
Since no one has responded with a Stata solution, I developed some code to calculate Conger's kappa using the formulas provided in Gwet, K. L. (2012). Handbook of Inter-Rater Reliability (3rd ed.), Gaithersburg, MD: Advanced Analytics, LLC. See especially pp. 34-35.
My code is undoubtedly not as efficient as others could write, and I would welcome any improvements to the code or to the program format that others wish to make.
cap prog drop congerkappa
prog def congerkappa
    * This program has only been tested with Stata 11.2, 12.1, and 13.0.
    preserve
    * Number of judges
    scalar judgesnum = _N
    * Target/rating variables: everything after the judge variable
    quietly ds
    local vlist `r(varlist)'
    local removeit = word("`vlist'", 1)
    local targets : list vlist - removeit
    * Sum of ratings given by each judge
    egen judgesum = rowtotal(`targets')
    * Mean rating of each target
    foreach i in `targets' {
        quietly summarize `i', meanonly
        scalar mean`i' = r(mean)
    }
    * % each target rating of all target ratings
    foreach i in `targets' {
        gen `i'2 = `i'/judgesum
    }
    * Variance of each target's % ratings
    foreach i in `targets' {
        quietly summarize `i'2
        scalar s2`i'2 = r(Var)
    }
    * Mean of each target's % ratings
    foreach i in `targets' {
        quietly summarize `i'2, meanonly
        scalar mean`i'2 = r(mean)
    }
    * Square of mean of each target's % ratings
    foreach i in `targets' {
        scalar mean`i'2sq = mean`i'2^2
    }
    * Sum of variances of each target's % ratings
    scalar sumvar = 0
    foreach i in `targets' {
        scalar sumvar = sumvar + s2`i'2
    }
    * Sum of means of each target's % ratings
    scalar summeans = 0
    foreach i in `targets' {
        scalar summeans = summeans + mean`i'2
    }
    * Sum of squared means of each target's % ratings
    scalar summeansqs = 0
    foreach i in `targets' {
        scalar summeansqs = summeansqs + mean`i'2sq
    }
    * Conger's kappa
    scalar conkappa = summeansqs - (sumvar/judgesnum)
    di _n "Conger's kappa = " conkappa
    restore
end
The data structure required by the program is shown below. The variable names are not fixed, but the judge/rater variable must be in the first position in the data set. The data set should not include any variables other than the judge/rater and targets/ratings.
Judge S1 S2 S3 S4 S5 S6
Rater1 2 4 2 1 1 4
Rater2 2 3 2 2 2 3
Rater3 2 5 3 3 3 5
Rater4 3 3 2 3 2 3
If you would like to run this against a test data set, you can use the judges data set from StataCorp and reshape it as shown.
use http://www.stata-press.com/data/r12/judges.dta, clear
sort judge
list, sepby(judge)
reshape wide rating, i(judge) j(target)
rename rating* S*
list, noobs
* Run congerkappa program on demo data set in memory
congerkappa
I have run only a single validation test of this code against the data in Table 2.16 in Gwet (p. 35) and have replicated the Conger's kappa = .23343 as calculated by Gwet on p. 34. Please test this code on other data with known Conger's kappas before relying on it.
I don't know if Conger's kappa for multiple raters is available in Stata, but it is available in R via the irr package, using the kappam.fleiss function and specifying the exact option. For information on the irr package in R, see http://cran.r-project.org/web/packages/irr/irr.pdf#page.12 .
After installing and loading the irr package in R, you can view a demo data set and Conger's kappa calculation using the following code.
data(diagnoses)
print(diagnoses)
kappam.fleiss(diagnoses, exact=TRUE)
I hope someone else here can help with a Stata solution, as you requested, but this may at least provide a solution if you can't find it in Stata.
In response to Dimitriy's comment below, I believe Stata's native kappa command applies either to two unique raters or to more than two non-unique raters.
The original poster may also want to consider the icc command in Stata, which allows for multiple unique raters.