Coefplot- Event study plot with two coefficients at time zero - stata

I have an event study I am plotting with coefplot in Stata 13. At time zero, I have two coefficients I would like to plot side by side, group A and group B. I don't know how to format the plot so that both coefficients show up side by side at time zero, without creating two separate plots or having a wide gap between them. Other than the two coefficients at time zero, there is only one coefficient at every other x-axis point. I would ideally like to label both group A and B at time zero with a different color but I can figure that out myself.
Here is the relevant code:
ppmlhdfe f2 ( 2.time 3.time 4.time 5.time zero c.groupA c.groupB 8.time 9.time 10.time 11.time 12.time )#(c.eventstudy_treat) , offset( log_pop_tt) a(i.unit i.month#i.year alltime#eventstudy_treat ) vce(cluster unit ) pformat(%5.4f) eform
Above, #.time is a dummy for each period in the event study where 7.time is "Time Zero". Period T-1 is a reference period represented by zero which is collinear and defaults to 1. groupA and groupB are dummies for treated group A and treated group B at time zero.
Below is my code for coefplot, where only group A is plotted at time zero:
coefplot, omitted keep(2.time#c.eventstudy_treat 3.time#c.eventstudy_treat 4.time#c.eventstudy_treat 5.time#c.eventstudy_treat 0.zero#c.eventstudy_treat c.groupA#c.eventstudy_treat 8.time#c.eventstudy_treat 9.time#c.eventstudy_treat 10.time#c.eventstudy_treat 11.time#c.eventstudy_treat 12.time#c.eventstudy_treat) vertical xlabel(1 "-5" 2 "-4" 3 "-3" 4 "-2" 5 "-1" 6 "0" 7 "1" 8 "2" 9 "3" 10 "4" 11 "5") baselevels eform order(2.time#c.eventstudy_treat 3.time#c.eventstudy_treat 4.time#c.eventstudy_treat 5.time#c.eventstudy_treat 0.zero#c.eventstudy_treat c.groupA#c.eventstudy_treat 8.time#c.eventstudy_treat 9.time#c.eventstudy_treat 10.time#c.eventstudy_treat 11.time#c.eventstudy_treat 12.time#c.eventstudy_treat) title("Event Study") xtitle("Relative Month") ytitle("Percentage Change") ciopts(recast(rcap)) transform(*=(#)-1) ylabel(-.06(.02).16,gmin gmax) yline(0, lpattern(dash) lcolor(gs0))
Picture is at:
https://i.stack.imgur.com/z4ZxW.png
How do I plot group B as well at time zero so that both groupA and groupB are plotted at time zero right next to each other? The group B coefficient is c.groupB#c.eventstudy_treat.

Conditional formatting of a rectangle cell range defined by user input

In a Google Sheet with a cell range of 26x26 (so A1:Z26), I need to conditionally format (change the color to green) a rectangle area that is defined by user input.
Example of user input (4 values required):
hsize = 5 / vsize = 4 / starth = 3 / startv = 2
This means that the conditionally formatted area should be a rectangle from C2:G5 because the start cell horizontally is 3 (column C) and vertically 2 (row 2), and the size of the rectangle horizontally is 5 (C,D,E,F,G) and vertically 4 (2,3,4,5).
I already solved this with Apps Script but due to given restrictions I have to implement this without using any scripts.
I have numbered the whole 26x26 area (=sequence(26,26)) to get numbers from 1 to 676 that I could then use for the conditional formatting.
By doing this, I can limit the conditional formatting to the values between the start and the end value (in the example above that would be 29 (C2) and 111 (G5)). This works by using a simple and/if formula in the conditional formatting.
But the problem with this is that all the cells with values from 29 to 111 are now colored, not only the rectangle C2:G5.
I can't figure out how to define a formula that does what I need. How can I do this and limit the highlighted area to the defined cell range of the rectangle?
[Picture here]: green is the conditional formatting from 29 (C2) to 111 (G5), but what I actually need is that only the red-framed area should be shown in green.
try:
=REGEXMATCH(""&A1, "^"&TEXTJOIN("$|^", 1, INDIRECT(
ADDRESS($AB$4, $AB$3)&":"&ADDRESS($AB$2+$AB$4-1, $AB$1+$AB$3-1)))&"$")
or better:
=(COLUMN(A1)>=$AB$3) *(ROW(A1)>=$AB$4)*
(COLUMN(A1)<$AB$1+$AB$3)*(ROW(A1)<$AB$2+$AB$4)
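To see why the row/column test works where the 1-to-676 numbering does not, here is a minimal Python sketch of the same logic. The function names are made up for illustration, and the mapping of hsize, vsize, starth and startv to $AB$1 through $AB$4 is read off the formula above rather than stated by the asker.

```python
def cell_number(row, col, width=26):
    """Number a cell the way =SEQUENCE(26,26) does (row-major, 1-based)."""
    return (row - 1) * width + col


def in_rectangle(row, col, hsize, vsize, starth, startv):
    """Same four comparisons as the COLUMN()/ROW() formula."""
    return (starth <= col < starth + hsize) and (startv <= row < startv + vsize)


# Example input from the question: rectangle C2:G5.
hsize, vsize, starth, startv = 5, 4, 3, 2
first = cell_number(startv, starth)                          # C2
last = cell_number(startv + vsize - 1, starth + hsize - 1)   # G5

cells = [(r, c) for r in range(1, 27) for c in range(1, 27)]
by_number = [rc for rc in cells if first <= cell_number(*rc) <= last]
by_rectangle = [rc for rc in cells if in_rectangle(*rc, hsize, vsize, starth, startv)]

print(first, last)                        # the bounds used by the number-based rule
print(len(by_number), len(by_rectangle))  # number range vs. true rectangle
```

The number-based rule selects every cell between C2 and G5 in reading order, including the cells that wrap around the ends of rows 2 to 4, while the row/column test keeps only the 20 cells inside the rectangle.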

Trajectory Analysis (SAS): Incorrect number of start values

I am attempting a trajectory analysis in SAS (proc traj).
Following instructions found online, I began by testing quadratic models with two groups, then three, four, and five (i.e., order 2 2, order 2 2 2, order 2 2 2 2, order 2 2 2 2 2).
I determined that a three-group linear model is the best fit (order 1 1 1;)
I then wish to add time stable covariates with the risk command. As found online, I did this by adding the start parameters provided in the Log.
At this point, I receive a notice: "Incorrect number of start values. There should be 10 start values based on the model specifications."
I understand that it's possible to delete some of the 12 parameter estimates provided, but how do I select which ones to remove?
Thank you.
Code:
proc traj data=followupyes outplot=op outstat=os out=of outest=oe itdetail;
id youthid;
title3 'linear 3-gp model ';
var pronoun_allpar1-pronoun_allpar3;
indep time1-time3;
model logit;
ngroups 3;
order 1 1 1;
weight wgt_00;
start 0.031547 0.499724 1.969017 0.859566 -1.236747 0.007471
0.771878 0.495458 0.000000 0.000000 0.000000 0.000000;
risk P00_45_1;
run;
%trajplot (OP, OS, "linear 3-gp model ", "Traj of Pronoun Support", "Pron Support", "Time");
Because you are estimating a model with 3 linear trajectories, you will need 2 start values (an intercept and a linear term) for each of your 3 groups.
See here for more info: https://www.andrew.cmu.edu/user/bjones/example.htm
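As a rough check on the count in the error message, the arithmetic can be sketched in Python. This is only a sketch under an assumption not taken from the proc traj documentation: that a logit model contributes (order + 1) trajectory parameters per group, and that group membership with a risk statement contributes one constant plus one coefficient per covariate for every group except the reference group. The function name is hypothetical.

```python
def expected_start_values(orders, n_risk_covariates):
    """Count start values under the assumptions stated above.

    orders: polynomial order of each group's trajectory (1 = linear).
    n_risk_covariates: number of variables on the risk statement.
    """
    # Intercept plus one term per polynomial degree, for each group.
    trajectory_params = sum(order + 1 for order in orders)
    # Assumed: a constant and one coefficient per covariate for each
    # non-reference group.
    membership_params = (len(orders) - 1) * (1 + n_risk_covariates)
    return trajectory_params + membership_params


# order 1 1 1 with a single risk variable (P00_45_1)
print(expected_start_values([1, 1, 1], 1))
```

Under these assumptions the total is 6 trajectory values plus 4 group-membership values, which agrees with the 10 reported in the notice; the start statement in the question lists 12.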

Pandas: Create New Dataframe that Counts Number of Times Keywords / Phrases From List Occur in One Column

I have the following word list:
list = ['clogged drain', 'right wing', 'horse', 'bird', 'collision light']
I have the following data frame (notice spacing can be weird):
ID TEXT
1 you have clogged drain
2 the dog has a right wing clogged drain
3 the bird flew into collision light
4 the horse is here to horse around
5 bird bird bird
I want to create a table that shows keywords and frequency counts of how often the keywords occurred in TEXT field. However, if a keyword appears more than once in the same row within the TEXT column, it is only counted once.
Desired output:
keywords count
clogged drain 2
right wing 1
horse 1
bird 2
collision light 1
I have searched all over stackoverflow but couldn't find my specific case.
I would start by reformatting the TEXT column to get rid of your funny spacing, using str.split() and str.join(). Then, use str.contains for each of your keywords, and get the sum of the boolean values that are outputted (It will return True if your keyword is found):
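import pandas as pd
# Reformat text, splitting wherever you have one or more whitespace characters
# (raw string so that \s is not treated as an invalid escape sequence)
df['formatted_text'] = df.TEXT.str.split(r'\s+').str.join(' ')
# Create your output dataframe, one row per keyword
df2 = pd.DataFrame(my_list, columns=['keywords'])
# Count occurrences: number of rows whose text contains the keyword at least once
df2['count'] = df2['keywords'].apply(lambda x: df.formatted_text.str.contains(x).sum())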
The result:
>>> df2
keywords count
0 clogged drain 2
1 right wing 1
2 horse 1
3 bird 2
4 collision light 1
Just to note, I changed the variable name of your list to my_list, so as not to mask the built-in Python data type.
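For reference, here is a self-contained version of this approach that can be run as is. The exact spacing in the sample strings is an assumption (the question only says the spacing "can be weird"), and regex=False is added here so that each keyword is treated as a literal substring.

```python
import pandas as pd

# Sample data from the question; the irregular spacing is an assumption.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "TEXT": [
        "you have clogged   drain",
        "the dog has a right  wing clogged drain",
        "the bird flew into collision    light",
        "the horse is here to horse around",
        "bird bird  bird",
    ],
})
my_list = ["clogged drain", "right wing", "horse", "bird", "collision light"]

# Collapse every run of whitespace into a single space.
df["formatted_text"] = df["TEXT"].str.split().str.join(" ")

# For each keyword, count the rows that contain it at least once.
counts = {
    kw: int(df["formatted_text"].str.contains(kw, regex=False).sum())
    for kw in my_list
}
df2 = pd.DataFrame({"keywords": list(counts), "count": list(counts.values())})
print(df2)
```

Because this is plain substring matching, a keyword such as bird would also be counted in a row containing a longer word like birdcage; a word-boundary regular expression would be needed to rule that out.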
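You can use extractall (in recent pandas versions, sum(level=0) is no longer available, so group by the index level instead):
df.TEXT.str.extractall(r'({})'.format('|'.join(list)))[0].str.get_dummies().groupby(level=0).sum().gt(0).astype(int).sum()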
Out[225]:
bird 2
clogged drain 2
collision light 1
horse 1
right wing 1
dtype: int64

Query on plotting Lorenz curves on Stata

I am trying to plot a lorenz curve, using the following command:
glcurve drugs, sortvar(death) pvar(rank) glvar(yord) lorenz nograph
generate rank1=rank
label variable rank "Cum share of mortality"
label variable rank1 "Equality Line"
twoway (line rank1 rank, sort clwidth(medthin) clpat(longdash))(line yord rank , sort clwidth(medthin) clpat(red)), ///
ytitle(Cumulative share of drug activity, size(medsmall)) yscale(titlegap(2)) xtitle(Cumulative share of mortality (2012), size(medsmall)) ///
legend(rows(5)) xscale(titlegap(5)) legend(region(lwidth(none))) plotregion(margin(zero)) ysize(6.75) xsize(6) plotregion(lcolor(none))
However, in the resulting curves, the line of equality does not start from 0. Is there a way to fix this?
Is it recommended to use the following to get a perfect 45-degree line of equality?
(function y=x, range(0 1))
Also, what is the minimum number of observations required to plot the graph above? Does it work with as few as 2 observations?
Your line of perfect equality does not pass through (0,0) because the values of your variable never include 0.
The smallest value you will have for rank will be 1/_N. Although this value will asymptotically approach 0, it will never actually reach 0.
To see this, try:
quietly sum rank
di r(min)
di 1/_N
Further, by applying the program code to your data (beginning around line 152 in the ado file and removing unnecessary bits), one can easily see that yord cannot take on a value of 0 without values of 0 for drugs:
glcurve drugs, sortvar(death) pvar(rank) glvar(yord) lorenz nograph
sort death drugs , stable
gen double rank1 = _n / _N
qui sum drugs
gen yord1= (sum(drugs) / _N) / r(mean)
The best way to plot your line of equality would be the method from your edit, namely:
twoway(function y = x, ra(0 1))
One quick yet (very) crude fix to force the Lorenz curve to start at the origin (if it doesn't already) is to add an observation to the data after obtaining rank and yord, and then delete it once you have your curve:
glcurve drugs, sortvar(death) pvar(rank) glvar(yord) lorenz nograph
expand 2 in 1
replace yord = 0 in 1
replace rank = 0 in 1
twoway (function y = x, ra(0 1)) ///
(line yord rank, sort)
drop in 1
Like I said, this is admittedly crude and even somewhat ill advised, but I can't see a much better alternative at the moment, and with this method you will not be altering any of the other values of yord by running glcurve on the extrapolated data.
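The same point can be illustrated outside Stata. The following is a minimal Python sketch, not a port of glcurve: it only mirrors the two generate statements shown earlier (rank = _n/_N and the cumulative sum divided by N times the mean), and the function names and sample values are made up for illustration.

```python
def lorenz_points(values, sort_keys=None):
    """Return (ranks, shares) for values ordered by sort_keys (or by themselves)."""
    keys = values if sort_keys is None else sort_keys
    ordered = [v for _, v in sorted(zip(keys, values), key=lambda pair: pair[0])]
    n = len(ordered)
    total = sum(ordered)
    ranks, shares, running = [], [], 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        ranks.append(i / n)             # same as _n / _N
        shares.append(running / total)  # same as (sum(x) / _N) / mean
    return ranks, shares


def with_origin(ranks, shares):
    """Prepend the point (0, 0), like the extra observation added above."""
    return [0.0] + ranks, [0.0] + shares


ranks, shares = lorenz_points([1, 2, 3, 4])
print(min(ranks))                  # smallest rank is 1/N, never 0
print(with_origin(ranks, shares))  # curve now starts at the origin
```

With only 2 observations the same code still runs, but the curve consists of just two computed points (at ranks 0.5 and 1) plus the added origin, so it is a very coarse picture.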

Filter items with Django Query

I'm encountering this problem and would like to seek your help.
The context:
I have a bag of balls, each of which has an age and a color (red or blue) attribute.
What I want is to get the top 10 "youngest" balls, with at most 3 blue balls among them (this means that if there are more than 3 blue balls in the list of 10 youngest balls, the "redundant" oldest blue balls should be replaced with the youngest red balls).
To get top 10:
sel_balls = Ball.objects.order_by('age')[:10]
Now, to also satisfy the conditions "at most 3 blue balls", I need to process further:
Iterate through sel_balls and count the number of blue balls (= B)
If B <= 3: do nothing
Else: get additional B - 3 red balls to replace the oldest (B - 3) blue balls (and these red balls must not have appeared in the original 10 balls already taken out). I figure I can do this by getting the oldest age value among the list of red balls and do another query like:
add_reds = Ball.objects.filter(age__gte=oldest_sel_age)[:B - 3]
My question is:
Is there any way that I can satisfy the constraints in only one query?
If I have to do 2 queries, is there any faster ways than the one method I mentioned above?
Thanks all.
Use Q for complex queries to the database: https://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q-objects
You should use annotate to do it.
See documentation.
.filter() before .annotate() gives 'WHERE'
.filter() after .annotate() gives 'HAVING' (this is what you need)
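Independently of how the rows are fetched, the selection rule from the question can be sketched in plain Python. This is not Django ORM code: the Ball tuple, its field names, and the function name are hypothetical stand-ins for the asker's model. The idea is a single pass over the balls in age order, skipping blue balls once the cap is reached, which yields the same set as taking the top 10 and then swapping the surplus oldest blue balls for the next youngest red ones.

```python
from typing import List, NamedTuple


class Ball(NamedTuple):
    # Hypothetical stand-in for the Django model in the question.
    name: str
    age: int
    color: str


def youngest_with_blue_cap(balls: List[Ball], total: int = 10, max_blue: int = 3) -> List[Ball]:
    """Pick the youngest `total` balls, allowing at most `max_blue` blue ones."""
    selected = []
    blue_count = 0
    for ball in sorted(balls, key=lambda b: b.age):
        if len(selected) == total:
            break
        if ball.color == "blue":
            if blue_count == max_blue:
                continue  # cap reached: skip this blue ball
            blue_count += 1
        selected.append(ball)
    return selected


# Six young blue balls followed by ten older red balls.
balls = [Ball(f"b{i}", i, "blue") for i in range(1, 7)]
balls += [Ball(f"r{i}", i, "red") for i in range(7, 17)]
picked = youngest_with_blue_cap(balls)
print([(b.age, b.color) for b in picked])
```

Since the result can never contain more than 3 blue or more than 10 red balls, one possible way to limit database work (again an assumption about the model, with a color field) would be to fetch the 3 youngest blue balls and the 10 youngest red balls with two ordered, sliced queries and run this pass over their combined list.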