Title of second y-axis in Stata

I am trying to create a graph with two y-axes (code below), but the title of the right-hand y-axis does not appear. Can anyone please help me with that?
sysuse auto, clear
generate kpl=mpg*0.425144
twoway (scatter mpg weight, color(navy) yaxis(1)) (scatter kpl weight, color(navy) yaxis(2) ylabel(4.25 8.5 12.75 17, axis(2)) ytitle(Kilometres per Litre, axis(2))), by(foreign, legend(off) note(Graphs by Car origin))

I think I understand what you want but I would approach it quite differently.
If you want a second scale of km per litre to compare with miles per gallon, that is just the same data points explained differently, just as you could show Celsius and Fahrenheit temperatures on different axes or calculate proportions and show percents, or vice versa.
Another variable holding km per litre makes this more difficult, not easier, as the values differ by the corresponding conversion factor.
Here I use mylabels from SSC, which must be installed before you can use it.
Naturally you don't need to show zero, but the identity 0 miles per gallon = 0 km per litre may make the point easier to follow.
sysuse auto, clear
set scheme s1color
mylabels 0(4)16, myscale(@/0.425144) local(yla)
scatter mpg weight, yaxis(1 2) yla(`yla', axis(2) ang(h)) yla(0(10)40, axis(1) ang(h)) ytitle(km per litre, axis(2)) ms(Oh)
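The arithmetic behind the second axis can be checked outside Stata. The following is a minimal Python sketch, not part of the original answer: it computes where labels of 0, 4, ..., 16 km per litre have to sit on the miles-per-gallon scale, assuming the same factor of 0.425144 used in the question. The names KPL_PER_MPG and tick_positions are made up for illustration.

```python
KPL_PER_MPG = 0.425144  # km per litre in one mile per gallon, as in the question

def tick_positions(kpl_labels):
    """Return (position on the mpg axis, km-per-litre label) pairs."""
    return [(kpl / KPL_PER_MPG, kpl) for kpl in kpl_labels]

ticks = tick_positions(range(0, 17, 4))
for position, label in ticks:
    print(f"label {label:>2} km/l goes at {position:6.2f} mpg")
```

Each label is placed at its own value divided by the conversion factor, which is why no second variable is needed: the points stay where they are and only the axis labels change.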

How can I estimate the DC output of a solar plant consisting of multiple modules and inverters in PVLib?

I'm using the ModelChain class to estimate DC and AC values for a fictitious solar plant. Input parameters include module, inverter, number of strings, number of modules, number of inverters, albedo, PVGIS TMY data, etc. I apply simple math to calculate the number of modules per string and the number of strings per inverter, then I create one PVSystem object, consisting of a single PVArray, per inverter. Then I run the ModelChain model for each inverter and, for simplicity, add up the AC output to estimate the total AC for all arrays like this:
total_ac = 0  # running total of AC output over all inverters
for idx in range(num_of_inverters):
    array = {
        'name': f'pvsystem-{idx+1}-array',
        'mount': mount,
        'module': module_name,
        'module_parameters': module_parameters,
        'module_type': module_type,
        'albedo': albedo,
        'strings': strings_per_inverter,
        'modules_per_string': modules_per_string,
        'temperature_model_parameters': temperature_model_parameters,
    }
    pvsystem = pvlib.pvsystem.PVSystem(
        arrays=[pvlib.pvsystem.Array(**array)],
        inverter_parameters=inverter_parameters,
    )
    mc = pvlib.modelchain.ModelChain(pvsystem, location)
    mc.run_model(tmy_weather)
    total_ac += mc.results.ac.sum()
According to the PVLib documentation, the summed AC output is the yearly energy in watt-hours.
But now I need to get the DC output as well (yearly, in watt-hours) so I can calculate the DC/AC ratio. Running mc.results.dc gives me a DataFrame with several values (columns) that are hard to grasp for a newbie like me:
i_sc : Short-circuit current (A)
i_mp : Current at the maximum-power point (A)
v_oc : Open-circuit voltage (V)
v_mp : Voltage at maximum-power point (V)
p_mp : Power at maximum-power point (W)
i_x : Current at module V = 0.5Voc, defines 4th point on I-V curve for modeling curve shape
i_xx : Current at module V = 0.5(Voc+Vmp), defines 5th point on I-V curve for modeling curve shape
I tried using p_mp and adding it up: mc.results.dc['p_mp'].sum() but the output is much bigger than the estimated AC. I usually expect the DC/AC ratio to be somewhere > 1 and <= 1.5, roughly. However, I'm getting DC values that are like 3-5 times bigger which probably means I'm doing something wrong.
Example: 1 string, 1 inverter, 10 modules per string:
Output (yearly):
AC: 869.61kW
DC: 3326.36kW
Ratio: 3.83
Any help is appreciated.
As for why the total DC and AC generation values are so different, it's because the inverter is way undersized for the array. The inverter is rated for 250 W maximum, which is not much more than what a single module produces at STC (calculate by Impo * Vmpo as below, or noticing the "220" in the module name), and you have ten modules total. So the inverter will be saturated at even very low light, and the total AC production will be severely curtailed as a result. I think if you make a plot (mc.results.ac.plot()) you will see that the daily inverter output curve is clipped at 250 W while the simulated DC power can be nearly 10x higher. It's always a good idea to plot your time series when things aren't making sense!
In [23]: pvlib.pvsystem.retrieve_sam('cecinverter')['ABB__MICRO_0_25_I_OUTD_US_208__208V_']['Paco']
Out[23]: 250.0
In [24]: pvlib.pvsystem.retrieve_sam('sandiamod')['Canadian_Solar_CS5P_220M___2009_'][['Impo', 'Vmpo']]
Out[24]:
Impo 4.54629
Vmpo 48.3156
Name: Canadian_Solar_CS5P_220M___2009_, dtype: object
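To make the clipping argument concrete, here is a rough sketch that does not use pvlib at all. It takes the two values looked up above (Paco, and Impo * Vmpo), builds a synthetic half-sine DC profile for one clear day, and caps it at the inverter rating. The profile shape and the lossless, hard-capped inverter are simplifying assumptions, so the numbers only illustrate the effect.

```python
import math

PACO_W = 250.0                  # inverter AC rating (Paco) from the lookup above
MODULE_W = 4.54629 * 48.3156    # Impo * Vmpo, about 220 W per module at STC
N_MODULES = 10

# Synthetic clear-day DC profile: a half-sine between 06:00 and 18:00 (assumption).
hours = range(24)
dc_w = [
    N_MODULES * MODULE_W * max(0.0, math.sin(math.pi * (h - 6) / 12))
    for h in hours
]
# Idealized inverter: no losses, output simply capped at the rating (assumption).
ac_w = [min(p, PACO_W) for p in dc_w]

dc_wh = sum(dc_w)   # hourly samples, so watts summed over hours give watt-hours
ac_wh = sum(ac_w)
print(f"peak DC {max(dc_w):.0f} W, peak AC {max(ac_w):.0f} W")
print(f"daily DC {dc_wh:.0f} Wh, daily AC {ac_wh:.0f} Wh, ratio {dc_wh / ac_wh:.2f}")
```

With an inverter this small, the AC curve is flat at 250 W for most of the day, so the ratio of DC to AC energy ends up far above 1.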
A couple other notes:
Please be careful about units:
Summing (really, integrating) an hourly time series of power (Watts) produces energy (Watt-hours). An annual output in kW doesn't make sense, since kW is for power and power is an instantaneous rate of energy generation. If this is new to you, it might be helpful to think about speed vs distance: a car might be traveling at 60mph at any given time point, but the total distance it travels in a year is measured in miles, not mph. Power is energy per unit time just like speed is distance per unit time.
Summing voltages (num_of_inverters * mc.results.dc['v_mp'].sum()) makes no sense that I can see. Volt-hours doesn't seem like a useful unit to me outside of some very specialized power electronics engineering contexts.
The term "DC/AC ratio" is typically understood to mean the ratio of rated capacities, not annual productions. So for the example in your gist, the DC/AC ratio would be calculated as (220 W/module * 10 modules/string * 2 strings/inverter = 4400 W DC) / (250 W AC) = 17.6 (which is a crazy DC/AC ratio).
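The two points about units and rated capacities can be written out as a short calculation. This is only a sketch: the constants are the figures quoted above, and annual_energy_kwh is a hypothetical helper name, not a pvlib function.

```python
MODULE_W = 220            # nameplate watts per module, from the module name
MODULES_PER_STRING = 10
STRINGS_PER_INVERTER = 2
PACO_W = 250              # inverter AC rating in watts

# DC/AC ratio as a ratio of rated capacities, not of annual production.
dc_capacity_w = MODULE_W * MODULES_PER_STRING * STRINGS_PER_INVERTER
dc_ac_ratio = dc_capacity_w / PACO_W
print(f"DC capacity {dc_capacity_w} W, DC/AC ratio {dc_ac_ratio}")

def annual_energy_kwh(hourly_power_w):
    """Sum hourly power samples (W) into energy and convert Wh to kWh."""
    return sum(hourly_power_w) / 1000.0

# Hypothetical example: a constant 100 W for every hour of a non-leap year.
print(annual_energy_kwh([100.0] * 8760), "kWh")
```

The same helper could be applied to an hourly series such as mc.results.ac, provided the time step really is one hour; for other time steps the sum has to be scaled by the step length in hours.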

Controlling layout of multiple histograms

I have used the by option of the histogram command to group my two categorical variables:
sysuse auto, clear
hist price, percent by(foreign rep78)
I would like to graph 8 histograms together on the same plot, with the following layout:
XXXX
XXXX
However, Stata places the histograms by default as follows:
XXX
XXX
XX
How can I achieve a 2x4 orientation of histograms?
I have played around with the aspectratio() option, but this changes the aspect ratio of the individual graphs, not of the whole plot.
You need to specify the cols() sub-option in by():
sysuse auto, clear
histogram price, percent by(foreign rep78, cols(4))

Creating box plots of the gap between two groups by deciles

I work with Stata and I have math grades for two different groups: A and B.
I want to see the gap that exists between both groups in each decile. In addition I want to do a box plot of this gap for each decile (I want to have 10 box plots, one for each decile which shows the gap between group grades).
What I first did was to compute the deciles using xtile for both groups:
xtile decileA= mat if group==1, nq(10)
xtile decileB= mat if group==0, nq(10)
However, groups A and B do not have the same number of observations nor the same distribution. I thought of computing quantiles for each decile and group and subtracting them to get the difference in each decile at each quartile to create the boxplot. But I do not know how to proceed afterwards to create the graph, and given that I have a different number of observations in each group decile I do not know if it is correct to proceed this way.
If I try to use the pctile command and compute the difference at each decile, I lose all the variance in the data inside each decile. I only get median differences and not all the quantiles I want.
Example:
pctile decileA= mat if group==1, nq(10)
pctile decileB= mat if group==0, nq(10)
gen qdiff= decileA- decileB if _n<10
gen qtau=_n/10 if _n<10
graph box qdiff, over(qtau)
I want to know if there is a way to do the graph I am intending to?
Cross-posted on Statalist.
There is certainly a way to accomplish what you want with a bit of effort, but if the goal is to make a comparison between the two groups at each decile with some notion of variability, you can easily get that from a simultaneous quantile regression and the SEs that it produces:
sysuse auto, clear
sqreg price i.foreign, quantile(.1 .2 .3 .4 .5 .6 .7 .8 .9)
margins, dydx(foreign) ///
predict(outcome(q10)) ///
predict(outcome(q20)) ///
predict(outcome(q30)) ///
predict(outcome(q40)) ///
predict(outcome(q50)) ///
predict(outcome(q60)) ///
predict(outcome(q70)) ///
predict(outcome(q80)) ///
predict(outcome(q90)) ///
post
marginsplot, yline(0) xlab(, grid) ylab(#10, grid angle(90))
This yields a graph showing that foreign origin is associated with a higher price at higher deciles, with the exception of the top decile, though probably none of the differences are significant here, given how much the CIs overlap.
You can even conduct formal hypothesis tests that the effects are equal like this:
. test _b[1.foreign:9._predict] = _b[1.foreign:8._predict]
( 1) - [1.foreign]8._predict + [1.foreign]9._predict = 0
chi2( 1) = 3.72
Prob > chi2 = 0.0537
With 74 cars, we cannot reject at the 5% level the hypothesis that the effects at the 80th and 90th percentiles are the same, even though the point estimates have opposite signs and similar magnitudes.
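As a cross-check outside Stata, the reported p-value can be approximately recovered from the test statistic alone. The sketch below assumes only the standard relationship between a chi-squared variable with 1 degree of freedom and a squared standard normal; chi2_sf_1df is a made-up helper name. Because the statistic is rounded to 3.72 in the output, the result may differ from Stata's in the last digit.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # For Z ~ N(0, 1), P(Z**2 > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_sf_1df(3.72)
print(f"p = {p_value:.4f}")
```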

Stata: Storing only part of a FE regression output for graphing

I am running a regression with two categories of fixed effects (country and year; it is macroeconomic data). Since I am using xtreg, one is hidden automatically, but the other is a variable:
xtreg fiveyearyg taxratio i.year if taxratiocut == 1, i(wbcode1) fe cluster(wbcode1)
estimates store yi
I am running a number of these and I want to graph the coefficients for taxratio from each. But when I store the data, it stores both the taxratio coefficient, and the 50+ coefficients for the year fixed effects.
After a lot of searching, I cannot find any way to store (or recall) just part of the regression output, the one coefficient (with SEs) that I care about. Does anyone know a way to do that?
Here is how you can do that:
webuse grunfeld,clear
qui xtreg mvalue invest i.year,fe cluster(company)
//e(b) stores coefficient matrix and e(V) stores variance-covariance matrix. For details type: ereturn list after running the model
//let's say you want to extract only the coefficient on invest
mat coef_matrix=e(b)
scalar coef_invest=coef_matrix[1,1]
dis coef_invest
1.7178414
//to extract the se of the coefficient on invest
mat var_matrix=e(V)
mat diag_var_matrix=vecdiag(var_matrix) //diagonal elements are variances and the standard errors are square roots of these variances
matmap diag_var_matrix se_matrix , m(sqrt(@)) //you need to install matmap using ssc install matmap; you will get an error if a variance is negative
scalar se_invest=se_matrix[1,1]
dis se_invest
.14082153
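The step from the variance-covariance matrix to standard errors is just "take the diagonal, then the square root". A minimal language-neutral sketch in Python follows, using a made-up 2x2 matrix rather than any actual Stata output.

```python
import math

# Hypothetical 2x2 variance-covariance matrix (made-up numbers, not Stata output).
V = [
    [0.04, 0.01],
    [0.01, 0.09],
]

# Equivalent of vecdiag() followed by an element-wise square root.
variances = [V[i][i] for i in range(len(V))]
std_errors = [math.sqrt(v) for v in variances]
print(std_errors)
```

The off-diagonal covariances play no role in the individual standard errors; they only matter for tests that combine several coefficients.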
Accessing coefficients is as easy as calling _b[varname]; analogously the corresponding standard errors: _se[varname].
An example:
webuse grunfeld, clear
qui xtreg mvalue invest i.year,fe cluster(company)
// coef for invest
display _b[invest]
// std error for invest
display _se[invest]
// displayed results in matrix
matrix list r(table)
For multiple-equation models use [eqno]_b[varname] where the preceding bracket contains an equation number.
More detail can be found in [U] 13.5 Accessing coefficients and standard errors.
Starting with Stata 12, estimation commands also store results in r() [and not just e()]. Notice I listed r(table), which contains most results displayed by the estimation command xtreg.
You show interest in plotting coefficients, so you should read up on the user-written command coefplot. Run ssc install coefplot to download it and help coefplot to get started. It has many options.
Edit
A complete example that plots only coefficients for invest (leaving out those for year), using coefplot, and based on conditional regressions is:
clear
set more off
webuse grunfeld
xtreg mvalue invest i.year if time <= 10,fe cluster(company)
estimates store before10
xtreg mvalue invest i.year if time > 10,fe cluster(company)
estimates store after10
coefplot before10 after10, keep(invest)

Stata's estout with two sets of margins

Suppose I have a model like this:
webuse nlswork
poisson hours i.union##c.tenure, robust
margins union, dydx(tenure)
margins rb1.union, dydx(tenure)
I would like to stack the two AMEs on top of the differences of the AMEs using Ben Jann's -estout-. Unfortunately, you need to post the margins results for estout, which interferes with the second margins command.
Is there any way around this?
Cross-posted at the Statalist forum for some time without an answer.
I've never used -estout-, but perhaps this will give you a start.
webuse nlswork
poisson hours i.union##c.tenure, robust
estimates store m0
margins union, dydx(tenure) post
estimates store m1
estimates restore m0
margins rb1.union, dydx(tenure) post
estimates store m2
Why this works: margins needs access to the results of the original command, poisson in this example. As margins does not itself leave estimation results behind, the original results remain available if you run margins without post, and you can have several margins commands in a row without problems. However, if you add the post option to the first margins command, the new posted results displace those in memory. In that case, the second margins will complain that
margins cannot work with its own posted results
The solution, therefore, is to present the second margins with the original estimation results, just what estimates restore is designed to do.
Update
r(table) contains all the results from margins, and the columns are named. Here's a version of Roberto's stacking solution that takes advantage of these properties:
webuse nlswork, clear
poisson hours i.union##c.tenure, robust
margins union, dydx(tenure)
matrix list r(table)
matrix m1 = r(table)
matrix m11 = m1["b".."se", 1...]'
matrix m12 = m1["ll".."ul",1...]'
matrix first = m11,m12
margins rb1.union, dydx(tenure)
matrix m2 = r(table)
matrix m21 = m2["b".."se", 1...]'
matrix m22 = m2["ll".."ul",1...]'
matrix second = m21,m22
matrix rownames second = tenure:diff
matrix RESULTS = first \ second
estout matrix(RESULTS)
estout takes matrices, so maybe you can try with that:
webuse nlswork, clear
poisson hours i.union##c.tenure, robust
margins union, dydx(tenure)
matrix first = r(b)
matrix list first
margins rb1.union, dydx(tenure)
matrix second = r(b)
matrix list second
*-----
matrix b = first[1,1] , first[1,2] \ second[1,1] , .
estout matrix(b)
You would need to polish the results, of course.
Update
There's a thread on Statalist from 2007, where Ben Jann (the author of estout) clarifies that stacking multiple stored results into one column is not possible with estout alone. His solution involves a program that merges results manipulating matrices and column/row names.
For the example you have provided, something like the following works:
webuse nlswork, clear
poisson hours i.union##c.tenure, robust
// first margin
margins union, dydx(tenure)
matrix first = r(b)
// second margin
margins rb1.union, dydx(tenure)
matrix second = r(b)
matrix rownames second = tenure:diff
// put together
matrix c = first' \ second
estout matrix(c)
(The thread is a bit old so I'm not sure if estout has been updated to do this at present.)