I am running a logistic regression in SAS. SAS outputs odds ratio estimates with a point estimate and 95% confidence limits. How can I make SAS include the standard error in the odds ratio output (not in the parameter estimates table), together with the point estimate and 95% confidence limits?
proc logistic data=data1;
class rank / param=ref;
model admit = gre gpa rank;
run;
This is the output for odds ratio estimate, how can I make SAS add standard error for odds ratio estimate?
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
GRE 1.002 1.000 1.004
GPA 2.235 1.166 4.282
RANK 1 vs 4 4.718 2.080 10.701
RANK 2 vs 4 2.401 1.170 4.927
RANK 3 vs 4 1.235 0.572 2.668
Thanks in advance for any help
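Since the Odds Ratio Estimates table itself doesn't carry a standard error, one common workaround is to derive it from the parameter estimates table with the delta method: SE(OR) ≈ exp(β) × SE(β). A minimal sketch in Python (the β and SE(β) values below are illustrative, chosen to roughly match the GPA row of the table above, not copied from a SAS run):

```python
import math

# Delta-method sketch: if beta is the log-odds coefficient with standard
# error se_beta, then OR = exp(beta) and SE(OR) ~= exp(beta) * se_beta.
# (Illustrative values approximating the GPA row above.)
def odds_ratio_with_se(beta, se_beta):
    odds_ratio = math.exp(beta)
    return odds_ratio, odds_ratio * se_beta

or_gpa, se_or_gpa = odds_ratio_with_se(0.804, 0.332)
# or_gpa is about 2.235, matching the printed point estimate;
# exp(beta +/- 1.96 * se_beta) reproduces the Wald confidence limits.
```

In SAS you could do the same arithmetic in a DATA step on the ODS OUTPUT dataset of the parameter estimates.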
I'm using the ModelChain class to estimate DC and AC values for a fictitious solar plant. Input parameters include the module, inverter, number of strings, number of modules, number of inverters, albedo, PVGIS TMY data, etc. I apply simple math to calculate the number of modules per string and the number of strings per inverter, then I create one PVSystem object, consisting of a single Array, per inverter. Then I run the ModelChain model for each inverter and, for simplicity, add up the AC output to estimate the total AC for all arrays like this:
total_ac = 0  # accumulate AC output across inverters
for idx in range(num_of_inverters):
    array = {
        'name': f'pvsystem-{idx+1}-array',
        'mount': mount,
        'module': module_name,
        'module_parameters': module_parameters,
        'module_type': module_type,
        'albedo': albedo,
        'strings': strings_per_inverter,
        'modules_per_string': modules_per_string,
        'temperature_model_parameters': temperature_model_parameters,
    }
    pvsystem = pvlib.pvsystem.PVSystem(
        arrays=[pvlib.pvsystem.Array(**array)],
        inverter_parameters=inverter_parameters,
    )
    mc = pvlib.modelchain.ModelChain(pvsystem, location)
    mc.run_model(tmy_weather)
    total_ac += mc.results.ac.sum()
According to the pvlib documentation, the AC output is yearly, in watt-hours.
But now I need to get the DC output as well (yearly, in watt-hours) so I can calculate the DC/AC ratio. Running mc.results.dc gives me a DataFrame with several columns that are hard to grasp for a newbie like me:
i_sc : Short-circuit current (A)
i_mp : Current at the maximum-power point (A)
v_oc : Open-circuit voltage (V)
v_mp : Voltage at maximum-power point (V)
p_mp : Power at maximum-power point (W)
i_x : Current at module V = 0.5Voc, defines 4th point on I-V curve for modeling curve shape
i_xx : Current at module V = 0.5(Voc+Vmp), defines 5th point on I-V curve for modeling curve shape
I tried using p_mp and adding it up: mc.results.dc['p_mp'].sum(), but the output is much bigger than the estimated AC. I usually expect the DC/AC ratio to be somewhere > 1 and <= 1.5, roughly. However, I'm getting DC values that are like 3-5 times bigger, which probably means I'm doing something wrong.
Example: 1 string, 1 inverter, 10 modules per string:
Output (yearly):
AC: 869.61kW
DC: 3326.36kW
Ratio: 3.83
Any help is appreciated.
As for why the total DC and AC generation values are so different, it's because the inverter is way undersized for the array. The inverter is rated for 250 W maximum, which is not much more than what a single module produces at STC (calculated as Impo * Vmpo below, or by noticing the "220" in the module name), and you have ten modules total. So the inverter will be saturated at even very low light, and the total AC production will be severely curtailed as a result. I think if you make a plot (mc.results.ac.plot()) you will see that the daily inverter output curve is clipped at 250 W while the simulated DC power can be nearly 10x higher. It's always a good idea to plot your time series when things aren't making sense!
In [23]: pvlib.pvsystem.retrieve_sam('cecinverter')['ABB__MICRO_0_25_I_OUTD_US_208__208V_']['Paco']
Out[23]: 250.0
In [24]: pvlib.pvsystem.retrieve_sam('sandiamod')['Canadian_Solar_CS5P_220M___2009_'][['Impo', 'Vmpo']]
Out[24]:
Impo 4.54629
Vmpo 48.3156
Name: Canadian_Solar_CS5P_220M___2009_, dtype: object
A couple other notes:
Please be careful about units:
Summing (really, integrating) an hourly time series of power (Watts) produces energy (Watt-hours). An annual output in kW doesn't make sense, since kW is for power and power is an instantaneous rate of energy generation. If this is new to you, it might be helpful to think about speed vs distance: a car might be traveling at 60mph at any given time point, but the total distance it travels in a year is measured in miles, not mph. Power is energy per unit time just like speed is distance per unit time.
Summing voltages (num_of_inverters * mc.results.dc['v_mp'].sum()) makes no sense that I can see. Volt-hours doesn't seem like a useful unit to me outside of some very specialized power electronics engineering contexts.
The term "DC/AC ratio" is typically understood to mean the ratio of rated capacities, not annual productions. So for the example in your gist, the DC/AC ratio would be calculated as (220 W/module * 10 modules/string * 2 strings/inverter = 4400 W DC) / (250 W AC) = 17.6 (which is a crazy DC/AC ratio).
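The rated-capacity arithmetic in that last note can be sketched as:

```python
# DC/AC ratio from rated capacities, using the figures quoted above
# (220 W modules, 10 modules/string, 2 strings/inverter, 250 W inverter).
module_stc_w = 220          # W per module, from the "220" in the module name
modules_per_string = 10
strings_per_inverter = 2
inverter_paco_w = 250       # inverter AC rating (Paco), W

dc_rated_w = module_stc_w * modules_per_string * strings_per_inverter
dc_ac_ratio = dc_rated_w / inverter_paco_w
# dc_rated_w = 4400, dc_ac_ratio = 17.6
```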
I'm using SAS to plot a histogram with the kernel density. In the documentation, it is specified that we can choose the parameter c: "the standardized bandwidth for a number that is greater than 0 and less than or equal to 100." But I cannot find the default value used to create the following plot.
Does someone have an idea? Thanks!
SGPLOT minimizes the Asymptotic Mean Integrated Square Error (AMISE) for the kernel density function. According to PROC UNIVARIATE, which also can do KDE:
By default, the procedure uses the AMISE method to compute kernel density estimates.
PROC UNIVARIATE documentation
We can confirm that they both have the same default by comparing the output.
proc univariate data=sashelp.cars;
var horsepower;
histogram / kernel;
run;
In the log, we find:
NOTE: The normal kernel estimate for c=0.7852 has a bandwidth of 21.035 and an AMISE of 392E-7.
Let's plot them together and compare the values.
proc sgplot data=sashelp.cars;
density horsepower/TYPE=KERNEL;
density horsepower/TYPE=KERNEL(c=0.7852);
ods output sgplot=sgplot;
run;
data diff;
set sgplot;
abs_diff = abs(KERNEL_Horsepower____Y - KERNEL_Horsepower_C_0_7852____Y);
run;
proc univariate data=diff;
var abs_diff;
run;
The average difference between all points plotted is 1.65x10^-9, with the overall largest being 6.76x10^-9. This is, essentially, zero. The reason for the differences is that the c-value given to the user in the log is lower precision than the one calculated by proc sgplot. You can get a higher precision estimate with the outkernel= option in proc univariate as well.
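For reference, the PROC UNIVARIATE documentation gives the normal-kernel bandwidth as lambda = c * Q * n^(-1/5), where Q is the sample interquartile range. A quick sanity-check sketch in Python (the n = 428 observation count for sashelp.cars and the horsepower IQR of roughly 90 are my own estimates, not values read out of SAS):

```python
# Bandwidth formula from the PROC UNIVARIATE documentation:
#   lambda = c * Q * n ** (-1/5)
# where Q is the sample interquartile range. Assumed inputs: n = 428
# observations in sashelp.cars, horsepower IQR roughly 90.
c = 0.7852
n = 428
iqr = 90.0
bandwidth = c * iqr * n ** (-1 / 5)
# comes out close to the 21.035 reported in the log note above
```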
I have created two graphs using density functions but don't know how to calculate the overlap coefficient in SAS using the following codes:
proc sgplot data=combined_cohort;
density S_exposed / legendlabel='exposed' lineattrs=(pattern=solid);
density S_unexposed / legendlabel='unexposed' lineattrs=(pattern=solid);
keylegend / location=inside position=topright across=1;
xaxis display=(nolabel);
run;
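The overlap coefficient (OVL) is the area under the pointwise minimum of the two density curves, so one route is to output the curves (e.g. via ODS OUTPUT) and integrate min(f_exposed, f_unexposed) numerically. A sketch of the idea in Python, with synthetic stand-ins for S_exposed and S_unexposed (the data and distributions here are assumptions, just to make the example self-contained):

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

# Overlap coefficient sketch: OVL = integral of min(f1, f2) over the support.
rng = np.random.default_rng(42)
exposed = rng.normal(0.0, 1.0, 500)      # stand-in for S_exposed
unexposed = rng.normal(1.0, 1.0, 500)    # stand-in for S_unexposed

f_exposed = gaussian_kde(exposed)
f_unexposed = gaussian_kde(unexposed)

grid = np.linspace(min(exposed.min(), unexposed.min()),
                   max(exposed.max(), unexposed.max()), 2000)
ovl = trapezoid(np.minimum(f_exposed(grid), f_unexposed(grid)), grid)
# for N(0,1) vs N(1,1), the true OVL is 2*Phi(-0.5), about 0.617
```

The same trapezoid integration can be done in a SAS DATA step on the density curves once they are in a dataset.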
The seeds of the garden pea are either yellow or green. A certain cross between pea plants produces progeny where 75% are plants with yellow seeds and 25% are plants with green seeds. What is the minimum number of progeny you would need to grow to have probability no less than 0.99 of obtaining at least 10 plants with green seeds?
I understand how to estimate a required sample size when I have data such as standard deviation, mean, correlation, etc., but I don't even know where to start to estimate it based on the percentage values with a certain probability.
So far I set up this code in SAS:
Proc power;
onesamplefreq test=Z method=normal
sides=1
alpha=.01
nullproportion=.5
proportion=.25
power=.99
ntotal= .;
run;
Running this program resulted in a sample size of 76, but I don't feel like this is correct. I don't know how to specify that I need at least 10 plants with green seeds, and I don't know how to set the nullproportion or if it matters.
This is a binomial distribution problem, where the chance of success (a green-seeded plant) is 25%. You want at least 10 successes, so how many trials do you need (that is, how many seeds)?
Mean of binomial distribution will answer this question which is:
np = 10
n*0.25 = 10
n = 40
So 40 seeds are required on average. But that is purely the expected value: with n = 40, the chance of actually getting at least 10 green plants is only about 50%. To reach probability 0.99 you need a margin above the mean, so a sample size of 76 seems reasonable to me.
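The question as stated can also be answered exactly, without PROC POWER: search for the smallest n such that P(X >= 10) >= 0.99 when X ~ Binomial(n, 0.25). A sketch in Python (scipy's binom.sf(k-1, n, p) gives P(X >= k)):

```python
from scipy.stats import binom

# Find the smallest n with P(X >= 10) >= 0.99, X ~ Binomial(n, 0.25).
# binom.sf(9, n, p) is the survival function P(X > 9) = P(X >= 10).
def min_progeny(k=10, p=0.25, target=0.99):
    n = k
    while binom.sf(k - 1, n, p) < target:
        n += 1
    return n

n_required = min_progeny()
# noticeably larger than the n = 40 suggested by the mean alone
```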
I'm trying to implement an RBM and I'm testing it on the MNIST dataset. However, it does not seem to converge.
I've 28x28 visible units and 100 hidden units. I'm using mini-batches of size 50. For each epoch, I traverse the whole dataset. I've a learning rate of 0.01 and a momentum of 0.5. The weights are randomly generated based on a Gaussian distribution of mean 0.0 and stdev of 0.01. The visible and hidden biases are initialized to 0. I'm using a logistic sigmoid function as activation.
After each epoch, I compute the average reconstruction error of all mini-batches, here are the errors I get:
epoch 0: Reconstruction error average: 0.0481795
epoch 1: Reconstruction error average: 0.0350295
epoch 2: Reconstruction error average: 0.0324191
epoch 3: Reconstruction error average: 0.0309714
epoch 4: Reconstruction error average: 0.0300068
I plotted the histograms of the weights to check (left to right: hiddens, weights, visibles. top: weights, bottom: updates):
Histogram of the weights after epoch 3: http://baptiste-wicht.com/static/finals/histogram_epoch_3.png
Histogram of the weights after epoch 4: http://baptiste-wicht.com/static/finals/histogram_epoch_4.png
but, except for the hidden biases, which seem a bit weird, the rest seems OK.
I also tried to plot the hidden weights:
Weights after epoch 3: http://baptiste-wicht.com/static/finals/hiddens_weights_epoch_3.png
Weights after epoch 4: http://baptiste-wicht.com/static/finals/hiddens_weights_epoch_4.png
(they are plotted in two colors using that function:
static_cast<size_t>(value > 0 ? (static_cast<size_t>(value * 255.0) << 8) : (static_cast<size_t>(-value * 255.0) << 16)) << " ";
)
And here, they do not make sense at all...
If I go further, the reconstruction error falls a bit more, but does not go below 0.025. Even if I change the momentum after some time, it goes higher and then comes down a bit, but not in any interesting way. Moreover, the weights do not make more sense after more epochs. In most example implementations I've seen, the weights were making some sense after iterating through the complete data set two or three times.
I've also tried to reconstruct an image from the visible units, but the results seem almost random.
What could I do to check what goes wrong in my implementation? Should the weights be within some range? Does something seem really strange in the data?
Complete code: https://github.com/wichtounet/dbn/blob/master/include/rbm.hpp
You are using a very small learning rate. In most NNs trained with SGD, you start out with a higher learning rate and decay it over time. Search for "learning rate decay" or "adaptive learning rate" to find more information on that.
Second, when implementing a new algorithm I would recommend finding the paper that introduced it and reproducing their results. A good paper should include most of the settings used - or the method used to determine the settings.
If a paper is unavailable, or it was tested on a data set you don't have access to - go find a working implementation and compare the outputs when using the same settings. If the implementations are not feature compatible, turn off as many features as you can that are not shared.
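As a concrete illustration of the first point, a step-decay schedule might look like this (the base rate, decay factor, and step size are illustrative defaults, not values from the post or the linked code):

```python
# Step decay sketch: multiply the learning rate by `decay` every
# `step` epochs. All constants are illustrative, untuned defaults.
def learning_rate(epoch, base_lr=0.1, decay=0.5, step=10):
    return base_lr * decay ** (epoch // step)

# epochs 0-9 train at 0.1, epochs 10-19 at 0.05, and so on
```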