SAS proc ttest modify H0 - sas

I have a dataset with GENDER (1/0), INCOME, and SENIORITY (1/0). I need to run a t-test on INCOME by GENDER for SENIORITY=1.
As far as I know, the default is H0=0, which means that there is no difference between the genders. How can I define an H0 that checks whether mu of female (gender=1) is higher than mu of male (gender=0)?
This is my basic code:
proc ttest data=l;
var INCOME;
class gender;
where SENIORITY=1;
run;

This becomes a one-sided test, so you use the one-tailed p-value. The first statistics course in the free SAS training, reachable from the SAS University home page, covers this: https://communities.sas.com/community/sas-analytics-u (see the training widget on the left-hand side of the page).
If you have a SAS Communities ID and are logged in, this link takes you directly to the page:
http://support.sas.com/ecst1
The SIDES= option sets the direction of both the test and the confidence interval. Which direction you need depends on how you set up your hypothesis.
SIDES=L
specifies lower one-sided tests, in which the alternative hypothesis indicates a mean less than the null value, and lower
one-sided confidence intervals between minus infinity and the upper
confidence limit.
SIDES=U
specifies upper one-sided tests, in which the alternative hypothesis indicates a mean greater than the null value, and upper
one-sided confidence intervals between the lower confidence limit and
infinity.
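Applied to the question's code, a one-sided test might look like the sketch below. It assumes PROC TTEST's default class ordering, in which the reported difference is the first class level minus the second, here mean(gender=0) minus mean(gender=1). Under that assumption, "female mean is higher" corresponds to a negative difference, hence SIDES=L. Check the "Difference (1 - 2)" label in your own output before relying on the direction.

```sas
/* One-sided two-sample t-test: H1 is mu(gender=1) > mu(gender=0).  */
/* Assumption: the difference is reported as level 0 minus level 1, */
/* so the alternative "difference < 0" is requested with SIDES=L.   */
proc ttest data=l h0=0 sides=L;
    where SENIORITY = 1;
    class gender;
    var INCOME;
run;
```

If the output shows the difference in the opposite order, use SIDES=U instead.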

Related

Independent variable to find seasonality effect?

I'm not sure if it's right to ask this here, but any help is greatly appreciated. I'm working in SAS Forecast Studio.
This is my time series dataset (quarterly data):
Date e.g. 1-Jan-80, 1-Apr-80, 1-Jul-80
DateQ e.g. 1980Q1, 1980Q2, 1980Q3
Year e.g. 1980, 1981, 1982
GDP (dependent variable) e.g. 2650.1
T e.g. 1, 2, 3
Which of these variables should I use as an independent variable in a linear regression to evaluate whether there is any seasonal effect, or should I create a new quarterly variable?
Seasonal effects should not be identified using simple linear regression on the time variable when analyzing time-series data. But, to answer your question, use date with the intnx() function to convert it to quarter.
data want;
format quarter yyq.;
set have;
quarter = intnx('quarter', date, 0, 'B');
run;
Seasonal effects can be identified a number of ways:
1. Graphing it
If a time series has a seasonal effect, it will tend to be clear. Simply looking at a graph of the data will let you know whether it is seasonal by your chosen interval.
In sashelp.air, it's very clear that there is a 12-month season.
2. Spectral Density Analysis
proc timeseries will give you a spectrum analysis to help identify significant seasons within the data. Peaks indicate possible cycles or seasons. You will need to restrict the plot to a reasonable range of periods, since the density may increase sharply at long periods, and that increase is not representative of the true season.
Forecast Studio and Time Series Studio will do this for you and can give you similar output to the below.
proc timeseries data=sashelp.air
outspectra=outspectra;
id date interval=month;
var air;
spectra;
run;
proc sgplot data=outspectra;
where period BETWEEN 1 AND 24;
scatter x=period y=p;
series x=period y=p;
run;
We can see a strong indicator for a seasonality of 12. We also see some potential 3-month and 6-month cycles that could be tested within a model for significance.
3. ACF/PACF/IACF plots
Your ACF/PACF/IACF plots in Forecast Studio will also help you identify clear seasons.
The classic decaying suspension-bridge look is indicative of a seasonal effect. Note that the autocorrelation increases around lag 12 and then decreases again. Additionally, the significant negative spike at 12 in the PACF and IACF plots is another indicator of a significant seasonal effect at 12.
Model Building and Testing
Tools like the seasonal augmented Dickey-Fuller test that are available in Forecast Studio can help you identify whether you've captured seasonality and achieved stationarity after differencing.
The selection boxes in the Series view allow you to quickly add simple or seasonal differencing. Selecting (1) for simple differencing will add one simple difference. i.e:
y = y - lag(y)
Selecting (1) for seasonal differencing will add 1 seasonal difference. Note that when you create a project in Forecast Studio, the season is automatically diagnosed and assumed. This should be done after doing our diagnostics above for our best guess as to what the true season is. In our case, we've assumed our season is 12. This would be equivalent to:
y = y - lag12(y)
We can then use stationarity tests to ensure we've achieved stationarity. In our case, we'll add 1 simple and seasonal difference.
Notice how our white noise plot has improved and our spikes at 12 have decreased to non-significance. Additionally, our stationarity tests are looking good and significant - that is, there is no unit root present.
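Outside Forecast Studio, the same differencing and stationarity check can be sketched in code. This is a minimal example on sashelp.air, assuming a season of 12; the output dataset name air_diff, the variable names d1 and d1_12, and the ADF lag list are illustrative choices, not something Forecast Studio generates.

```sas
/* Simple and seasonal differences computed by hand */
data air_diff;
    set sashelp.air;
    d1    = dif(air);      /* simple difference: y - lag(y)          */
    d1_12 = dif12(d1);     /* seasonal difference: d1 - lag12(d1)    */
run;

/* The same differencing inside PROC ARIMA, with ADF unit-root tests */
proc arima data=sashelp.air;
    identify var=air(1,12) stationarity=(adf=(0,1,2));
run;
quit;
```

Small p-values in the Dickey-Fuller table suggest that no unit root remains in the differenced series.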
Adding Seasonal or Cyclical Effects
Your model choice will dictate how seasonal or cyclical effects are added. Differencing in an ARIMA model will take care of seasonality. Dummy variables can be used for additional cyclical effects in the ARIMA model. For example:
data want;
set have;
q1 = (qtr(date) = 1);
q2 = (qtr(date) = 2);
q3 = (qtr(date) = 3);
run;
UCMs can take care of all of these by adding both seasonal and cyclical effects. Holt-Winters ESMs take care of trend and seasonality without requiring dummy variables. Your modeling goals and performance considerations for each type of model will dictate which model you choose.
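If a quick regression check is still wanted despite the caveat above, the quarterly dummies can be tested jointly. The sketch below assumes the dataset want from the previous step also carries the question's GDP and T variables; the label seasonal is arbitrary. Treat it as a rough screen only, since regression residuals from time-series data are usually autocorrelated.

```sas
/* Joint F-test that all three quarter dummies are zero.           */
/* Assumption: want contains GDP, T, and the q1-q3 dummies above;  */
/* quarter 4 is the reference level.                               */
proc reg data=want;
    model gdp = t q1 q2 q3;
    seasonal: test q1 = 0, q2 = 0, q3 = 0;
run;
quit;
```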

Detailed predictions in proc logistic

I am implementing a logit model in a database of households using as dependent variable the classification of poor or not poor household (1 if it is poor, 0 if it is not):
proc logistic data=regression;
model poor(event="1") = variable1 variable2 variable3 variable4;
run;
Using proc logistic in SAS, I obtained the table "Association of predicted probabilities and observed responses", which gives me the percent concordant. However, I need detailed information on how many households are correctly classified as poor, in this way:
I will appreciate your help with this issue.
Add the CTABLE option to your MODEL statement.
model poor(event="1") = variable1 variable2 variable3 variable4 / ctable;
CTABLE classifies the input binary response observations according to
whether the predicted event probabilities are above or below some
cutpoint value z in the range [0,1]. An observation is predicted as an
event if the predicted event probability exceeds or equals z. You can
supply a list of cutpoints other than the default list by specifying
the PPROB= option. Also, you can compute positive and negative
predictive values as posterior probabilities by using Bayes’ theorem.
You can use the PEVENT= option to specify prior probabilities for
computing these statistics. The CTABLE option is ignored if the data
have more than two response levels. This option is not available with
the STRATA statement.
For more information, see the section Classification Table.
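To get the counts at a single cutpoint, one option is to combine CTABLE with PPROB= and, as a cross-check, tabulate the predicted probabilities yourself. This is a sketch that assumes a 0.5 cutpoint; the names pred, phat, and predicted_poor are made up for illustration.

```sas
/* Classification table at one cutpoint, plus saved predictions */
proc logistic data=regression;
    model poor(event="1") = variable1 variable2 variable3 variable4
          / ctable pprob=0.5;
    output out=pred p=phat;          /* phat = predicted P(poor=1) */
run;

/* Flag predicted events using the same 0.5 cutpoint */
data pred;
    set pred;
    predicted_poor = (phat >= 0.5);
run;

/* Observed versus predicted counts */
proc freq data=pred;
    tables poor*predicted_poor / norow nocol nopercent;
run;
```

The two tables can differ slightly, because CTABLE classifies with bias-adjusted probabilities (an approximation to leaving each observation out) rather than the plain fitted values.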

SAS propensity score matching: Observations considered for matching in PSMATCH is less than the total observations available in the data set

I am using the SAS procedure PSMATCH to balance the cohorts. I calculate the propensity score separately using logistic regression and then pass the generated dataset to PSMATCH using PSDATA. I run multiple iterations of matching (to get the best results) by varying the region, the method (optimal, greedy, and variable ratio), the distance variable, the caliper value, and the ratio. Please find the code below:
proc psmatch data=work.&data_set. region=&region_var.;
class &cat_var.;
psdata treatvar = case_cntrl_fl(Treated='1') PS=prop_score;
match method=&mtch_method.(&k_method.=&k_val.) exact=&exact_mtch_var.
stat=&stat_var. caliper(mult=stddev)=&caliper_var.;
assess lps ps var=(prop_score &covar_asses.) / plots = (boxplot cloudplot);
output out(obs=match)=WORK.psm ps=ps lps=lps matchid=_MatchID matchwgt = _MATCHWGT_;
run;
My concern is the number of observations considered for matching (i.e. "All Observations"). The logistic regression dataset contains 531 observations in Treatment Arm 1 and 3252 in Treatment Arm 2. However, the PSMATCH report shows "All Observations" as 446 for Treatment Arm 1 and 2784 for Treatment Arm 2. The result is the same regardless of the variations in PSMATCH methods.
Can somebody help me understand the possible reason for the drop in counts?
You likely have missing values in your data. If any variable used in the procedure is missing for a row, that entire row is excluded from the analysis.
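One way to confirm this is to count the rows that have a missing value in any variable the procedure uses. The sketch below assumes the macro variables from the question resolve to space-separated variable lists; missing_check, n_missing, and any_missing are hypothetical names, and the variable list should be extended with anything else you pass to PSMATCH.

```sas
/* Flag rows with at least one missing value among the variables used. */
/* CMISS accepts both numeric and character variables.                 */
data work.missing_check;
    set work.&data_set.;
    n_missing   = cmiss(of prop_score case_cntrl_fl &cat_var. &covar_asses.);
    any_missing = (n_missing > 0);
run;

/* Rows per treatment arm with and without missing values */
proc freq data=work.missing_check;
    tables case_cntrl_fl*any_missing / missing norow nocol nopercent;
run;
```

If the any_missing=0 counts match 446 and 2784, missing values explain the drop.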

continuous variable as random effect in proc mixed

I could use some help understanding the interpretation of a continuous variable as a random effect. I understand the interpretation of a fixed effect and of a categorical random effect. But SAS seems to allow a continuous random effect as well (code below). I understand that random effects are supposed to be categorical, but I came across discussions and forums where arguments were made for continuous random effects as well, which I guess is why SAS has that option. What I am really trying to understand is what to make of the variance estimate and the coefficient estimate you get in such a scenario.
For example we are trying to predict wealth based on education and age. And, we made age as random effect. The code would look like
Proc mixed data=data;
model wealth = education;
random age /solution;
run;
I got the covariance parameter estimate for the random effect age. Notice I don't have a class statement. I understand it doesn't make sense to have age as a random effect, but I am trying to understand what SAS is doing behind the scenes in the code above. Also, is the random coefficient estimate for age, which is shrunk towards 0, comparable to the fixed effect coefficient when age is fixed?
Thanks in advance.

SAS Proc TTest - Difference from fixed value

How do I test if a variable is significantly different from the value 15?
What value does the test statistic need to exceed for 95% confidence that it is different (i.e. a 5% chance that the average is 15)?
To specify a null value to test against, look at the H0= option. I assume the 5% is the alpha level. If your test is two-sided, the default SIDES= option is fine; otherwise set it to L or U for a one-sided test.
Proc ttest data=sashelp.class h0=15 alpha=0.05 sides=2;
Var age;
Run;
All of these options are detailed in the documentation. The value used for comparison against the test statistic varies based on the sample size.
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_ttest_sect002.htm
UCLA offers a detailed walk through of Proc Ttest
http://www.ats.ucla.edu/stat/sas/output/ttest.htm
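The comparison value mentioned above is the critical value of the t distribution, which depends on the degrees of freedom (n - 1 for a one-sample test). A small sketch for the two-sided case, assuming alpha=0.05 and the 19 rows of sashelp.class:

```sas
/* Two-sided critical t value for a one-sample test */
data _null_;
    alpha  = 0.05;
    n      = 19;                          /* rows in sashelp.class */
    t_crit = tinv(1 - alpha/2, n - 1);    /* quantile with 18 df   */
    put t_crit= 8.4;
run;
```

With 18 degrees of freedom this is roughly 2.10, so the null value of 15 is rejected at the 5% level when the absolute t statistic is larger than that. For a one-sided test, use 1 - alpha in place of 1 - alpha/2.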