Get values of survival function in SAS

I generated a random sample from an exponential distribution and sorted them so they are going from lowest to highest value, giving me my order statistics. Now I need to get the values of the survival function at these numbers and plot them against the order statistics. I cannot seem to figure out how to get the list of these survival values in SAS, so I can plot them.

Survivor function estimates at the observed event times can be obtained from the LIFETEST procedure in SAS.
proc lifetest data = yourdata outsurv = survest;
time time2event*eventindicator(0);
run;
The output dataset created by the OUTSURV= option contains the survivor function estimates. Note that by default this gives the Kaplan-Meier estimate of the survivor function.
A plot of the estimated survivor function across time is produced automatically by LIFETEST. If you want to plot against order statistics, use the STEP statement in the SGPLOT procedure after computing your order statistics.
proc sgplot data = survandorder;
step x = order y = survival;
run;
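For a simulated exponential sample like the one in the question there is no censoring, so the censoring variable can be left off the TIME statement. The following is only a minimal sketch: the dataset name sample, the variable t, the sample size and the scale of 2 are assumptions made for illustration, not details from the original post. It also uses the SDF function to add the theoretical survival function for comparison.

```sas
/* Hypothetical data: 50 exponential draws with scale 2 */
data sample;
   call streaminit(12345);
   do i = 1 to 50;
      t = 2 * rand('EXPONENTIAL');   /* rand('EXPONENTIAL') has scale 1 */
      output;
   end;
   drop i;
run;

/* No censoring variable, so every observation counts as an event */
proc lifetest data=sample outsurv=survest noprint;
   time t;
run;

/* Add the theoretical survival function S(t) = exp(-t/2) */
data survest;
   set survest;
   s_true = sdf('EXPONENTIAL', t, 2);
run;

/* OUTSURV= is already ordered by time, so t holds the order statistics */
proc sgplot data=survest;
   step x=t y=survival / legendlabel="Kaplan-Meier estimate";
   series x=t y=s_true / legendlabel="True survival function";
run;
```

With no censoring and no ties, the Kaplan-Meier estimate reduces to the empirical survival function, which drops by 1/n at each order statistic.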

I need to find the confidence intervals for proportions using stratified data

I'm trying to report estimates of the proportions of subjects in a stratified random sample.
I've tried every website I can find for SAS proc surveymeans, and I don't understand what I'm doing wrong.
data b;
set Data;
keep id texting section;
run;
proc surveyselect data=b out=samp_b method=srs n=(15,12,10,8)
seed=123;
strata section;
run;
proc surveymeans data=samp_b;
strata section;
weight SamplingWeight;
var texting;
run;
I should get confidence intervals for the strata, but they are not showing up. Also I need confidence intervals for the proportions!
I don't know what version of SAS/STAT you are using, but per SAS/STAT 9.2 Proc Surveymeans documentation pages, you can do one or both of the following:
1) Add the relevant statistics keywords to the proc surveymeans statement
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_sect007.htm
In the PROC SURVEYMEANS statement, you also can use statistic-keywords to specify statistics for the procedure to compute. Available statistics include the population mean and population total, together with their variance estimates and confidence limits. You can also request data set summary information and sample design information.
The available statistics keywords are listed and described on these pages:
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_a0000000238.htm
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_sect007.htm#statug.surveymeans.smeanskeys
So, to print the 95% two-sided confidence interval for the mean, you would add CLM to the end of your Proc Surveymeans statement.
2) Save the Statistics table with confidence intervals to a separate SAS dataset with an additional ods output Statistics=MyStat; statement, per these instructions.
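Both suggestions can be combined in one step. This is a sketch under assumptions rather than a tested fix for the asker's data: it assumes texting is a categorical variable (so that listing it in a CLASS statement makes the procedure estimate a proportion for each level), and the output dataset names MyStat and MyDomainStat are made up for illustration. The DOMAIN statement is what requests separate estimates for each section.

```sas
/* Capture the overall and per-section tables as datasets */
ods output Statistics=MyStat Domain=MyDomainStat;

proc surveymeans data=samp_b mean clm;
   strata section;
   weight SamplingWeight;   /* created by PROC SURVEYSELECT */
   class texting;           /* assumption: texting is categorical */
   var texting;
   domain section;          /* estimates and limits within each section */
run;
```

For a categorical variable the MEAN column holds the estimated proportion of each level, and CLM adds the confidence limits for it (95% by default).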

plot entire time series in SAS?

I am trying to make a forecast, and I want to see the entire time series, with the forecasted period at the end (I need to compare it with another graph of this kind).
SAS 9.4 does not want to comply, however, and only shows me the forecasting part.
What can I do to remedy this?
The code I'm using is:
Proc arima data=logtabell;
identify var=y(12) nlag=24;
estimate p=1 q=2;
forecast lead=12 interval=month id=date out=results;
run;
Your out= dataset will contain all values by default. If you want to specifically see the full graph that it outputs, add the plots option:
proc arima data=logtabell plots=forecast(forecast);
Or, just do it the easy way by getting every plot:
proc arima data=logtabell plots = all;
Also, make sure ods graphics on; is set.
Many SAS procedures have an out=resultSet option, from which you can get the results in a dataset.
Combine this output with your time series in one graph created with proc sgplot.
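As a rough sketch of that second suggestion, the out=results dataset from the FORECAST statement in the question can be plotted directly. It should contain the id variable date, the original series y, and the variables FORECAST, L95 and U95 (the limit names assume the default 95% confidence level). The legend labels are arbitrary.

```sas
proc sgplot data=results;
   /* Confidence band first so the lines are drawn on top of it */
   band x=date lower=l95 upper=u95 / transparency=0.5
        legendlabel="95% confidence limits";
   series x=date y=y / legendlabel="Observed";
   series x=date y=forecast / legendlabel="Forecast";
run;
```

Because y is missing over the 12 forecast months, the observed line stops where the data end while the forecast line and band continue.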

calculate market weighted return using SAS

I have four variables Name, Date, MarketCap and Return. Name is the company name. Date is the time stamp. MarketCap shows the size of the company. Return is its return at day Date.
I want to create an additional variable MarketReturn which is the value-weighted return of the market at each point in time. For each day t, MarketCap weighted return = sum [ return(i) * (MarketCap(i)/Total(MarketCap)) ] (return(i) is company i's return at day t).
The way I do this is very inefficient. I guess there must be some function that can easily achieve this target in SAS, so I want to ask if anyone can improve my code please.
step1: sort data by date
step2: calculate total market value at each day TotalMV = sum(MarketCap).
step3: calculate the weight for each company (weight = MarketCap/TotalMV)
step4: create a new variable 'Contribution' = Return * weight for each company
step5: sum up Contribution at each day. Sum(Contribution)
Weighted averages are supported in a number of SAS PROCs. One of the more common, all-around useful ones is PROC SUMMARY:
PROC SUMMARY NWAY DATA = my_data_set ;
CLASS Date ;
VAR Return / WEIGHT = MarketCap ;
OUTPUT
OUT = my_result_set
MEAN (Return) = MarketReturn
;
RUN;
The NWAY piece tells the PROC that the observations should be grouped only by what is stated in the CLASS statement - it shouldn't also provide an ungrouped grand total, etc.
The CLASS Date piece tells the PROC to group the observations by date. You do not need to pre-sort the data when you use CLASS. You do have to pre-sort if you say BY Date instead. The only rationale for using BY is if your dataset is very large and naturally ordered, you can gain some performance. Stick to CLASS in most cases.
VAR Return / WEIGHT = MarketCap tells the proc that any weighted calculations on Return should use MarketCap as the weight.
Lastly, the OUTPUT statement specifies the data set to write the results to (using the OUT option), and specifies the calculation of a mean on Return that will be written as MarketReturn.
There are many, many more things you can do with PROC SUMMARY. The documentation for PROC SUMMARY is sparse, but only because it is the nearly identical sibling of PROC MEANS, and SAS did not want to produce reams of mostly identical documentation for both. Here is the link to the SAS 9.4 PROC MEANS documentation. The main difference between the two PROCS is that SUMMARY only outputs to a dataset, while MEANS by default outputs to the screen. Try PROC MEANS if you want to see the result pop up on the screen right away.
The MEAN keyword in the OUTPUT statement comes from SAS's list of statistical keywords, a helpful reference for which is here.
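As a cross-check on the weighted mean, the five steps from the question collapse into a single PROC SQL query, since sum(Return * weight) equals sum(Return * MarketCap) / sum(MarketCap). This is only an illustrative sketch: the table name market_return is made up, and the WHERE clause is an assumption about how rows with missing values should be handled.

```sas
proc sql;
   create table market_return as
   select Date,
          /* steps 2-5: weight each return by its share of the day's total cap */
          sum(Return * MarketCap) / sum(MarketCap) as MarketReturn
   from my_data_set
   where Return is not missing and MarketCap is not missing
   group by Date;
quit;
```

Either result can then be merged back onto the original data by Date if MarketReturn is needed on every company row.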

Renaming Coefficients that Result from Proc Logistic/Problems Surrounding Variable Names Common to Multiple Datasets

I am estimating a model for firm bankruptcy that involves 11 factors. I have data from 1900 to 2000 and my goal is to estimate my model using proc logistic for the period 1900-1950 and then test its performance on the 1951 through 2000 data. Proc logistic runs fine, but the problem I have is that the estimated coefficients have the same names as the factors I was using in my model. Suppose the dataset that contains all my observations is called myData and the dataset that contains the estimated coefficients, which I obtain using the OUTEST= option (in proc logistic), is called factorEstimates. Now both of these datasets have the variables factor1, factor2, ..., factorN. Now I want to form the dataset outOfSampleResults that does something like the following:
data outOfSampleResults;
set myData factorEstimates;
newVar=factor1*factor1;
run;
Where the first mention of factor1 refers to that contained in myData and the second refers to that contained in factorEstimates. How can I inform sas which dataset it should read for this variable that is common to both of the datasets in the set statement? Alternatively, how could I quickly rename factor1, factor2, ..., factorN as factor1Estimate, factor2Estimate, ..., factorNEstimate in the factorEstimates dataset so as to circumvent this common variable name issue altogether?
Two quick ways to get estimates for a model already developed:
1. Proc logistic score statement
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect066.htm
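2. Stack the data
Include the data you want to predict in your original proc logistic, but use a new dependent variable and ensure that it is missing for the observations you want to predict. Those observations are left out of the fit but still receive predicted values.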
data stacked;
set all;
if year >1950 then predicted=.;
else predicted=y;
run;
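proc logistic data=stacked;
/* the question describes 11 factors */
model predicted = factor1-factor11;
/* p holds the predicted probability, including for rows where predicted is missing */
output out=out_predicted predicted=p;
run;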
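The first option, the SCORE statement, is not shown above, so here is a minimal sketch of it. The names y and year are taken from the stacked example, the train and test dataset names are made up, and event='1' assumes y is coded so that 1 means bankruptcy.

```sas
/* Split into the estimation and hold-out periods */
data train test;
   set myData;
   if year <= 1950 then output train;
   else output test;
run;

proc logistic data=train;
   model y(event='1') = factor1-factor11;
   /* SCORE applies the fitted model to the hold-out data */
   score data=test out=outOfSampleResults;
run;
```

The scored dataset keeps the original factor columns and adds the predicted probabilities in variables whose names start with P_, so the coefficient names never collide with the factor names.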

how to find outliers in sas with proc means?

Is there a way to detect an outlier from proc means while calculating min, max, Q1 and Q3?
The box plot procedure is not working on my SAS and I am trying to create a boxplot in Excel with the values from SAS.
Assuming you have a specific definition for what an outlier is, PROC UNIVARIATE can calculate the value that appears at that percentile using the PCTLPTS keyword on the OUTPUT statement. It also will identify extreme observations individually, so you can see the top few observations (if you have few enough observations that the number of extremes is likely to be <= 5).
The paper A SAS Application to Identify and Evaluate Outliers goes over a few of the ways you can look at outliers, including box plots and PROC UNIVARIATE, and includes some regression-based approaches as well.
If you want a 'standard boxplot', use the outbox= option in PROC BOXPLOT to create the standard dataset used for a box plot.
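/* PROC BOXPLOT expects the input sorted by the group variable */
proc sort data=sashelp.class out=class; by sex; run;
proc boxplot data=class;
plot age*sex / outbox = xyz;
run;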
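Since the question asks about PROC MEANS specifically, here is one possible sketch that stays within it. It assumes the usual box plot definition of an outlier (more than 1.5 times the interquartile range beyond Q1 or Q3), which is a convention and not something stated in the question. It uses sashelp.class and the variable height purely as example data; the dataset names stats and outliers are made up.

```sas
/* Quartiles from PROC MEANS, written to a one-row dataset */
proc means data=sashelp.class noprint;
   var height;
   output out=stats min=min q1=q1 q3=q3 max=max;
run;

/* Attach the quartiles to every row and keep only the outliers */
data outliers;
   if _n_ = 1 then set stats(keep=q1 q3);
   set sashelp.class;
   iqr = q3 - q1;
   if height < q1 - 1.5*iqr or height > q3 + 1.5*iqr;
run;
```

The stats dataset holds the min, Q1, Q3 and max values that can be exported for the Excel box plot, and outliers lists the individual observations beyond the fences.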