SAS Adaptivereg Limiting Breakpoints - sas

I am using SAS's adaptivereg procedure to create a piece-wise linear function with temperature (temp) as my x variable and Usage (usage_value) as my y variable. I can use the details of adaptivereg procedure to find the ranges of the different linear functions of the piece-wise function. Is there a way I can limit the number of ranges, (i.e. instead of having 8 linear functions in the piece-wise function, I want to limit it to 5 linear functions). Are there any options I can add that would let me limit the amount of linear functions?
Below is the code I am using. Where a1 is the name of my data set, temp is the independent variable, and usage_value is the dependent variable.
proc adaptivereg data=a1 plots=all details=bases;
model usage_value = temp;
run;

I love adaptivereg. It's such a cool little procedure.
You can use the df option in the model statement to control the total number of knots to consider, and the maxbasis option to control the maximum number of knots in the final model. The higher the degrees of freedom that you use, the fewer the knots.
proc adaptivereg data=sashelp.air;
model air = date / df=12 maxbasis=3;
run;
You can also use the alpha= option to fine-tune it. Increasing alpha will result in more knots.
An alternative approach can be using the pbspline/spline options in transreg or proc quantreg, respectively.
proc transreg data=sashelp.air;
model identity(air) = pbspline(date / evenly=6);
run;
proc quantreg data=sashelp.air;
effect sp = spline(date / knotmethod=equal(12) );
model air = sp / quantile=0.5;
run;

Related

SAS Regression Returns 0 Coefficient

I am running a SAS regression using the following model:
ods output ParameterEstimates=stock_params;
proc reg data=REG_DATA;
by SYMBOL DATE;
model RETURN_SEC = market_premium;
run;
ods output close;
Where RETURN_SEC is the return of the stock per second and market_premium is the return of SPY index minus the risk free rate (the risk free rate is quite close to zero because it is at a second level).
However, I got lots of 0s (not all of them, but a significant number of) in the coefficient of market_premium. When I check the log it says:
NOTE: Model is not full rank. Least-squares solutions for the parameters are
not unique. Some statistics will be misleading. A reported DF of 0 or B
means that the estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a
linear combination of other variables as shown.
market_premium = - 19E-12 * Intercept
This is quite weird. I checked the data and it seems fine (although lot of data contains 0 return_sec, which is normal because sometimes the return doesn't change in seconds but in minutes).
What also puzzles me is that why SAS would return 0 coefficient on market_premium when market_premium = - 19E-12 * Intercept. I mean, does SAS treat the Intercept as the only variable when it sees that market_premium is a scalar times of Intercept?

calculate market weighted return using SAS

I have four variables Name, Date, MarketCap and Return. Name is the company name. Date is the time stamp. MarketCap shows the size of the company. Return is its return at day Date.
I want to create an additional variable MarketReturn which is the value weighted return of the market at each point time. For each day t, MarketCap weighted return = sum [ return(i)* (MarketCap(i)/Total(MarketCap) ] (return(i) is company i's return at day t).
The way I do this is very inefficient. I guess there must be some function can easily achieve this traget in SAS, So I want to ask if anyone can improve my code please.
step1: sort data by date
step2: calculate total market value at each day TotalMV = sum(MarketCap).
step3: calculate the weight for each company (weight = MarketCap/TotalMV)
step4: create a new variable 'Contribution' = Return * weight for each company
step5: sum up Contribution at each day. Sum(Contribution)
Weighted averages are supported in a number of SAS PROCs. One of the more common, all-around useful ones is PROC SUMMARY:
PROC SUMMARY NWAY DATA = my_data_set ;
CLASS Date ;
VAR Return / WEIGHT = MarketCap ;
OUTPUT
OUT = my_result_set
MEAN (Return) = MarketReturn
;
RUN;
The NWAY piece tells the PROC that the observations should be grouped only by what is stated in the CLASS statement - it shouldn't also provide an ungrouped grand total, etc.
The CLASS Date piece tells the PROC to group the observations by date. You do not need to pre-sort the data when you use CLASS. You do have to pre-sort if you say BY Date instead. The only rationale for using BY is if your dataset is very large and naturally ordered, you can gain some performance. Stick to CLASS in most cases.
VAR Return / WEIGHT = MarketCap tells the proc that any weighted calculations on Return should use MarketCap as the weight.
Lastly, the OUTPUT statement specifies the data set to write the results to (using the OUT option), and specifies the calculation of a mean on Return that will be written as MarketReturn.
There are many, many more things you can do with PROC SUMMARY. The documentation for PROC SUMMARY is sparse, but only because it is the nearly identical sibling of PROC MEANS, and SAS did not want to produce reams of mostly identical documentation for both. Here is the link to the SAS 9.4 PROC MEANS documentation. The main difference between the two PROCS is that SUMMARY only outputs to a dataset, while MEANS by default outputs to the screen. Try PROC MEANS if you want to see the result pop up on the screen right away.
The MEAN keyword in the OUTPUT statement comes from SAS's list of statistical keywords, a helpful reference for which is here.

Difference between Proc univarite and Proc severity for fitting continuous (positive support) distribution

My goal is to fit a data to any distribution which has positive support. (weibull(2p), gamma(2p), pareto(2p), lognormal (2p),exponential(1P)). First attempt,i used proc univariate.This is my code
proc univariate data=fit plot outtable=table;
var week1;
histogram / exp gamma lognormal weibull pareto;
inset n mean(5.3) std='Standar Deviasi'(5.3)
/ pos = ne header = 'Summary Statistics';
axis1 label=(a=90 r=0);
run;
The first thing i noticed, there's no kolmogorov statistic shown for weibull distribution.Then i used proc severity instead.
proc severity data=fit print=all plots(histogram kernel)=all;
loss week1;
dist exp pareto gamma logn weibull;
run;
Now, i got the KS statistic for weibull distribution.
Then i compared KS statistic produced by proc severity and proc univariate. They're different. Why? Which one should i use?
I do not have access to SAS/ETS so cannot confirm this with proc severity, but I imagine that the difference you are seeing come down to the way the distribution parameters are fitted.
With your proc univriate code you are not requesting estimation for several of the parameters (some are in some cases set to 1 or 0 by default, see sigma and theta in the user guide). For example:
data have;
do i = 1 to 1000;
x = rand("weibull", 5, 5);
output;
end;
run;
ods graphics on;
proc univariate data = have;
var x;
/* Request maximum liklihood estimate of scale and threshold parameters */
histogram / weibull(theta = EST sigma = EST);
/* Request maximum liklihood estimate of scale parameter and 0 as threshold */
histogram / weibull;
run;
You will note that when an estimate of theta is requested SAS also produces the KS statistic, this is due to the way that SAS estimates the fit statistic requiring know distribution parameters (full explanation here).
My guess is that you are seeing different fit statistics between the two procedures because either they are returning slightly different fits, or they use different calculations for the estimation of fit statistics. If you are interested you can investigate how they perform their parameter estimation in the user guide (proc severity and proc univariate). If you wanted to investigate further you could force the distribution parameters to match in both procedures and then compare the fit statistics to see how far they differ.
I would recommend that if possible you use only one of the procedure, and that you select the one that best fits your needs in terms of output.

How to do simple mathematics with min/max variable in SAS

I am currently running a macro code in SAS and I want to do a calculation with regards to max and min. Right now the line of code I have is :
hhincscaled = 100*(hhinc - min(hhinc) )/ (max(hhinc) - min(hhinc));
hhvaluescaled = 100*(hhvalue - min(hhvalue))/ (max(hhvalue) - min(hhvalue));
What I am trying to do is re-scale household income and value variables with the calculations below. I am trying to subtract the minimum value of each variable and subtract it from the respective maximum value and then scale it by multiplying it by 100. I'm not sure if this is the right way or if SAS is recognizing the code the way I want it.
I assume you are in a Data Step. A Data Step has an implicit loop over the records in the data set. You only have access to the record of the current loop (with some exceptions).
The "SAS" way to do this is the calculate the Min and Max values and then add them to your data set.
Proc sql noprint;
create table want as
select *,
min(hhinc) as min_hhinc,
max(hhinc) as max_hhinc,
min(hhvalue) as min_hhvalue,
max(hhvalue) as max_hhvalue
from have;
quit;
data want;
set want;
hhincscaled = 100*(hhinc - min_hhinc )/ (max_hhinc - min_hhinc);
hhvaluescaled = 100*(hhvalue - min_hhvalue)/ (max_hhvalue - min_hhvalue);
/*Delete this if you want to keep the min max*/
drop min_: max_:;
run;
Another SAS way of doing this is to create the max/min table with PROC MEANS (or PROC SUMMARY or your choice of alternatives) and merge it on. Doesn't require SQL knowledge to do, and probably about the same speed.
proc means data=have;
*use a class value if you have one;
var hhinc hhvalue;
output out=minmax min= max= /autoname;
run;
data want;
if _n_=1 then set minmax; *get the min/max values- they will be retained automatically and available on every row;
set have;
*do your calculations, using the new variables hhinc_max hhinc_min etc.;
run;
If you have a class statement - ie, a grouping like 'by state' or similar - add that in proc means and then do a merge instead of a second set in want, by your class variable. It would require a sorted (initial) dataset to merge.
You also have the option of doing this in SAS-IML, which works more similarly to how you are thinking above. IML is the SAS interactive matrix language, and more similar to r or matlab than the SAS base language.

"Automatically" calculate linear combination of parameter estimates with PROC GLM

Background: I have a categorical variable, X, with four levels that I fit as separate dummy variables. Thus, there are three total dummy variables representing x=1, x=2, x=3 (x=0 is baseline).
Problem/issue: I want to be able to calculate the value of a linear combination (i.e. using SAS as a calculator) of these dummy variables. For example, 2*B1 + 2*B2 + B3.
In Stata, this can be done using the lincom command, which uses the stored beta estimates to calculate linear combinations of the parameters.
In SAS in a procedure such as PROC GLM, I think I should use the ESTIMATE statement, but I'm not sure how I would specify the "weights" for each variable in this case.
You are looking for PROC SCORE. This takes output regression or factor estimates and scores a new data set. See here for an example. http://support.sas.com/documentation/cdl/en/statug/66859/HTML/default/viewer.htm#statug_score_examples02.htm
FYI, PROC MODEL does allow this in the model statement, which may be less work than PROC SCORE. I know PROC MODEL can be used readily in place of PROC REG, but I'm not sure how advanced of modeling PROC MODEL does, so it may not be an option for more complex models. I was hoping for something with less coding, but given the nature of SAS, I think this and PROC SCORE are the best I'm going to get.
What if you add your linear combination as a variable in your input dataset?
data myDatasetWithLinCom;
set mydata;
LinComb=2*(x=1)+ 2*(x=2)+(x=3); /*equvilent to 2*B1 + 2*B2 + B3*/
run;
then you can specify LinComb as one of the explanatory variables and you can lookup the coefficient directly from the output.