SAS Regression Returns 0 Coefficient - sas

I am running a SAS regression using the following model:
ods output ParameterEstimates=stock_params;
proc reg data=REG_DATA;
by SYMBOL DATE;
model RETURN_SEC = market_premium;
run;
ods output close;
Here RETURN_SEC is the stock's return per second and market_premium is the return of the SPY index minus the risk-free rate (the risk-free rate is very close to zero because it is measured at the one-second level).
However, the coefficient on market_premium comes back as 0 for a significant number of the regressions (not all of them). When I check the log it says:
NOTE: Model is not full rank. Least-squares solutions for the parameters are
not unique. Some statistics will be misleading. A reported DF of 0 or B
means that the estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a
linear combination of other variables as shown.
market_premium = - 19E-12 * Intercept
This is quite weird. I checked the data and it seems fine (although many rows have a return_sec of 0, which is normal because the return sometimes changes over minutes rather than seconds).
What also puzzles me is why SAS returns a 0 coefficient on market_premium when market_premium = - 19E-12 * Intercept. Does SAS treat the Intercept as the only variable when it sees that market_premium is a scalar multiple of the Intercept?
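The log note suggests that, within the affected BY group, market_premium takes a single value (about -19E-12 on every row), so it cannot be separated from the intercept. A minimal sketch for locating such groups, assuming REG_DATA is sorted by SYMBOL and DATE as the BY statement already requires; the data set name premium_check and the variables n_obs, min_premium and max_premium are hypothetical names chosen for illustration:

```sas
/* Per BY group: count non-missing regressor values and record their range. */
proc means data=REG_DATA noprint;
  by SYMBOL DATE;
  var market_premium;
  output out=premium_check n=n_obs min=min_premium max=max_premium;
run;

/* List groups where the regressor is exactly constant or has fewer than two values. */
proc print data=premium_check;
  where n_obs < 2 or min_premium = max_premium;
run;
```

This only catches groups where the value is exactly constant; a regressor that varies by a numerically negligible amount could trigger the same note without appearing in this listing.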

Related

proc surveyselect sample defined versus sample received

I am using the following code
proc surveyselect data = tmp method = urs sampsize = 500 seed = 100 out = out_tmp; run;
However, when I look at the log I am getting 491 records. My tmp dataset has 30,000 records. I need help understanding why 9 records are getting dropped. I tried changing the seed value and I get around 470 to 495 records per seed, but never exactly 500. I referred to the documentation, and the URS option means "unrestricted random sampling, which is selection with equal probability and with replacement". The equal probability should have no impact here; I understand "with replacement" to mean that a record could be present more than once, which is what I am aiming for.
What I do not understand is why the drawn sample stops at a number less than the 500 I have specified.
Thanks for the help.
The issue is that you haven't quite understood how URS works; I recommend a look through the documentation.
Take this (extreme) example:
proc surveyselect data=sashelp.cars method=urs out=sample_cars sampsize=10000 seed=100;
run;
NOTE: The sample size, 10000, is greater than the number of sampling units, 428.
NOTE: The data set WORK.SAMPLE_CARS has 428 observations and 16 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
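Here I ask for 10,000 (out of 428 total records!), and get... 428 records. The important detail to pay attention to is the NumberHits variable, which says how many times each record was sampled. The output holds one row per distinct record selected, so a row count below the requested sample size means that some records were selected more than once.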
If you want one record output for each hit, meaning you want those duplicates, you can add outhits to your PROC SURVEYSELECT statement. From the documentation on URS:
For unrestricted random sampling, by default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits is m; for example, the output data set contains three copies of a sampling unit that is selected three times (NumberHits is three). For information about the contents of the output data set, see the section Sample Output Data Set.
Here is my example modified to do just that.
proc surveyselect data=sashelp.cars method=urs out=sample_cars sampsize=10000 seed=100 outhits;
run;
NOTE: The sample size, 10000, is greater than the number of sampling units, 428.
NOTE: The data set WORK.SAMPLE_CARS has 10000 observations and 16 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
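To confirm that the default output (without OUTHITS) still represents the full requested sample, the hits can be summed. A minimal sketch reusing the sashelp.cars example above; NumberHits is the variable PROC SURVEYSELECT adds for METHOD=URS, and its total should equal the SAMPSIZE= value:

```sas
/* Default URS output: one row per distinct unit, with NumberHits as the count. */
proc surveyselect data=sashelp.cars method=urs out=sample_cars sampsize=10000 seed=100;
run;

/* N is the number of distinct units drawn; SUM is the total number of selections. */
proc means data=sample_cars n sum;
  var NumberHits;
run;
```

The same check on the data set from the question (out_tmp) would show whether the 491 rows account for all 500 selections.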

SAS propensity score matching: Observations considered for matching in PSMATCH is less than the total observations available in the data set

I am using the SAS procedure PSMATCH to balance the cohorts. I calculate the propensity score separately using logistic regression and then pass the generated dataset to PSMATCH using PSDATA. I run multiple iterations of matching (to get the best results), varying the region, method (optimal, greedy and variable ratio), distance variable, caliper value and ratio. Please find the code below:
proc psmatch data=work.&data_set. region=&region_var.;
class &cat_var.;
psdata treatvar = case_cntrl_fl(Treated='1') PS=prop_score;
match method=&mtch_method.(&k_method.=&k_val.) exact=&exact_mtch_var.
stat=&stat_var. caliper(mult=stddev)=&caliper_var.;
assess lps ps var=(prop_score &covar_asses.) / plots = (boxplot cloudplot);
output out(obs=match)=WORK.psm ps=ps lps=lps matchid=_MatchID matchwgt = _MATCHWGT_;
run;
My concern is the number of observations considered for matching (i.e. All Observations). The logistic regression data set contains 531 observations in Treatment Arm 1 and 3252 in Treatment Arm 2. However, the PSMATCH report shows All Observations as 446 for Treatment Arm 1 and 2784 for Treatment Arm 2. The result is the same irrespective of the variations in PSMATCH methods.
Can somebody help me understand the possible reason for the drop in counts?
You likely have missing values in your data. If any variable used in the procedure is missing for a row, that entire row is excluded from the analysis.
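One way to test this explanation is to count the rows with at least one missing analysis variable in each treatment arm. A minimal sketch, assuming the macro variables from the question (&data_set., &covar_asses., &cat_var.) each resolve to a space-separated list of variable names; miss_check and any_missing are hypothetical names:

```sas
/* Flag rows that have a missing value in any variable handed to PSMATCH. */
data miss_check;
  set work.&data_set.;
  /* CMISS counts missing values across both numeric and character arguments. */
  any_missing = (cmiss(of prop_score &covar_asses. &cat_var.) > 0);
run;

/* Cross-tabulate the flag by arm; the MISSING option also shows a missing arm value. */
proc freq data=miss_check;
  tables case_cntrl_fl * any_missing / missing;
run;
```

If the rows with any_missing = 0 come close to 446 and 2784, missing values would account for the drop.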

In SAS: How to produce odds ratios (OR) in the results of PROC GENMOD

Possibly related to this question: How can I print odds ratios as part of the results of a GENMOD procedure?
I am dealing with a wide dataset containing a main exposure variable, a categorical variable Type (four levels), and several continuous and binary variables as confounding factors.
Additional info: The dataset contains multiple imputations.
I am using the following code:
Proc genmod;
Class ID Type (ref=first);
Model class1= Type;
estimate 'black' TYPE 0 1 1/exp;
estimate 'white' TYPE 1 0 1/exp;
estimate 'red' TYPE 0 1 0/exp;
Repeated ID;
By imputation;
Run;
I expected the results table to contain, among other things, the exponentiated beta for every level of the categorical variable Type (bar that variable's reference group). The actual results table lacks these values, and no confidence intervals are printed.
What syntax should I use to tell SAS to produce those numbers in the results? I have looked through the SAS documentation, but I have not yet found an answer.

Outputting predicted values in SAS proc mixed: Prohibitive performance issues

I've noticed strange behavior with SAS proc mixed: Models with a modestly large number of rows, which take only seconds to converge, nevertheless take upwards of half an hour to finish running if I ask for output of predicted values & residuals. The thing that seems perverse is that when I run the analogous models in R using nlme::lme(), I get the predicted values & residuals as a side effect and the models complete in seconds. That makes me think this is not merely a memory limitation of my machine.
Here's some sample code. I can't provide the real data for which I'm seeing this issue, but the structure is 1-5 rows per subject, ~1500 unique subjects, ~5,000 outcome-covariate sets total.
In SAS:
proc mixed data=testdata noclprint covtest;
class subjid ed gender;
model outcome = c_age ed gender / ddfm=kr solution residual outp=testpred;
random int c_age / type=un sub=subjid;
run;
In R:
lme.test <- lme(outcome ~ c_age + ed + gender, data=testdata,
random = ~c_age|factor(subjid), na.action=na.omit)
Relevant stats: Win7, SAS 9.4 (64-bit), R 3.3, nlme 3.1-131.

calculate market weighted return using SAS

I have four variables Name, Date, MarketCap and Return. Name is the company name. Date is the time stamp. MarketCap shows the size of the company. Return is its return at day Date.
I want to create an additional variable MarketReturn, which is the value-weighted return of the market at each point in time. For each day t, MarketCap-weighted return = sum over i of [ return(i) * (MarketCap(i) / Total(MarketCap)) ], where return(i) is company i's return on day t.
The way I do this is very inefficient. I guess there must be some function that can easily achieve this target in SAS, so I want to ask if anyone can improve my code please.
step1: sort data by date
step2: calculate total market value at each day TotalMV = sum(MarketCap).
step3: calculate the weight for each company (weight = MarketCap/TotalMV)
step4: create a new variable 'Contribution' = Return * weight for each company
step5: sum up Contribution at each day. Sum(Contribution)
Weighted averages are supported in a number of SAS PROCs. One of the more common, all-around useful ones is PROC SUMMARY:
PROC SUMMARY NWAY DATA = my_data_set ;
CLASS Date ;
VAR Return / WEIGHT = MarketCap ;
OUTPUT
OUT = my_result_set
MEAN (Return) = MarketReturn
;
RUN;
The NWAY piece tells the PROC that the observations should be grouped only by what is stated in the CLASS statement - it shouldn't also provide an ungrouped grand total, etc.
The CLASS Date piece tells the PROC to group the observations by date. You do not need to pre-sort the data when you use CLASS. You do have to pre-sort if you say BY Date instead. The only rationale for using BY is if your dataset is very large and naturally ordered, you can gain some performance. Stick to CLASS in most cases.
VAR Return / WEIGHT = MarketCap tells the proc that any weighted calculations on Return should use MarketCap as the weight.
Lastly, the OUTPUT statement specifies the data set to write the results to (using the OUT option), and specifies the calculation of a mean on Return that will be written as MarketReturn.
There are many, many more things you can do with PROC SUMMARY. The documentation for PROC SUMMARY is sparse, but only because it is the nearly identical sibling of PROC MEANS, and SAS did not want to produce reams of mostly identical documentation for both. Here is the link to the SAS 9.4 PROC MEANS documentation. The main difference between the two PROCS is that SUMMARY only outputs to a dataset, while MEANS by default outputs to the screen. Try PROC MEANS if you want to see the result pop up on the screen right away.
The MEAN keyword in the OUTPUT statement comes from SAS's list of statistical keywords, a helpful reference for which is here.
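As a cross-check, the five manual steps from the question collapse into a single grouped query, because the sum of Return * (MarketCap / TotalMV) equals sum(Return * MarketCap) / sum(MarketCap). A minimal sketch, assuming the same my_data_set as above; my_check_set is a hypothetical output name:

```sas
proc sql;
  create table my_check_set as
  select Date,
         /* Weighted mean: sum of weight * value divided by sum of weights. */
         sum(Return * MarketCap) / sum(MarketCap) as MarketReturn
  from my_data_set
  /* Keep only rows that can contribute to both sums. */
  where Return is not missing and MarketCap > 0
  group by Date;
quit;
```

For dates with at least one usable row, the result should agree with the MarketReturn column that PROC SUMMARY writes to my_result_set.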