Two sided test: proc ttest? - sas

I need to calculate the two-sided ttest in SAS.
I generally use the proc ttest adding side=2 but I am not sure if this test works fine or if another way should be preferred to it.
An example of data is the following:
Score Segment Obs Class_obs
1 0 500 15
1 1 500 34
2 0 234 23
2 1 766 65
Where the p-value is calculated per each score. Segment means that a condition is met (e.g. score higher than 60. 0 means ‘lower than 60’ while 1 means ‘higher than 60’).
Obs is the number of observations in each segment by score. Class obs is the number of obs that satisfy a specific condition on the overall population.
Happy to share more info if it needs.

Related

Rolling Window Model for Unbalanced Dataset in SAS

I have an unbalanced panel dataset of the following form (simplified):
data have;
input ID YEAR EARN LAG_EARN;
datalines;
1 1960 450 .
1 1961 310 450
1 1962 529 310
2 1978 10 .
2 1979 15 10
2 1980 8 15
2 1981 10 8
2 1982 15 10
2 1983 8 15
2 1984 10 8
3 1972 1000 .
3 1973 1599 1000
3 1974 1599 1599
;
run;​
I now want to estimate the following model for each ID:
proc reg;
by ID;
EARN = LAG_EARN;
run;
However, I want to do this for rolling windows of some size. Say for example for windows of size 2. The window should only contain non-empty observations. For example, in the case of firm A, the window is applicable from 1961 onwards and thus only one time (since only one year follows after 1961 and the window is supposed to be of size 2).
Finally, I want to get a table with year columns and firm rows. The table should indicate the following: The regression model (with window size 2) has been performed one time for firm A. The quantity of available years, has only allowed one estimation of this model. Put differently, in 1962 the coefficient of the regression model has a value of X based on the 2 year prior window. Applying the same logic to the other two firms, one can get the following table. "X" representing the respective estimated coefficient value in certain year for firm A/B/C based on the 2-year window and "n" indicating the non-existence of such a value:
data want;
input ID 1962 1974 1980 1981 1982 1983 1984;
datalines;
1 X n n n n n n
2 n n X X X X X
3 n X n n n n n
;
run;​
I do not know how to execute this. Furthermore, I would like to create a macro that allows me to estimate different rolling window models while still creating analogous output dataframes. I would appreciate any help with it, since I have been struggling quite some time now.
Try this macro. This will only output if there are non-missing values of lags that you specify.
%macro lag(data=, out=, window=);
data _want_;
set &data.;
by ID;
LAG_EARN = lag&window.(earn);
if(first.ID) then call missing(lag_earn);
if(NOT missing(lag_earn));
run;
proc sort data=_want_;
by year id;
run;
proc transpose data=_want_
out=&out.(drop=_NAME_);
by ID notsorted;
id year;
var lag_earn;
run;
proc sort data=&out.;
by id;
run;
%mend;
%lag(data=have, out=want, window=1);

Assigning a value to a certain number of rows within a "by" group - SAS

I've spent quite a lot of time on Stack Overflow looking for answers to other questions, but I'm really stuck on this one, so I'm finally asking a question!
I have a dataset of fish in SAS, with:
a unique ID for each angler
three different variables with number of fish released in each category by that angler: over legal size, under legal size, and released dead
a sequential number (fishno) based on the number of rows for each ID; 1 to the last row of that ID.
Variable to be created: Disposition--could be either character variable with "legal" "under" "dead" options or even numeric values of 1-3.
It was originally set up with one row per unique ID, but I set it so that now there is one row per fish discarded (i.e. if there were 3 legal size and 2 undersize fish, I now have 5 rows).
I need to assign, by unique ID, whether each row/fish was released legal, undersize or dead. In the previous example, for a unique ID, I'd need 3 rows assigned to a Disposition of "legal" and 2 rows assigned to a Disposition of "under".
I've tried first.var statements along with if-then-do statements; played around with macros; nothing worked quite right and I'm pretty stuck here. Is there some sort of random assignment I should try? Is there a much easier way that I'm missing?
Example of the data below...
THANK YOU!!
Data in Excel format
Assuming you already have the FISHNO variable, there needs to be some method for assigning each fish as legal, dead, or undersize. The following code will assign the disposition in the that order:
data have;
input ID LEGAL DEAD UNDERSIZE FISHNO;
datalines;
15 1 0 1 1
15 1 0 1 2
29 2 0 2 1
29 2 0 2 2
29 2 0 2 3
29 2 0 2 4
38 1 0 1 1
38 1 0 1 2
53 1 0 1 1
53 1 0 1 2
55 1 0 1 1
55 1 0 1 2
;
run;
data want;
set have;
if legal>0 and legal>=fishno then disposition = 'legal';
else if dead>0 and legal+dead>=fishno then disposition = 'dead';
else if undersize>0 and legal+dead+undersize>=fishno then disposition = 'under';
run;

SAS - group individual observations

Sorry I'm new to a lot of the features of SAS - I've only been using for a couple months, mostly for survey data analysis but now I'm working with a dataset which has individual level data for a cross-over study. It's in the form: ID treatment period measure1 measure2 ....
What I want to do is be able to group these individuals by their treatment group and then output a variable with a group average for measure 1 and measure 2 and another variable with the count of observations in each group.
ie
ID trt per m1 m2
1 1 1 101 75
1 2 2 135 89
2 1 1 103 77
2 2 2 140 87
3 2 1 134 79
3 1 2 140 80
4 2 1 156 98
4 1 2 104 78
what I want is the data in the form:
group a = where trt=1 & per=1
group b = where trt=2 & per=2
group c = where trt=2 & per=1
group d = where trt=1 & per=2
trtgrp avg_m1 avg_m2 n
A 102 76 2
B ... ... ...
C
D
Thank you for the help.
/Creating Sample dataset/
data test;
infile datalines dlm=" ";
input ID : 8.
trt : 8.
per : 8.
m1 : 8.
m2 : 8.;
put ID=;
datalines;
1 1 1 101 75
1 2 2 135 89
2 1 1 103 77
2 2 2 140 87
3 2 1 134 79
3 1 2 140 80
4 2 1 156 98
4 1 2 104 78
;
run;
/Using proc summary to summarize trt and per/
Variables(dimensions) on which you want to summarize would go into class
Variables(measures) for which you want to have average would go into var
Since you want to have produce average so you will have to write mean as the desired statistics.
Read more about proc summary here
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473735.htm
and here
http://web.utk.edu/sas/OnlineTutor/1.2/en/60476/m41/m41_19.htm
proc summary data=test nway;
class trt per;
var m1 m2;
output out=final(drop= _type_)
mean=;
run;
The alternative method uses PROC SQL, the advantage being that it makes use of plain-English syntax, so the concept of a group in your question is maintained in the syntax:
PROC SQL;
CREATE TABLE final AS
SELECT
trt,
per,
avg(m1) AS avg_m1,
avg(m2) AS avg_m2,
count(*) AS n
FROM
test
GROUP BY trt, per;
QUIT;
You can even add your own group headings by applying conditional CASE logic as you did in your question:
PROC SQL;
CREATE TABLE final AS
SELECT
CASE
WHEN trt=1 AND per=1 THEN 'A'
WHEN trt=2 AND per=2 THEN 'B'
WHEN trt=2 AND per=1 THEN 'C'
WHEN trt=1 AND per=2 THEN 'D'
END AS group
avg(m1) AS avg_m1,
avg(m2) AS avg_m2,
count(*) AS n
FROM
test
GROUP BY group;
QUIT;
COUNT(*) simply counts the number of rows found within the group. The AVG function calculates the average for the given column.
In each example, you can replace the explicitly named columns in the GROUP BY clause with a number representing column position in the SELECT clause.
GROUP BY 1,2
However, take care with this method, as adding columns to the SELECT clause later can cause problems.

How to check if price is going up in 10 consecutive trades in SAS?

I have intra-day price data for stock trades and need to write a code to determine the instances in which the following condition is met: Price should go up at least for 10 consecutive trades.
Here is a sample of my data (time is number of minutes in the day, if it's 1 am my time will be 60, if it's 2 am, my time will be 120 etc.):
Obs Time Symbol Price
1 288 AA 36.2800
2 304 AA 36.2800
3 305 AA 36.3400
4 307 AA 36.2800
5 311 AA 36.1500
6 337 AA 36.2000
How can I write this code? Probably a loop is necessary but I can not figure it out. Thank you.
Assuming no missing values, something like:
data want ;
set have ;
lagPrice=lag(Price) ;
if Price>lagPrice and not missing(lagPrice) then Increasing ++ 1 ;
else Increasing=0 ;
if Increasing > 10 then Trend=1 ;
run ;
That will flag the 10th record of an increasing trend, and all those after. Is that what you want? Or are you looking for a ways to flag all records involved in the trend? Or something else??

Confidence intervals on a matrix of data in SAS

I have the following matrix of data, which I am reading into SAS:
1 5 12 19 13
6 3 1 3 14
2 7 12 19 21
22 24 21 29 18
17 15 22 9 18
It represents 5 different species of animal (the rows) in 5 different areas of an environment (the columns). I want to get a Shannon diversity index for the whole environment, so I sum the rows to get:
48 54 68 79 84
Then calculate the Shannon index from this, to get:
1.5873488
What I need to do, however, is calculate a confidence interval for this Shannon index. So I want to perform a nonparametric bootstrap on the initial matrix.
Can anyone advise how this is possible in SAS?
There are several ways to do this in SAS. I would use proc surveyselect to generate the bootstrap samples, and then calculate the Shannon Index for each replicate. (I didn't know what the Shannon Index was, so my code is just based on what I read on Wikipedia.)
data animals;
input v1-v5;
cards;
1 5 12 19 13
6 3 1 3 14
2 7 12 19 21
22 24 21 29 18
17 15 22 9 18
run;
/* Generate 5000 bootstrap samples, with replacement */
proc surveyselect data=animals method=urs n=5 reps=5000 seed=10024 out=boots;
run;
/* For each replicate, calculate the sum of each variable */
proc means data=boots noprint nway;
class replicate;
var v:;
output out=sums sum=;
run;
/* Calculate the proportions, and p*log(p), which will be used next */
data sums;
set sums;
ttl=sum(of v1-v5);
array ps{*} p1-p5;
array vs{*} v1-v5;
array hs{*} h1-h5;
do i=1 to dim(vs);
ps{i}=vs{i}/ttl;
hs{i}=ps{i}*log(ps{i});
end;
keep replicate h:;
run;
/* Calculate the Shannon Index, again for each replicate */
data shannon;
set sums;
shannon = -sum(of h:);
keep replicate shannon;
run;
We now have a data set, shannon, which contains the Shannon Index calculated for each of 5000 bootstrap samples. You could use this to calculate p-values, but if you just want critical values, you can run proc means (or univariate if you want a 5% value, as I don't think it's possible to get 97.5 quantiles with proc means).
proc means data=shannon mean p1 p5 p95 p99;
var shannon;
run;