I am working on a simple program that finds the average of all the scores after dropping the lowest two.
data new;
low1=smallest(1,score_1-score_6);
low2=smallest(2,score_1-score_6);
tot=score_1+score_2+score_3+score_4+score_5+score_6;
avg=(tot-low1-low2)/4;
set mydata;
run;
'mydata' does not contain any missing values, yet in the output table the calculated values are shifted down by one row.
The output looks like this:
id low1 score_1 score_2 score_3 score_4 score_5 score_6 low2 tot avg
1 . 0 0 10 80 0 75 . . .
2 0 0 0 0 75 80 0 0 165 41.25
3 0 0 50 10 60 55 0 0 155 38.25
4...and so on
SAS generates a note like this:
NOTE: Missing values were generated as a result of performing an operation on missing values
I can't figure out why the calculated values start in row 2 instead of row 1.
Help would be appreciated. Thank you!
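Put the SET statement right after the DATA statement (which is, I think, generally a good idea). As you have it, the first iteration performs the calculations before any data has been read, so they operate on missing values; the row is only read at the end of the step. Because variables read by SET are retained, every later iteration calculates from the previous row's scores, which is why the results appear one row late.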
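A minimal sketch of the reordered step, assuming the same `mydata` with `score_1` to `score_6` and no missing values. Two things beyond the reordering are my own changes, not part of the answer above: the `of` keyword, without which `score_1-score_6` is read as a subtraction rather than a variable list, and `sum(of ...)` in place of the chain of `+` (the two differ only when a score is missing).

```sas
data new;
  set mydata;                               /* read the row before calculating */
  low1 = smallest(1, of score_1-score_6);   /* lowest score */
  low2 = smallest(2, of score_1-score_6);   /* second-lowest score */
  tot  = sum(of score_1-score_6);           /* total of all six scores */
  avg  = (tot - low1 - low2) / 4;           /* mean of the remaining four */
run;
```

For the first row shown in the question (0 0 10 80 0 75) this gives low1=0, low2=0, tot=165 and avg=41.25, the values that originally appeared one row too late.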
I need to calculate a two-sided t-test in SAS.
I generally use PROC TTEST with side=2, but I am not sure whether this test is appropriate here or whether another approach would be preferable.
An example of data is the following:
Score Segment Obs Class_obs
1 0 500 15
1 1 500 34
2 0 234 23
2 1 766 65
The p-value is calculated for each score. Segment indicates whether a condition is met (e.g. score higher than 60): 0 means ‘lower than 60’ while 1 means ‘higher than 60’.
Obs is the number of observations in each segment by score. Class_obs is the number of observations that satisfy a specific condition on the overall population.
Happy to share more information if needed.
I've spent quite a lot of time on Stack Overflow looking for answers to other questions, but I'm really stuck on this one, so I'm finally asking a question!
I have a dataset of fish in SAS, with:
a unique ID for each angler
three different variables with number of fish released in each category by that angler: over legal size, under legal size, and released dead
a sequential number (fishno) based on the number of rows for each ID; 1 to the last row of that ID.
Variable to be created: Disposition--could be either character variable with "legal" "under" "dead" options or even numeric values of 1-3.
It was originally set up with one row per unique ID, but I set it so that now there is one row per fish discarded (i.e. if there were 3 legal size and 2 undersize fish, I now have 5 rows).
I need to assign, by unique ID, whether each row/fish was released legal, undersize or dead. In the previous example, for a unique ID, I'd need 3 rows assigned to a Disposition of "legal" and 2 rows assigned to a Disposition of "under".
I've tried first.var statements along with if-then-do statements; played around with macros; nothing worked quite right and I'm pretty stuck here. Is there some sort of random assignment I should try? Is there a much easier way that I'm missing?
Example of the data below...
THANK YOU!!
Data in Excel format
Assuming you already have the FISHNO variable, there needs to be some method for assigning each fish as legal, dead, or undersize. The following code will assign the disposition in that order:
data have;
input ID LEGAL DEAD UNDERSIZE FISHNO;
datalines;
15 1 0 1 1
15 1 0 1 2
29 2 0 2 1
29 2 0 2 2
29 2 0 2 3
29 2 0 2 4
38 1 0 1 1
38 1 0 1 2
53 1 0 1 1
53 1 0 1 2
55 1 0 1 1
55 1 0 1 2
;
run;
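data want;
set have;
length disposition $5; /* long enough for 'legal' and 'under' */
/* Within each ID, fill the rows in order: legal first, then dead, then undersize */
if legal>0 and legal>=fishno then disposition = 'legal';
else if dead>0 and legal+dead>=fishno then disposition = 'dead';
else if undersize>0 and legal+dead+undersize>=fishno then disposition = 'under';
run;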
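If the original one-row-per-angler table is still available, the expansion and the assignment could be done in a single step. This is only a sketch under that assumption: `per_angler` is a hypothetical name for a dataset with one row per ID holding the LEGAL, DEAD and UNDERSIZE counts.

```sas
data want_direct;
  set per_angler;               /* hypothetical: one row per ID with the three counts */
  length disposition $5;
  fishno = 0;                   /* restart the sequence number for each ID */
  do i = 1 to legal;            /* one output row per legal-size fish */
    disposition = 'legal'; fishno = fishno + 1; output;
  end;
  do i = 1 to dead;             /* one output row per fish released dead */
    disposition = 'dead';  fishno = fishno + 1; output;
  end;
  do i = 1 to undersize;        /* one output row per undersize fish */
    disposition = 'under'; fishno = fishno + 1; output;
  end;
  drop i;
run;
```

A loop whose upper bound is 0 does not execute, so categories with no fish produce no rows.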
I am using PROC GLIMMIX, and I'm curious as to why my parameter estimates are behaving strangely.
proc glimmix data=blah pconv=1e-3;
class strata1;
model event(event=LAST)=time1--time20/
noint solution link=logit dist=binary;
nloptions tech=nrridg;
covtest 'var(strata1)=0'/WALD;
random intercept/subject=strata1;
run;
Since I'm using a logistic discrete-time hazard model (without any censored observations), my dataset is constructed in the 'person-period' format. Here is an example of what a person-period dataset looks like:
id time1 time2 time3 time4 event
100 1 0 0 0 0
100 0 1 0 0 0
100 0 0 1 0 1
101 1 0 0 0 1
102 1 0 0 0 0
102 0 1 0 0 0
102 0 0 1 0 0
102 0 0 0 1 0
Essentially, each 'time' variable indicates whether that period is occurring. So, time1=1 during the first period, 0 otherwise; time2=1 during the second period, 0 otherwise; and so on. I am modelling the probability that the event occurs during each of these periods. When I use PROC LOGISTIC, I get sensible parameter estimates.
proc logistic data=blah;
model event (event=LAST)=time1--time20 /noint;
run;
This code delivers a parameter estimate of -3.0052 for time1, which gives a probability of the event occurring in time period 1 of .047. The estimates slowly get smaller for each time[i] variable, which is what I would expect. However, when I run my GLIMMIX code and add the random effect for strata1, it blows up my model: the parameter estimates for time flip their sign. time1=2.84, time2=2.67, time3=2.41, and they consistently get smaller. I'm really confused as to why; this model is telling me that the probability of the event occurring is over 90% in this period, which I know to be untrue. Does anyone have any idea why this is? I would expect these estimates to essentially have their negative sign be flipped.
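The probabilities quoted above come from the inverse logit, p = 1/(1 + exp(-b)). A small check of the arithmetic, using only the two estimates reported in the question (the variable names are illustrative):

```sas
data _null_;
  p_logistic = 1 / (1 + exp(3.0052));  /* inverse logit of -3.0052, about 0.047 */
  p_glimmix  = 1 / (1 + exp(-2.84));   /* inverse logit of  2.84,   about 0.94  */
  q_glimmix  = 1 - p_glimmix;          /* complementary probability */
  put p_logistic= p_glimmix= q_glimmix=;
run;
```

Because the inverse logit of -b equals one minus the inverse logit of b, a flipped sign on the logit scale corresponds to the complementary probability. That makes it worth checking which level of `event` each procedure reports it is modelling, though the question alone does not confirm that this is the cause.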
Thanks.
I have intra-day price data for stock trades and need to write code that finds the instances in which the following condition is met: the price goes up for at least 10 consecutive trades.
Here is a sample of my data (time is number of minutes in the day, if it's 1 am my time will be 60, if it's 2 am, my time will be 120 etc.):
Obs Time Symbol Price
1 288 AA 36.2800
2 304 AA 36.2800
3 305 AA 36.3400
4 307 AA 36.2800
5 311 AA 36.1500
6 337 AA 36.2000
How can I write this code? A loop is probably necessary, but I cannot figure it out. Thank you.
Assuming no missing values, something like:
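data want ;
set have ;
lagPrice=lag(Price) ;
/* count consecutive price increases; any non-increase resets the count */
if Price>lagPrice and not missing(lagPrice) then Increasing + 1 ;
else Increasing=0 ;
/* flag the record at the 10th consecutive increase and those after it */
if Increasing >= 10 then Trend=1 ;
run ;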
That will flag the 10th record of an increasing trend, and all those after. Is that what you want? Or are you looking for a way to flag all records involved in the trend? Or something else?
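The sample data carries a Symbol column, so with more than one stock the count should probably restart at each new symbol. A hedged variant, assuming `have` is already sorted by Symbol and in trade order within each symbol, and treating ten consecutive upticks as the condition:

```sas
data want_by_symbol;
  set have;
  by Symbol;
  lagPrice = lag(Price);              /* call LAG on every row so its queue stays in step */
  if first.Symbol then do;
    lagPrice   = .;                   /* do not compare against another symbol's price */
    Increasing = 0;
  end;
  else if Price > lagPrice then Increasing + 1;   /* sum statement: retained counter */
  else Increasing = 0;
  Trend = (Increasing >= 10);         /* 1 once ten consecutive increases are reached, else 0 */
run;
```

Keeping the LAG call outside any condition matters, because LAG returns the value from its previous execution rather than from the previous row.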
I have a dataset that is unique by 5 variables, with two dependent variables. My goal is for this dataset to have appended to it additional rows with TOTAL as the value of independent variables, with the values of the dependent variables changing accordingly.
To do this for a single independent variable is not a problem, I would do something along the lines of:
proc sql;
create table want as
select "TOTAL" as independent_var1,
independent_var2,
...
independent_var5,
sum(dependent_1) as dependent_1,
sum(dependent_2) as dependent_2
from have
group by independent_var2,...,independent_var5;
quit;
This would be followed by appending the original dataset in whatever fashion you choose. However, I want the above five times over (once for each independent variable), and then again for each possible combination of TOTAL/non-total across the 5 independent variables. Not sure just how many datasets that is off the top of my head...but it's a decent amount.
So best strategy I've come up with so far is to use the above with some mildly creative macro code to generate all possible table combinations of total/non-total, but it seems like SAS just might have a better way, maybe tucked away in an esoteric proc step I've never heard of...
--
Attempt to show example, using three independent variables and 1 dependent variable:
Ind1 Ind2 Ind3 Dependent1
0 0 0 1
0 0 1 3
0 1 0 5
0 1 1 7
Desired output would be:
0 0 ALL 4
0 1 ALL 12
0 ALL 0 6
0 ALL 1 10
ALL 0 0 1
ALL 0 1 3
ALL 1 0 5
ALL 1 1 7
0 ALL ALL 16
ALL 0 ALL 4
ALL 1 ALL 12
ALL ALL 0 6
ALL ALL 1 10
ALL ALL ALL 16
0 0 0 1
0 0 1 3
0 1 0 5
0 1 1 7
I may have forgotten some combinations, but that should serve to get the point across.
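PROC MEANS should do this for you trivially. With a CLASS statement and no NWAY option, its output dataset contains every combination of the class variables, including the totals. You need to clean up the output to match what you want exactly (the INDx variables are missing where your example shows "ALL"), but otherwise the calculations are done properly.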
data have;
input Ind1 Ind2 Ind3 Dependent1;
datalines;
0 0 0 1
0 0 1 3
0 1 0 5
0 1 1 7
;;;;
run;
proc means data=have;
class ind1 ind2 ind3;
var dependent1;
output out=want sum=;
run;
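One possible cleanup step, as a sketch only: it assumes the class variables contain no genuinely missing values, so a missing value in `want` always marks a total row. The names `want_all` and `ind1_c` to `ind3_c` are made up for illustration.

```sas
data want_all;
  set want;
  length ind1_c ind2_c ind3_c $8;
  array ind{3} ind1-ind3;                 /* numeric class variables from PROC MEANS */
  array lbl{3} $ ind1_c ind2_c ind3_c;    /* character versions that can hold 'ALL' */
  do i = 1 to 3;
    if missing(ind{i}) then lbl{i} = 'ALL';       /* total across this variable */
    else lbl{i} = strip(put(ind{i}, best8.));     /* keep the original level as text */
  end;
  drop i ind1-ind3 _type_ _freq_;
run;
```

The `_TYPE_` variable in `want` identifies which combination of class variables each row summarises, so it can be kept instead of dropped if the rows need to be ordered or filtered by level of aggregation.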