I have the university edition of SAS.
I have data from treatment groups A, B, and C. I am trying to use DO loops to process the groups separately for comparison. I can do it in one nested DO loop when the data lengths are the same. But these groups have different numbers of observations and I am running into trouble. Here is my code:
data AirPoll1 (keep = Group Ozone);
  label Group = "Treatment Group";
  label Ozone = 'Ozone level (in ppb)';
  do i=1 to 1;
    input Group $ @@
    do j=1 to 15;
      input Ozone @@;
      output;
    end;
  end;
  do i=1 to 1;
    input Group $ @@;
    do j=1 to 10;
      input Ozone @@;
      output;
    end;
  end;
  do i=1 to 1;
    input Group $ @@;
    do j=1 to 11;
      input Ozone @@;
      output;
    end;
  end;
datalines;
A 4 6 3 4 7 8 2 3 4 1 8 9 5 6 3
B 5 3 6 2 1 2 4 3 2 4
C 8 9 7 8 6 7 6 7 9 8 9
;
run;
proc univariate data = AirPoll1;
  var Ozone;
  by Group;
  histogram Ozone;
run;
The error I am getting is:
ERROR 161-185: No matching DO/SELECT statement.
Is there a quick way to fix this?
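Quick fix indeed: you have missed the semicolon at the end of the first INPUT statement (the one that reads Group), so the DO on the next line is swallowed by that INPUT and its END no longer has a matching DO.
Happy programming :)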
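Since the groups have different numbers of observations, here is a minimal sketch that avoids hard-coding the counts. It is not the original poster's approach: it assumes each group sits on its own data line and that no Ozone value is missing. MISSOVER makes a read past the end of a line return a missing value, which is what ends the loop.

```sas
data AirPoll1 (keep = Group Ozone);
  label Group = "Treatment Group";
  label Ozone = 'Ozone level (in ppb)';
  infile datalines missover;
  /* read the group letter and the first value; the trailing @ holds the line */
  input Group $ Ozone @;
  do while (not missing(Ozone));
    output;
    /* next value on the same line, or missing once the line is exhausted */
    input Ozone @;
  end;
datalines;
A 4 6 3 4 7 8 2 3 4 1 8 9 5 6 3
B 5 3 6 2 1 2 4 3 2 4
C 8 9 7 8 6 7 6 7 9 8 9
;
run;
```

The single trailing @ is released at the end of each DATA step iteration, so the next iteration starts on the next data line and picks up the next group.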
Hello, this is a sample of my data (there is an additional column, LBCAT = URINALYSIS, for this panel of tests).
I've been asked to keep only the panels of tests where LBNRIND is populated for at least one of the tests, and to remove the rest. Some subjects have multiple test results at different visit timepoints and others have only one. I can't use a simple where LBNRIND ne '' in the data step because I need the entire panel of urinalysis tests, not just that particular test result. What would be the best approach here? I think transposing the data would be too messy, but maybe I could put the variables in an array or macro and use a DO loop over the panel of tests?
Update: I've tried this code, but it doesn't keep the corresponding tests of the panels where lb_nrind is populated. Using sum(lb_nrind > '') in the HAVING clause gives the same result as using lb_nrind > '' there.
*proc sql;
*create table want as
select * from labUA
group by ptno and day and lb_cat
having sum(lb_nrind > '') > 0 ;
data want2;
do _n_ = 1 by 1 until (last.ptno);
set labUA;
by ptno period day hour ;
if not flag_group then flag_group = (lb_nrind > '');
end;
do _n_ = 1 to _n_;
set want;
if flag_group then output;
end;
drop flag_group; run;*
You can use a SQL HAVING clause to retain the rows of a group that meets some aggregate condition. In your case the group might be the patient id and panel id, and the condition that at least one LBNRIND in the group is not null.
Example:
Consider this example where a group of rows is to be kept only if at least one of the rows in the group meets the criteria result7=77
Both code blocks use the SAS feature that a logical evaluation is 1 for true and 0 for false.
SQL
data have;
infile datalines missover;
input id test $ parm $ result1-result10;
datalines;
1 A P 1 2 . 9 8 7 . . . .
1 B Q 1 2 3
1 C R 4 5 6
1 D S 8 9 . . . 6 77
1 E T 1 1 1
1 F U 1 1 1
1 G V 2
2 A Z 3
2 B K 1 2 3 4 5 6 78
2 C L 4
2 D M 9
3 G N 8
4 B Q 7
4 D S 6
4 C 1 1 1 . . 5 0 77
;
proc sql;
  create table want as
  select * from have
  group by id
  having sum(result7=77) > 0
  ;
quit;
DOW Loop
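data want;
  /* first pass over the group: does any row meet the criteria? */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    if not flag_group then flag_group = (result7=77);
  end;
  /* second pass over the same rows: output them if the group was flagged */
  do _n_ = 1 to _n_;
    set have;
    if flag_group then output;
  end;
  drop flag_group;
run;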
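Applied to the question's own variable names, the SQL version might look like the sketch below. The sample rows and the lb_test column are hypothetical, made up only so the step can run; ptno, day, lb_cat and lb_nrind are the names used in the question. Note that GROUP BY takes a comma-separated list of columns.

```sas
/* hypothetical sample: ptno 1/day 1 and ptno 2/day 1 each have one populated
   lb_nrind; ptno 1/day 8 has none */
data labUA;
  infile datalines missover;
  input ptno day lb_cat :$10. lb_test $ lb_nrind $;
datalines;
1 1 URINALYSIS PH HIGH
1 1 URINALYSIS GLUC
1 8 URINALYSIS PH
1 8 URINALYSIS GLUC
2 1 URINALYSIS PH
2 1 URINALYSIS GLUC LOW
;
run;

proc sql;
  create table want as
  select *
  from labUA
  group by ptno, day, lb_cat          /* commas between the grouping columns */
  having sum(lb_nrind > '') > 0       /* at least one populated flag in the panel */
  ;
quit;
```

With this sample the whole panel is kept for ptno 1/day 1 and ptno 2/day 1 (four rows), and the ptno 1/day 8 panel is dropped. SAS writes a note to the log about remerging summary statistics with the original data, which is expected here.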
I have a dataset that looks like:
Hour Flag
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
I want to have an output dataset like:
Total_Hours Count
2 2
3 1
4 1
As you can see, I want to count the number of hours included in each period with consecutive "1s". A missing value ends the consecutive sequence.
How should I go about doing this? Thanks!
You'll need to do this in two steps. First step is making sure the data is sorted properly and determining the number of hours in a consecutive period:
PROC SORT DATA = <your dataset>;
BY hour;
RUN;
DATA work.consecutive_hours;
SET <your dataset> END = lastrec;
RETAIN
total_hours 0
;
IF flag = 1 THEN total_hours = total_hours + 1;
ELSE
DO;
IF total_hours > 0 THEN output;
total_hours = 0;
END;
/* Need to output last record */
IF lastrec AND total_hours > 0 THEN output;
KEEP
total_hours
;
RUN;
Now a simple SQL statement:
PROC SQL;
CREATE TABLE work.hour_summary AS
SELECT
total_hours
,COUNT(*) AS count
FROM
work.consecutive_hours
GROUP BY
total_hours
;
QUIT;
You will have to do two things:
compute the run lengths
compute the frequency of the run lengths
For the case of using the implicit loop:
Each run length occurrence can be computed and maintained in a retained tracking variable, testing for a missing value or end of data to output the run and reset the length, and for a nonmissing value to increment it.
Proc FREQ can then compute the frequency of each run length.
An alternative is to use an explicit loop and a hash for frequency counts.
Example:
data have;
  input Hour Flag;
datalines;
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
;
data _null_;
  /* hash keyed by run length, holding how many runs of that length were seen */
  declare hash counts(ordered:'a');
  counts.defineKey('length');
  counts.defineData('length', 'count');
  counts.defineDone();
  do until (end);
    set have end=end;
    if not missing(flag) then
      length + 1;
    /* a missing flag or the end of data closes the current run */
    if missing(flag) or end then do;
      if length > 0 then do;
        if counts.find() eq 0
          then count+1;
          else count=1;
        counts.replace();
        length = 0;
      end;
    end;
  end;
  counts.output(dataset:'want');
run;
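The implicit-loop variant with Proc FREQ described above could be sketched as follows. The names runs, run_length and last_row are my own choices for illustration, and the sketch assumes the rows are already in Hour order.

```sas
data have;
  input Hour Flag;
datalines;
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
;
run;

/* step 1: one output row per run of consecutive nonmissing flags */
data runs (keep = run_length);
  set have end = last_row;
  /* the sum statement retains run_length and starts it at 0 */
  if not missing(flag) then run_length + 1;
  if missing(flag) or last_row then do;
    if run_length > 0 then output;
    run_length = 0;
  end;
run;

/* step 2: frequency of each run length */
proc freq data = runs noprint;
  tables run_length / out = want (drop = percent rename = (run_length = Total_Hours));
run;
```

The output data set want carries Total_Hours and the COUNT variable that Proc FREQ creates, giving the 2/2, 3/1 and 4/1 rows asked for in the question.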
An alternative
data _null_;
if _N_ = 1 then do;
dcl hash h(ordered : "a");
h.definekey("Total_Hours");
h.definedata("Total_Hours", "Count");
h.definedone();
end;
do Total_Hours = 1 by 1 until (last.Flag);
set have end=lr;
by Flag notsorted;
end;
Count = 1;
if Flag then do;
if h.find() = 0 then Count+1;
h.replace();
end;
if lr then h.output(dataset : "want");
run;
Several weeks ago, @Richard taught me how to use a DOW loop with a direct-addressing array. Today, I pass it on to you.
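data want(keep=Total_Hours Count);
  array bin[99] _temporary_;   /* bin[n] counts the runs of length n */
  do until(eof1);
    set have end=eof1;
    if Flag then count + 1;
    if ^Flag or eof1 then do;
      /* skip empty runs (leading or back-to-back missing flags) */
      if count then bin[count] + 1;
      count = .;
    end;
  end;
  /* one output row per run length that occurred at least once */
  do i = 1 to dim(bin);
    Total_Hours = i;
    Count = bin[i];
    if Count then output;
  end;
run;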
Thanks again to Richard, who also suggested this article to me.
I don't know how to describe this question, but here is an example. I have an initial dataset that looks like this:
input first second $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
...
;
I want an output dataset like this:
input first second $;
cards;
1 "A,B,C,D"
2 "E,F"
3 "S,A"
4 "C"
5 "Y"
6 "II,UU,OO,N"
7 "G,H"
...
;
Both tables will have two columns. The unique values of the column "first" could range from 1 to any number.
Can someone help me?
Something like the below:
proc sort data=have;
by first second;
run;
data want(rename=(b=second));
length new_second $50.;
do until(last.first);
set have;
by first second ;
new_second =catx(',', new_second, second);
b=quote(strip(new_second));
end;
drop second new_second;
run;
output is
first second
1 "A,B,C,D"
2 "E,F"
3 "A,S"
4 "C"
5 "Y"
6 "II,N,OO,UU"
7 "G,H"
You can use by-group processing and the retain statement to achieve this.
Create a sample dataset:
data have;
input id value $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
;
run;
First ensure that your dataset is sorted by your id variable:
proc sort data=have;
by id;
run;
Then use the first. and last. notation to identify when the id variable is changing or about to change. The retain statement tells the data step to keep the value of contatenated_value across observations rather than resetting it to blank. Use the catx() function to perform the actual concatenation, separating the values with a comma. Use the quote() function to put double quotes around the result before outputting the record; the cats() call inside it strips the surrounding blanks first.
data want;
length contatenated_value $500.;
set have;
by id;
retain contatenated_value ;
if first.id then do;
contatenated_value = '';
end;
contatenated_value = catx(',', contatenated_value, value);
if last.id then do;
contatenated_value = quote(cats(contatenated_value));
output;
end;
drop value;
run;
Output:
contatenated_value id
"A,B,C,D" 1
"E,F" 2
"S,A" 3
"C" 4
"Y" 5
"II,UU,OO,N" 6
"G,H" 7
I want to output the last observation of each run of an integer sequence variable in a SAS data set.
I have this data set:
data have;
input seq var;
datalines;
1 7
2 6
3 3
1 1
2 4
1 8
2 9
3 1
4 8
;
run;
I would like to achieve the following:
seq var
3 3
2 4
4 8
I have thoroughly searched for my answer online but couldn't find anything.
You can use a look-ahead technique. This is one of many ways to write it.
data last;
set have end=eof;
if not eof then set have(firstobs=2 keep=seq rename=(seq=nseq));
if nseq eq 1 or eof then output;
drop nseq;
run;
Just to give an indication of the slickness of the look-ahead approach - you can do the same thing with lag, but it takes nearly twice as many lines of code:
data want(drop=prev_:);
set have end = eof;
prev_seq = lag(seq);
prev_var = lag(var);
if seq < prev_seq then do;
seq = prev_seq;
var = prev_var;
end;
if eof or seq = prev_seq;
run;
The problem is that when the ELSE branch executes, the increment of S is not accomplished. Any idea?
data osszes_folyositas;
set osszes_tabla;
retain old_xname;
retain s 0;
if xname ne old_xname then
do;
old_xname = xname;
s = 0;
end;
else
do;
s = s + Foly_s_tott_t_rgyh_ban_HUF;
delete;
end;
run;
Not sure what you are trying to do. But if you have your records ordered by "xname", and for each group of "xname" just want to sum across a value, you could try the following.
data sample;
input xname$1-6 myvalue;
datalines;
name01 5
name01 1
name02 3
name02 8
name02 4
name03 7
;
data result;
set sample;
by xname;
retain s 0;
if first.xname then s=0;
s=s+myvalue;
if last.xname then output;
run;
proc print data=result;
run;
This resets "s" for each group of "xname" and outputs the last record with "s" set to the sum of "myvalue" across the group. The result looks like this:
Obs xname myvalue s
1 name01 1 6
2 name02 4 15
3 name03 7 7
This kind of task is best handled with a programming pattern known as the DoW loop (aka the DO loop of Whitlock). For each by-group, the initialization comes before the loop and the observation is output after the loop, here by the implicit output at the end of the DATA step iteration. Can you see how it works out? This paper is old but a must-read.
data sample;
input xname$1-6 myvalue;
datalines;
name01 5
name01 1
name02 3
name02 8
name02 4
name03 7
;
run;
proc sort data=sample;
by xname;
run;
data result;
if 0 then set sample; /* prep pdv */
s = 0;
do until (last.xname);
set sample;
by xname;
s + myValue;
end;
run;
/* check */
proc print data=result;
run;
/* on lst
Obs xname myvalue s
1 name01 1 6
2 name02 4 15
3 name03 7 7
*/