say I have two rows of data I try to read in.
cody: 10 9 20 18
john: 4 5 1 2
and I want to read them in a two row style in datalines, like such:
input cody john ##;
datalines;
10 9 20 18
4 5 1 2
run;
But this reads it in like cody: 10 20 4 1 john: 9 18 5 2
How do I fix this?
You'd need to read in the CODY lines all at once, then the JOHN lines all at once. It's unclear what the final data structure should look like, but this is one possibility, and then you can restructure this how you wish, perhaps with PROC TRANSPOSE.
Basically, I assign name to the proper name (using an array here, but you can do this in better ways, data-driven ways, depending on your data). Then I loop and tell SAS to keep reading in data until it is unable to read any more, using the truncover option (or missover is also fine) to make sure it doesn't skip to the next line, and output a new row for each value.
data want;
array names[2] $ _temporary_ ("Cody","John") ;
infile datalines truncover;
do _name = 1 to 2;
name = names[_name];
do _i = 1 by 1 until (missing(value));
input value #;
if not missing(value) then output;
end;
input;
end;
drop _:;
datalines;
10 9 20 18
4 5 1 2
run;
I think that the solution to your problem is to use the names as another column, not as variables, like this:
data foo;
input var1 $ var2 var3 var4 var5;
datalines;
cody 10 9 20 18
john 4 5 1 2
;
run;
Related
I am trying to select the last non-missing DAT value to ADT which if the SUBJID have two consecutive missing DAT, else the latest DAT will be set to the value of ADT.
Below code produce the data I have, and I want to have the ADT could be derived with the illustratioin of below rule (either finally merged to this set HAVE or just create into a totally new set):
for subjid 1001: 1997-05-01 for this subject, there is no consecutive missing (though only single non-consecutive missing)
for subjid 1002: 1998-02-01, as this subject has missing consecutively at AVISIT of 2-5
for subjid 1003: 1999-03-08, as the first consecutive missing happened at AVISIT of 4, and at AVISIT=3, there is non-missing DAT.
Hope you can help me. Thanks.
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;
run;
The below will create a dataset that includes the last non-missing dat whenever the count of consecutively missing dat is greater than or equal to 2. Each time we encounter a missing dat, we increase the consecutive missing counter nmiss by 1.
We are always storing the last valid value of dat in the variables last_nonmissing_dat and last_nonmissing_avisit such that they always carry forward when a missing value is encountered. When two consecutive missing values occur, we output the results.
nmiss and last_nonmissing are reset whenever we move to a new subjid.
data want;
set have;
by subjid;
retain last_nonmissing_dat
last_nonmissing_avisit
;
if(first.subjid) then call missing(nmiss, of last_nonmissing:);
if(missing(dat)) then nmiss+1;
else do;
nmiss = 0;
last_nonmissing_dat = dat;
last_nonmissing_avisit = avisit;
end;
if(nmiss GE 2) then output;
format last_nonmissing_dat yymmdd10.;
run;
This is my code:
DATA sales;
INFILE 'D:\Users\...\Desktop\Onions.dat';
INPUT VisitingTeam $ 1-20 ConcessionSales 21-24 BleacherSales 25-28
OurHits 29-31 TheirHits 32-34 OurRuns 35-37 TheirRuns 38-40;
PROC PRINT DATA = sales;
TITLE 'SAS Data Set Sales';
RUN;
This is the data, but the spacing may be incorrect.
Columbia Peaches 35 67 1 10 2 1
Plains Peanuts 210 . 2 5 0 2
Gilroy Garlics 151035 12 11 7 6
Sacramento Tomatoes 124 85 15 4 9 1
;
I need to add or delete a blank column at the 19th
column. Can someone help?
Just open the dataset and then look at what the variable name is. Then do:
Data Want (drop=varible_name_you_are_dropping); /*This is your output dataset*/
Set have; /*this is your dataset you have*/
Run;
I am currently restructuring my package from SAS Base to SAS Enterprise Guide in a knowledge transfer to a client. Unfortunately, one aspect I have to sacrifice is the change from using compress to strip in my proc sql left joins, for example the following code doesn't work
data have;
input ID VarA;
datalines;
1 2
2 3
3 4
4 5
;
run;
data have1;
input ID Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9;
datalines;
1 3 4 6 7 3 6 6 7 8
2 2 2 2 2 5 6 7 2 1
3 5 6 7 8 4 5 3 4 3
4 3 4 6 7 4 6 8 3 6
;
run;
proc sql;
create table Want as
select a.*
,b.Var1
,b.Var2
,b.Var3
,b.Var4
,b.Var5
,b.Var6
,b.Var7
,b.Var8
,b.Var9
from Have as a
left join Have1 as b
on compress(a.ID) = compress(b.ID);
quit;
I can use the strip function at times but it is safer to deliver a package with compress as there is often misplaced spaces in observations. any ideas?
Edit: to save further confusion, I usually use the compress function to look up reference rates of bonds like EURIBOR 006m - this makes my generic example incorrect but the left join typically uses character variables
You need a character variable to use the compress function. Your ID variables are numeric.
Try converting to character:
on compress(put(a.ID,8.)) = compress(put(b.ID,8.));
What's the code program in SAS to stack data?
For the purpose of example, lets say I have this dataset:
DATA test.one;
INPUT Name $ Y1996 Y1997 Y1998 Y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
Running this set would give me an output like this:
Name Y1996 Y1997 Y1998 Y1999
Dan 5 10 40 20
Derek 10 12 10 10
However, I would want my data to look like this:
Name Year Income
Dan 1996 5
Dan 1997 10
Dan 1998 40
Dan 1999 20
Derek 1996 10
Derek 1997 12
Derek 1998 10
Derek 1999 10
It would create a new variable income corresponding to the stacking the of the data as shown above.
Are you asking how to read the raw data directly into that form?
DATA want;
INPUT Name $ #;
do year=1996 to 1999;
input income #;
output;
end;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
;
The PROC Transpose can solve this;
DATA test.one;
INPUT Name $ y1996 y1997 y1998 y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
proc transpose data=test.one out=long1;
by name;
run;
data test2;
set long1 (rename=(col1=Income));
RUN;
It will then transform the dataset into a stacked version.
I am wondering the best way to transpose data in SAS when I have multiple occurances of my id variable. I know I can use the let option in the proc transpose statement to do this, but I do not want to get rid of any data, as I intend to compute averages.
Here is an example of my data and my code:
data grades;
input student testnum grade;
cards;
1 1 30
1 1 25
1 2 45
1 3 67
2 1 22
2 2 63
2 2 12
2 2 77
3 1 22
3 1 17
3 2 14
3 4 17
;
run;
proc sort data=grades;
by student testnum;
run;
proc transpose data=grades out=trgrades;
by student;
id testnum;
var grade;
run;
Here is how I would like my resulting dataset to look:
student testnum1 testnum2 testnum3 testnum4 avg12 avg34
1 30 45 67 . 33.33 67
1 25 . . . 33.33 67
2 22 63 . . 43.5 .
2 . 12 . . 43.5 .
2 . 77 . . 43.5 .
3 22 14 . 17 53 17
3 17 . . . 53 17
I want to use this new dataset (not sure how yet) to create the new columns that are the average score of all testnum1's and testnum2's for a student (avg12) and the average of all testenum3's and testnum4's (avg34) for a student.
There may be a much more efficient way to do this but I am stumped.
Any advice is appreciated.
If all you really need is the average of all test 1's and 2's, and 3's and 4's for each student, then you don't need to transpose at all. All you need is a simple data step:
data grouped;
set grades;
if testnum In (1,2) then group=1;
else if testnum in (3,4) then group=2;
run;
Then a basic proc means:
proc means data=grouped;
by student group;
var grade;
output out=averages mean=groupaverage;
run;
If you need the averages in a single observation, you can easily transpose the averages dataset.
proc transpose data=grades out=trgrades;
by student;
id group;
var grade;
run;
Update:
As mentioned by #Keith, using a format to group the tests is an excellent choice as well. Skip the data step and create the format like so:
proc format;
value TestGroup
1,2 = 'Tests 1 and 2'
3,4 = 'Tests 3 and 4'
;
run;
Then the proc means becomes:
proc means data=grouped;
by student testnum;
var grade;
format testnum TestGroup.;
output out=averages mean=groupaverage;
run;
End Update
If, for some reason, you really need to have all the test scores in one observation then I would recommend using a data step to make them uniquely identifiable. Use by, testnum.first, retain, and a simple counter to assign each score a retake number. Now your transpose uses retake and testnum as id variables. You should be able to figure it out from there.
Really hoping right now that I didn't just do your SAS homework assignment for you.