SAS: Avoid end-of-line problem and LOST CARD

I'm working through a SAS exercise, which has data in the following format:
3496 Jerry Nelson 13960 Wilson Dr. San Diego CA 92191 40 4
3498 Scott Mason 9226 College Dr. Oak View CA 93022 95 2
3498 CA 35 3
3498 CA 35 11
3500 Michele Stone 8393 West Ct. Emeryville CA 94608 55 5
3500 CA 70 5
For each person, the data continues until the next person's name. The following code is very close to what I need, I think:
libname Ch4data '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data';
Data Ch4data.my_donations;
Infile '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data\Donations.dat' MISSOVER;
Array amounts(10);
Array months(10);
Input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amounts(1) 101 - 105 @ 106
months(1);
end = end1;
If ~(end1) Then
Do;
Input test_char $ 6-6 @;
i = 2;
Do While (0 = ANYALPHA(test_char));
Input amounts(i) 101 - 105 @ 106
months(i);
end = end1;
If ~(end1) Then Input test_char $ 6-6 @;
Else test_char = '';
i = i+1;
End;
End;
Run;
Proc Print Data = Ch4data.my_donations;
Title 'Donations to Coastal Humane Society';
Run;
The problem is that I'm getting a LOST CARD note in the log, and the last name in the file, Michele Stone, doesn't make it into the data set. I suspect my code for detecting the end-of-file is incorrect. Could someone please show me how to detect the end-of-file? The SAS documentation is not helpful.
Many thanks for your time!
[UPDATE]: Thanks to Tom's comment, I can now get the last line with the following code:
libname Ch4data '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data';
Data Ch4data.my_donations;
Infile '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data\Donations.dat' MISSOVER END=end1;
Array amounts(10);
Array months(10);
Input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amounts(1) 101 - 105 @ 106
months(1);
If ~(end1) Then
Do;
Input test_char $ 6-6 @;
i = 2;
Do While (0 = ANYALPHA(test_char));
Input amounts(i) 101 - 105 @ 106
months(i);
If ~(end1) Then Input test_char $ 6-6 @;
Else test_char = '';
i = i+1;
End;
End;
Run;
Proc Print Data = Ch4data.my_donations;
Title 'Donations to Coastal Humane Society';
Run;
Unfortunately, it's not getting the second-to-last line. For that matter, it's skipping a lot of first lines of records. Thoughts?

You are trying to combine reading and transposing. It is probably easier to read first and then transpose. In fact, you can just read every line as-is:
data step1;
Infile example truncover ;
Input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amount 101 - 105
month 106 - 110
;
if not missing(first_name) then case+1;
run;
and then apply the carry-forward of the names etc.
data step2;
update step1(obs=0) step1;
by case;
output;
run;
and then transpose.
data want;
do row=1 by 1 until(last.case);
set step2;
by case;
array months [10];
array amounts [10];
months[row]=month;
amounts[row]=amount;
end;
drop row amount month;
run;

You will need to use the line-holding specifier @@ to hold the line when your name check detects the first line of the next group.
filename exercise 'c:\temp\exercise.txt';
* create file to read in;
data _null_;
file exercise;
input;
put _infile_;
datalines;
3496 Jerry Nelson 13960 Wilson Dr. San Diego CA 92191 40 4
3498 Scott Mason 9226 College Dr. Oak View CA 93022 95 2
3498 CA 35 3
3498 CA 35 11
3500 Michele Stone 8393 West Ct. Emeryville CA 94608 55 5
3500 CA 70 5
run;
* read-in the data;
* error will occur if data file has a group with more than 10 months of data;
data want;
infile exercise end=end_of_data ;
array amounts(10);
array months(10);
input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amounts(1) 101 - 105
@ 106 months(1);
do i = 2 by 1 while (not end_of_data);
input name_check $ 6-6 @@;
if name_check = ' ' then
input amounts(i) 101-105 @106 months(i);
else
leave; /* jump out of loop
* when control returns to top the input will be of the held line
*/
end;
run;
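As a side note on the two line-hold specifiers, here is a minimal sketch with made-up inline data (the data set and variable names are illustrative only and do not come from the exercise). A single trailing @ holds the record for the next INPUT statement within the same iteration of the data step, while a double trailing @@ keeps holding it across iterations.

```sas
/* single trailing @ : hold the record for a second INPUT in the same iteration */
data held_once;
  input rec_type $1. @;
  if rec_type = 'A' then input @3 amount 3.;
  else input @3 note $3.;
  datalines;
A 123
B xyz
;

/* double trailing @@ : keep holding the record across iterations */
data held_across;
  input x y @@;
  datalines;
1 2 3 4 5 6
;
```

The second step creates three observations from one input line, because each iteration picks up where the previous one stopped.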

Related

SAS: Unable to add variable to data set

I have a data set and am trying to add four new variables using the existing ones. I keep getting an error that says the code is incomplete. I'm having trouble seeing where it is incomplete. How do I fix this?
data dataset;
input ID $
Height
Weight
SBP
DBP
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;
You did not end your input statement with a semicolon. input reads variables from external data (in this case, in-line data with the datalines statement). New variables are not created within input in the way you've specified.
Use input to read in the five variables of your data. After that, create new variables based on those five read-in variables:
data dataset;
input ID $
Height
Weight
SBP
DBP
;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
;
run;
Correcting 2 errors should fix this:
Add a semicolon after the last field being read in from the datalines, which is DBP.
(A previous version of this question used the ^ symbol for exponents.) To raise a value to a power, use ** instead of ^.
For reference, SAS arithmetic operators are described here.
After making the 2 corrections above I ran the revised code below without any errors.
data dataset;
input ID $
Height
Weight
SBP
DBP;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;
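On the exponent point, a quick way to check the ** operator in isolation is a throwaway step that writes to the log. This is only a sketch; the variable names are made up for illustration.

```sas
data _null_;
  squared = 3 ** 2;    /* 3 raised to the power 2 */
  cubed   = 1.5 ** 3;  /* 1.5 raised to the power 3 */
  put squared= cubed=; /* writes squared=9 cubed=3.375 to the log */
run;
```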

How to add a flag based on a condition on previous rows in SAS

I have the following data and would like to add a flag to each row if a condition is met in the previous row.
In the following data, I want flag=1 on the rows after the one where Cntr=S, but only where an FE row is followed by a BC/ABC row. The 2/8/2019 observation for 101 should not be flagged, and nothing for 102 should be flagged, because no BC/ABC follows the FE.
Have:
id Date Evt Cntr
101 2/2/2019 FE
101 2/3/2019 BC S
101 2/4/2019 FE
101 2/5/2019 BC
101 2/6/2019 FE
101 2/7/2019 ABC
101 2/8/2019 FE
102 2/2/2019 FE
Want:
id Date Evt Cntr flag
101 2/2/2019 FE
101 2/3/2019 BC S
101 2/4/2019 FE 1
101 2/5/2019 BC 1
101 2/6/2019 FE 1
101 2/7/2019 ABC 1
101 2/8/2019 FE
102 2/2/2019 FE
I tried using lag and retain functions to solve this problem but did not get what I wanted. Please help !!
This is another case where DOW processing can compute the flagging state of a row.
Arrays can be used to track values in the group. The arrays simplify computing the flagging of multiple regions after the S. Choose an array size greater than the largest expected group size.
data have;
infile datalines missover;
attrib
id format=4.
date informat=mmddyy10. format=mmddyy10.
evt length=$3
cntr length=$1
;
input
id Date Evt Cntr;
datalines;
101 2/2/2019 FE
101 2/3/2019 BC S
101 2/4/2019 FE
101 2/5/2019 BC
101 2/6/2019 FE
101 2/7/2019 ABC
101 2/8/2019 FE
102 2/2/2019 FE
;
data want;
array evts[-1:1000] $3 _temporary_ ;
array flags[1000] $1 _temporary_;
call missing(of evts[*]);
call missing(of flags[*]);
do _n_ = 1 to dim(flags) until (last.id);
set have;
by id;
evts[_n_] = evt;
if cntr='S' then _s_index = _n_;
if 0 < _s_index < _n_ - 1 then
if evt in ('BC', 'ABC') then
if evts[_n_-1] = 'FE' then
do ;
flags[_n_] = '1';
flags[_n_-1] = '1';
end;
end;
if not last.id then do;
put 'ERROR: ' id= 'group size larger than array size';
stop;
end;
* apply flag value computed for each row of the group;
do _n_ = 1 to _n_;
set have;
flag = flags[_n_];
output;
end;
drop _:;
run;

Split string in SAS by keeping the 0 lead values

My dataset looks like this
And I want it to look like this:
Subject Code site subj
0156 00062 156 62
0156 00062 156 62
0047 00032 47 32
0034 00066 34 66
0032 00029 32 29
.
.
My Code:
if "Subject Code"n ^="" then site=input(scan("Subject Code"n,1,' '),z9.);
put site=;
if "Subject Code"n ^="" then subj=input(strip(substr((scan("Subject Code"n,-1)),1,4)),$4.);
put subj=;
The output I get:
site=15600062
subj=1560
As you can see SAS takes out the leading 0 values and the space " ", because of which it's difficult to split.
You might be overcomplicating this. Try:
length site subj 8; * declare the variables as numeric;
site = input (scan ('Subject Code'n,1), 8.);
subj = input (scan ('Subject Code'n,2), 8.);
The variables will need a z format if you want the values to be displayed with leading zeros when rendered in viewers or proc output.
format site z4.;
format subj z5.;
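Putting those pieces together, a minimal end-to-end sketch could look like the following. It assumes the combined value sits in a plain character variable, here called subject_code (a made-up name standing in for 'Subject Code'n), and uses two of the sample rows from the question.

```sas
data have;
  /* the & modifier lets the value contain a single embedded blank */
  input subject_code & $10.;
  datalines;
0156 00062
0047 00032
;

data want;
  set have;
  /* split on the blank and convert each piece to a number */
  site = input(scan(subject_code, 1, ' '), 8.);
  subj = input(scan(subject_code, 2, ' '), 8.);
  /* the z formats only affect display; the stored values are 156, 62, etc. */
  format site z4. subj z5.;
run;

proc print data=want;
run;
```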
data have;
input subjectcode $&10.;
datalines;
0156 00062
0156 00062
0047 00032
0034 00066
0032 00029
;
data want;
set have;
* \d+ instead of [1-9]+ so that values containing a zero (such as 0150) still match;
site=prxchange('s/0*(\d+) 0*(\d+)/$1/', -1, subjectcode);
subj=prxchange('s/0*(\d+) 0*(\d+)/$2/', -1, subjectcode);
run;

Doing Principal Components in SAS Using a Holdout and to Score New Data

I am performing Principal Components Analysis in SAS Enterprise Guide and wish to compute factor/component scores on some holdout.
KeepCombinedLR is my primary source of truth. I have another dataset, with the exact same variables, that I would like to be scored without including it in the actual factor analyses.
proc factor data = KeepCombinedLR
simple
method = prin
priors = one
rotate = varimax reorder
mineigen = 1
nfactors = 25
out = FactorScores;
var var1--var40;
run;
data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
proc factor data=Fitness outstat=FactOut
method=prin rotate=varimax score;
var Age Weight RunTime RunPulse RestPulse;
title 'Factor Scoring Example';
run;
proc print data=FactOut;
title2 'Data Set from PROC FACTOR';
run;
proc score data=Fitness score=FactOut out=FScore;
var Age Weight RunTime RunPulse RestPulse;
run;
proc print data=FScore;
title2 'Data Set from PROC SCORE';
run;
PROC SCORE will score your data for you, using your 'holdout' data set.
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_score_examples01.htm&docsetVersion=14.3&locale=en
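To make the holdout idea concrete, here is a sketch that fits the components on one part of the Fitness data and scores rows that were left out of the fit. The 9/3 split, the data set names Train, Holdout, TrainStats and HoldoutScores, and nfactors=2 are assumptions chosen for illustration; in the original question the fitting data would be KeepCombinedLR and the holdout would be the second data set with the same variables.

```sas
data Fitness;
  input Age Weight Oxygen RunTime RestPulse RunPulse @@;
  datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;

/* hypothetical split: first 9 rows for fitting, last 3 as the holdout */
data Train Holdout;
  set Fitness;
  if _n_ <= 9 then output Train;
  else output Holdout;
run;

/* fit on Train only; OUTSTAT= with SCORE saves the scoring coefficients */
proc factor data=Train outstat=TrainStats
  method=prin rotate=varimax nfactors=2 score;
  var Age Weight RunTime RunPulse RestPulse;
run;

/* apply the saved coefficients to rows that were not used in the fit */
proc score data=Holdout score=TrainStats out=HoldoutScores;
  var Age Weight RunTime RunPulse RestPulse;
run;
```

The holdout rows never enter PROC FACTOR, so they have no influence on the loadings; PROC SCORE only applies the coefficients stored in TrainStats.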

How to create a running 3 observation average in SAS?

I have a dataset with some volumes in a column and I want to create a second column that contains the average of the previous three observations. Is this possible?
e.g.
data have;
input Vol Avg_pre_4;
datalines;
228 .
141 .
125 .
101 164.66
116 122.33
107 114
74 108
118 99
127 99.67
123 106.33
;
run;
The LAG function is an automatic built-in queue.
VOL_AVG_OF_PRIOR3 = MEAN ( lag(Vol), lag2(Vol), lag3(Vol) );
if _n_ < 4 then VOL_AVG_OF_PRIOR3 = .;
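For completeness, a sketch of the two statements above inside a full data step, using the first few Vol values from the question (the data set name want is an assumption):

```sas
data have;
  input Vol;
  datalines;
228
141
125
101
116
107
;

data want;
  set have;
  /* call the LAG functions on every row so their queues stay in step */
  Vol_avg_of_prior3 = mean(lag(Vol), lag2(Vol), lag3(Vol));
  /* the first three rows do not have three prior observations */
  if _n_ < 4 then Vol_avg_of_prior3 = .;
run;
```

The LAG calls are deliberately not placed inside the IF condition: each call only updates its queue when it executes, so calling them conditionally would return values from the wrong rows.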