DATASET IS NOT AN OBJECT: how do I go about this error? (if-statement)

What's wrong with my code?
data train2.sacked;
train2.payrise;
set train2.exam (drop = test1 test2 test3 test4);
mean2 = mean(test1, test2, test3, test4);
if mean2 > 5 then
do
result = 'PASS'
action = 'Pay rise'
output payrise;

if mean2 <= 5 then
do
result = 'LOSER'
action = 'SACKED'
output sacked;

else do
result = 'What have I done?'
action = 'PARTY'
output aahhhhh;
length lname fname $ 40 result $ 20;
run;
I try running the code but it gives me the error.
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 557-185: Variable train2 is not an object.

The error is from the second statement.
train2.payrise;
SAS thinks you are trying to reference a method of an object named TRAIN2 but the data step has not defined such an object (an example of an object in a data step is a HASH). I suspect that you meant that to be a dataset to include on the DATA statement but there is an extra semicolon in the middle of the DATA statement.
The rest of your program also has a lot of errors.
You are missing a lot of semicolons at the end of statements.
Your OUTPUT statements are trying to write to datasets not listed on the DATA statement.
You are trying to take the MEAN() of variables that you specifically told SAS to NOT load.
You are trying to set the length of character variables at the end of the data step. Either the variables already exist, in which case their length has already been set and the attempt to change it will fail, or they will be created with all missing values, since there is no code to assign them any values.

Here are the first few mistakes in your code. Fixing them will get you moving forward, though I suspect there are more issues. The numbers match the comments in the code below.
1. Semicolon too early: this semicolon is unnecessary and limits your output to one data set.
2. You drop variables test1 to test4, which you then attempt to use in the next statement.
3. Attempt to use the dropped variables.
4. Missing semicolon.
5. Missing END.
data train2.sacked; /*1*/
train2.payrise;
set train2.exam (drop = test1 test2 test3 test4); /*2*/
mean2 = mean(test1, test2, test3, test4); /*3*/
if mean2 > 5 then
do /*4*/
result = 'PASS' /*4*/
action = 'Pay rise' /*4*/
output payrise;
/*5*/
if mean2 <= 5 then
do /*4*/
result = 'LOSER' /*4*/
action = 'SACKED' /*4*/
output sacked;
/*5*/
else do /*4*/
result = 'What have I done?' /*4*/
action = 'PARTY' /*4*/
output aahhhhh;
/*5*/
length lname fname $ 40 result $ 20;
run;
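Putting the points from both answers together, a corrected step might look roughly like the sketch below. It is not the asker's confirmed intent. It assumes train2.exam contains test1 to test4 but not result or action, that lname and fname already exist in train2.exam (so their lengths are left alone), and that the third branch was meant for rows with no test scores at all. Routing those rows to work.aahhhhh is a guess based on the name in the original OUTPUT statement.

```sas
/* Sketch only: the missing-score branch and the WORK library
   for aahhhhh are assumptions, not part of the original question. */
data train2.payrise train2.sacked work.aahhhhh;   /* list every output data set */
  length result $ 20 action $ 10;   /* set lengths before the first assignment */
  set train2.exam;                  /* keep test1-test4 so MEAN() can use them */
  mean2 = mean(test1, test2, test3, test4);
  if missing(mean2) then do;        /* all four tests are missing */
    result = 'What have I done?';
    action = 'PARTY';
    output work.aahhhhh;
  end;
  else if mean2 > 5 then do;
    result = 'PASS';
    action = 'Pay rise';
    output train2.payrise;
  end;
  else do;                          /* mean2 <= 5 */
    result = 'LOSER';
    action = 'SACKED';
    output train2.sacked;
  end;
  drop test1 test2 test3 test4;     /* drop on output rather than on input */
run;
```

The missing check comes first because SAS treats a missing numeric value as smaller than any number, so a plain mean2 <= 5 test would also catch rows with no scores.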

Related

SAS: Unable to add variable to data set

I have a data set and am trying to add four new variables using the existing ones. I keep getting an error that says the code is incomplete. I'm having trouble seeing where it is incomplete. How do I fix this?
data dataset;
input ID $
Height
Weight
SBP
DBP
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;
You did not end your input statement with a semicolon. input reads variables from external data (in this case, in-line data with the datalines statement). New variables are not created within input in the way you've specified.
Use input to read in the five variables of your data. After that, create new variables based on those five read-in variables:
data dataset;
input ID $
Height
Weight
SBP
DBP
;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
;
run;
Correcting 2 errors should fix this:
Add a semicolon after the last field being read in from the datalines, which is DBP.
(A previous version of this question used the ^ symbol for exponents.) Instead of ^ to raise to the power of something, use **
For reference, SAS arithmetic operators are described in the SAS documentation.
After making the 2 corrections above I ran the revised code below without any errors.
data dataset;
input ID $
Height
Weight
SBP
DBP;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;

SAS: Avoid end-of-line problem and LOST CARD

I'm working through a SAS exercise, which has data in the following format:
3496 Jerry Nelson 13960 Wilson Dr. San Diego CA 92191 40 4
3498 Scott Mason 9226 College Dr. Oak View CA 93022 95 2
3498 CA 35 3
3498 CA 35 11
3500 Michele Stone 8393 West Ct. Emeryville CA 94608 55 5
3500 CA 70 5
For each person, the data continues until the next person's name. The following code is very close to what I need, I think:
libname Ch4data '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data';
Data Ch4data.my_donations;
Infile '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data\Donations.dat' MISSOVER;
Array amounts(10);
Array months(10);
Input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amounts(1) 101 - 105 @ 106
months(1);
end = end1;
If ~(end1) Then
Do;
Input test_char $ 6-6 @;
i = 2;
Do While (0 = ANYALPHA(test_char));
Input amounts(i) 101 - 105 @ 106
months(i);
end = end1;
If ~(end1) Then Input test_char $ 6-6 @;
Else test_char = '';
i = i+1;
End;
End;
Run;
Proc Print Data = Ch4data.my_donations;
Title 'Donations to Coastal Humane Society';
Run;
The problem is that I'm getting a LOST CARD note in the log, and the last name in the file, Michele Stone, doesn't make it into the data set. I suspect my code for detecting the end-of-file is incorrect. Could someone please show me how to detect the end-of-file? The SAS documentation is not helpful.
Many thanks for your time!
[UPDATE]: Thanks to Tom's comment, I can now get the last line with the following code:
libname Ch4data '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data';
Data Ch4data.my_donations;
Infile '\\Client\C$\Users\m210028\Google Drive\Adrian\Self-Study\SAS\Chapter4_data\Donations.dat' MISSOVER END=end1;
Array amounts(10);
Array months(10);
Input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amounts(1) 101 - 105 @ 106
months(1);
If ~(end1) Then
Do;
Input test_char $ 6-6 @;
i = 2;
Do While (0 = ANYALPHA(test_char));
Input amounts(i) 101 - 105 @ 106
months(i);
If ~(end1) Then Input test_char $ 6-6 @;
Else test_char = '';
i = i+1;
End;
End;
Run;
Proc Print Data = Ch4data.my_donations;
Title 'Donations to Coastal Humane Society';
Run;
Unfortunately, it's not getting the second-to-last line. For that matter, it's skipping a lot of first lines of records. Thoughts?
You are trying to combine reading and transposing. It is probably easier to read first and then transpose. In fact, you can just read every line into its own observation and count a new case each time a name appears:
data step1;
Infile example truncover ;
Input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amount 101 - 105
month 106 - 110
;
if not missing(first_name) then case+1;
run;
and then apply the carry-forward of the names etc.
data step2;
update step1(obs=0) step1;
by case;
output;
run;
and then transpose.
data want;
do row=1 by 1 until(last.case);
set step2;
by case;
array months [10];
array amounts [10];
months[row]=month;
amounts[row]=amount;
end;
drop row amount month;
run;
You will need to use the line-holding specifier @@ to hold the line when your name check detects the first line of the next group.
filename exercise 'c:\temp\exercise.txt';
* create file to read in;
data _null_;
file exercise;
input;
put _infile_;
datalines;
3496 Jerry Nelson 13960 Wilson Dr. San Diego CA 92191 40 4
3498 Scott Mason 9226 College Dr. Oak View CA 93022 95 2
3498 CA 35 3
3498 CA 35 11
3500 Michele Stone 8393 West Ct. Emeryville CA 94608 55 5
3500 CA 70 5
run;
* read-in the data;
* error will occur if data file has a group with more than 10 months of data;
data want;
infile exercise end=end_of_data ;
array amounts(10);
array months(10);
input first_name $ 6 - 19
last_name $ 20 - 33
street_address $ 34 - 58
city $ 59 - 88
state_code $ 89 - 93
zip_code $ 94 - 100
amounts(1) 101 - 105
@ 106 months(1);
do i = 2 by 1 while (not end_of_data);
input name_check $ 6-6 @@;
if name_check = ' ' then
input amounts(i) 101-105 @106 months(i);
else
leave; /* jump out of loop
* when control returns to top the input will be of the held line
*/
end;
run;
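To show the line-hold idea on its own, here is a small self-contained sketch. It does not use the layout of Donations.dat: the record-type flag in column 1 (P for a person line, D for a donation line) and all variable names are made up for illustration. A single trailing @ holds the current line so that a second INPUT statement can re-read it once the record type is known.

```sas
/* Sketch only: the P/D flag and the two-line layout are hypothetical. */
data donations_long;
  infile datalines truncover;
  length person $ 10;
  retain person;                 /* carry the name forward to detail lines */
  input rectype $ 1 @;           /* read column 1 and hold the line */
  if rectype = 'P' then do;
    input @3 person $10.;        /* header line: remember the name */
    delete;                      /* write no observation for header lines */
  end;
  else input @3 amount month;    /* detail line: amount and month */
  drop rectype;
datalines;
P Jerry
D 40 4
P Scott
D 95 2
D 35 3
;
run;
```

This should write one observation per donation, each carrying the retained name, which is the long layout that the read-then-transpose answer starts from.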

Doing Principal Components in SAS Using a Holdout and to Score New Data

I am performing Principal Components Analysis in SAS Enterprise Guide and wish to compute factor/component scores on some holdout.
KeepCombinedLR is my primary source of truth. I have another dataset, with the exact same variables, that I would like to be scored without including it in the actual factor analyses.
proc factor data = KeepCombinedLR
simple
method = prin
priors = one
rotate = varimax reorder
mineigen = 1
nfactors = 25
out = FactorScores;
var var1--var40;
run;
data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
proc factor data=Fitness outstat=FactOut
method=prin rotate=varimax score;
var Age Weight RunTime RunPulse RestPulse;
title 'Factor Scoring Example';
run;
proc print data=FactOut;
title2 'Data Set from PROC FACTOR';
run;
proc score data=Fitness score=FactOut out=FScore;
var Age Weight RunTime RunPulse RestPulse;
run;
proc print data=FScore;
title2 'Data Set from PROC SCORE';
run;
PROC SCORE will score your data for you, using your 'holdout' data set.
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_score_examples01.htm&docsetVersion=14.3&locale=en
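Applied to the data sets in the question, the same pattern might look like the sketch below. The holdout data set name Holdout and the output names FactStats and HoldoutScores are placeholders, and the sketch assumes var1 to var40 appear in the same order in both data sets so that the var1--var40 list resolves to the same variables.

```sas
/* Sketch only: Holdout, FactStats and HoldoutScores are hypothetical names. */
proc factor data = KeepCombinedLR
  method = prin
  priors = one
  rotate = varimax reorder
  mineigen = 1
  nfactors = 25
  score                      /* compute scoring coefficients */
  outstat = FactStats;       /* save them, with the means and standard deviations */
  var var1--var40;
run;

proc score data = Holdout score = FactStats out = HoldoutScores;
  var var1--var40;
run;
```

Because the holdout data is passed only to PROC SCORE, it plays no part in estimating the components.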

Conditional Imputation SAS

My data is a list of schools and their performances on certain subject assessments, along with the percentage of gender enrolled in the course. I've created a sample data set below:
data have;
input school $ subject $ perc_male perc_female score similar_school $;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math . . 87 X
Y english . . 83 X
Y science . . 81 X
Y language . . 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;
As you can see, no gender percentages were collected for School Y. Research shows that school X has a very similar gender distribution, so I wish to impute the subject-specific percentages from X into Y. Another problem is that Y has a score for languages, while X did not take this assessment. In this case, I wish to take the mean of the imputed values (51, 48, 57) to get 52 for percentages of language course-takers that are male.
Executing this will demonstrate my desired output:
data want;
input school $ subject $ perc_male perc_female score;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math 51 49 87 X
Y english 48 52 83 X
Y science 57 43 81 X
Y language 52 48 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;
Got a downvote, so adding what I've tried to almost get me where I need to be. To whoever downvoted, I'd like to know if you have any constructive feedback. Thanks! I'm wondering if there is a way to build in the mean imputation part into my current snippet. Plus, I was thinking there may be a more efficient way to do this. Any help would be greatly appreciated.
proc sql;
select distinct cats("'",similar_school,"'") into :school_list separated by ','
from have
where perc_male=.;
quit;
proc sql;
create table stuff as
select similar_school as school, subject, perc_male, perc_female
from have
where school in (&school_list.);
quit;
proc sql;
create table want2 as
select a.school, a.subject, coalesce(a.perc_male,b.perc_male), coalesce(a.perc_female,b.perc_female), a.score, a.similar_school
from have as a
left join stuff as b
on a.school=b.school and a.subject=b.subject
;
quit;
Based on your expected data, plain simple SQL can solve your problem. You can first do a self join based on the school and similar school information and coalesce the perc_male and perc_female information. This takes care of your first issue. For the second part, you can calculate the mean per school and coalesce perc_male and perc_female with the respective school mean. Check out the SQL below and let me know if it helps.
proc sql;
create table want as
select aa.school
, aa.subject
, coalesce(aa.perc_male, mean(aa.perc_male)) as perc_male
, coalesce(aa.perc_female,mean(aa.perc_female)) as perc_female
, score
, similar_school
from (
select a.school
, a.subject
, coalesce(a.perc_male ,b.perc_male) as perc_male
, coalesce(a.perc_female,b.perc_female) as perc_female
, a.score
, a.similar_school
from have as a
left join have as b
on b.school=a.similar_school
and a.subject=b.subject
) as aa
group by aa.school
;
quit;

Create date variable from time (Using SAS 9.3)

Using SAS 9.3
I have files with two variables (time and pulse), one file for each person.
For each person, I know the date on which measurement started.
Now I want to create a date variable that changes at midnight. How?
Example from text files:
23:58:02 106
23:58:07 105
23:58:12 103
23:58:17 98
23:58:22 100
23:58:27 97
23:58:32 99
23:58:37 100
23:58:42 99
23:58:47 104
23:58:52 95
23:58:57 96
23:59:02 98
23:59:07 96
23:59:12 104
23:59:17 109
23:59:22 105
23:59:27 111
23:59:32 111
23:59:37 104
23:59:42 110
23:59:47 100
23:59:52 106
23:59:57 114
00:00:02 123
00:00:07 130
00:00:12 130
00:00:17 125
00:00:22 119
00:00:27 116
00:00:32 122
00:00:37 116
00:00:42 119
00:00:47 117
00:00:52 114
00:00:57 114
00:01:02 110
00:01:07 103
00:01:12 98
00:01:17 98
00:01:22 102
00:01:27 97
00:01:32 99
00:01:37 93
00:01:42 97
00:01:47 103
00:01:52 96
00:01:57 93
00:02:02 93
00:02:07 95
00:02:12 106
00:02:17 99
00:02:22 102
00:02:27 96
00:02:32 93
00:02:37 97
00:02:42 102
00:02:47 101
00:02:52 95
00:02:57 92
00:03:02 100
00:03:07 95
00:03:12 102
00:03:17 102
00:03:22 109
00:03:27 109
00:03:32 107
00:03:37 111
00:03:42 112
00:03:47 113
00:03:52 115
Regex:
\d{2}:\d{2}:\d{2} \d*
See here for an example and play around with regex:
https://regex101.com/r/xF1fQ5/1
EDIT: and have a look at the SAS regex tip sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
Something like this:
Date lastDate = startDate;
List<NData> ListData = new ArrayList<NData>();
for (FileData fdat : ListFileData) {
// derive the full date for this reading from the previous one
Date nDate = this.getDate(lastDate, fdat.gettime());
NData ndata = new NData(nDate, fdat.getMeasuring());
ListData.add(ndata);
lastDate = nDate;
}
.
.
.
.
private Date getDate(Date ld, String time) {
Calendar cal = Calendar.getInstance();
cal.setTime(ld);
int year = cal.get(Calendar.YEAR);
int month = cal.get(Calendar.MONTH) + 1;
int day = cal.get(Calendar.DAY_OF_MONTH);
int hourOfDay = this.getHour(time);
int minuteOfHour = this.getMinute(time);
org.joda.time.LocalDateTime lastDate = new org.joda.time.LocalDateTime(ld);
org.joda.time.LocalDateTime newDate = new org.joda.time.LocalDateTime(year, month, day, hourOfDay, minuteOfHour);
// a time earlier than the previous reading means midnight was crossed
if (newDate.isBefore(lastDate)) {
newDate = newDate.plusDays(1);
}
return newDate.toDate();
}
It's hard to provide a complete answer without sample code, but the SAS lag() function might be enough to do what you need. Your data step would include lines like the following, assuming your time variable is called time and your date variable is called date:
retain date;
if time < lag(time) then date = date + 1;
This assumes you never have any 24 hour gaps (but it appears you'd have to assume that anyway).
This answer also assumes that the time field is already in a SAS time format.
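A fuller sketch of that idea, reading the time as a SAS time value and seeding the date, could look like this. The start date 01JAN2015 is a made-up placeholder for the known start date of one person, and the data set and variable names are only illustrative.

```sas
/* Sketch only: the start date is a hypothetical value. */
data pulse;
  infile datalines truncover;
  input time :time8. pulse;
  retain date '01JAN2015'd;              /* known start date for this person */
  if time < lag(time) then date = date + 1;   /* clock wrapped past midnight */
  datetime = dhms(date, 0, 0, time);     /* combine date and time of day */
  format time time8. date date9. datetime datetime19.;
datalines;
23:59:52 106
23:59:57 114
00:00:02 123
00:00:07 130
;
run;
```

LAG runs on every iteration here because the IF condition itself is always evaluated, so it always compares against the previous row. On the first row LAG returns a missing value, the comparison is false, and the start date is kept.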