Beginner. Reading data in SAS (Reading date and 100 score issue) - sas

The problem said: The first line is a header line and should not be read (use the INFILE option FIRSTOBS=2). The remaining lines contain an ID number (character), gender (character), date of birth (DOB), and two scores (1 and 2). Note that there are some missing values for the scores, and you want to be sure that SAS does not go to a new line to read these values. Write a SAS DATA step to read DOB with DATE9. Here are the lines of data (I put them in my code to save space).
DATA READ;
INFILE DATALINES FIRSTOBS=2;
INPUT ID 1-3
GENDER $ 5
@7 DOB mmddyy10.
@ SCORE1 3
@ SCORE2 3
;
DATALINES;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 1OO 99
002 F 01/01/1960 89
003 M 05/07/2001 90 98
;
DATA PROB12_8;
SET READ;
FORMAT DOB MMDDYY9.;
RUN;
PROC PRINT DATA=PROB12_8;
RUN;
My output is:
OBS ID GENDER DOB SCORE1 SCORE2
1 1 M . . 99
2 2 F . 89 .
3 3 M . 90 98
I don't understand why the program reads the data that way when I specify the number of spaces and use the pointer in my program.
Thanks for your help.

Your problems start at SCORE1 and SCORE2: the pointer control is specified incorrectly. Also notice that 1OO (typed with the letter O) is not 100. This file can be read easily with list input and the MISSOVER option on the INFILE statement.
DATA READ;
INFILE DATALINES FIRSTOBS=2 missover;
informat id $3. gender $1. dob mmddyy10.;
input ID GENDER DOB SCORE1 SCORE2;
format dob mmddyy10.;
datalines;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 1OO 99
002 F 01/01/1960 89
003 M 05/07/2001 90 98
;;;;
run;
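If you would rather keep the column-pointer approach from the question, a minimal sketch might look like the following. It assumes the raw lines are column-aligned exactly as typed here (ID in columns 1-3, gender in 5, DOB in 7-16, scores in 18-20 and 22-24); the real layout of the textbook file is not shown in the post, so treat the column numbers as hypothetical. The first score is typed as 100 with digits. `@n` moves the pointer to column n, and each informat carries an explicit width.

```sas
DATA READ;
   /* TRUNCOVER keeps SAS on the current line when a score is missing */
   INFILE DATALINES FIRSTOBS=2 TRUNCOVER;
   INPUT @1  ID     $3.
         @5  GENDER $1.
         @7  DOB    mmddyy10.
         @18 SCORE1 3.
         @22 SCORE2 3.;
   /* DOB is stored as a number; DATE9. only controls how it prints */
   FORMAT DOB DATE9.;
DATALINES;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 100 99
002 F 01/01/1960  89
003 M 05/07/2001  90 98
;
RUN;
PROC PRINT DATA=READ;
RUN;
```

List input with MISSOVER, as in the answer above, is more forgiving when the columns are not perfectly aligned.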

Related

Drop observations once condition is met by multiple variables

I have the following data and used one of the existing answered questions to solve my data problem but could not get what I want. Here is what I have in my data
Amt1 is populated when the Evt_type is Fee
Amt2 is populated when the Evt_type is REF1/REF2
I don't want to display any observations after the last Flag='Y'
If there is no Flag='Y' then I want all the observations for that id (e.g. id=102)
I want to display if the next row for that id is a Fee followed by REF1/REF2 after flag='Y' (e.g. id=101) However I don't want if there is no REF1/REF2 (e.g.id=103)
Have:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/3/2019 Fee 10
103 2/4/2019 REF1 Y 10
103 2/5/2019 Fee 10
Want:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/4/2019 REF1 Y 10
103 2/5/2019 Fee 10
I tried the following
data want;
set have;
by id Date;
drop count;
if (first.id or first.date) and FLAG='Y' then
do;
retain count;
count=1;
output;
return;
end;
if count=1 and ((first.id or first.date) and Flag ne 'Y') then
do;
retain count;
delete;
return;
end;
output;
run;
Any help is appreciated.
Thanks
A technique known as the DOW loop can compute a measure over a group and then, in a second loop, apply that result to the members of the group.
The DOW loop relies on a SET statement inside the loop. In this case the computation is 'which row in the group is the last one having flag="Y"'.
data want;
* DOW loop, contains computation;
_max_n_with_Y = 1e12;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if flag='Y' then _max_n_with_Y = _n_;
end;
* Follow up loop, applies computation;
do _n_ = 1 to _n_;
set have;
if _n_ <= _max_n_with_Y then OUTPUT;
end;
drop _:;
run;
Here is one way
data have;
infile datalines dsd missover;
input id $ Date : mmddyy10. Evt_Type $ Flag $ Amt1 Amt2;
format Date mmddyy10.;
datalines;
101,2/2/2019,Fee,,5,
101,2/3/2019,REF1,Y,,5
101,2/4/2019,Fee,,10,
101,2/6/2019,REF2,Y,,10
101,2/7/2019,Fee,,4,
102,2/2/2019,Fee,,25,
102,2/2/2019,REF1,N,25,
;
data want;
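/* default: keep every row when no row in the group is flagged */
_iorc_ = 1e7;
/* first pass: remember the position of the last flagged row */
do _N_ = 1 by 1 until (last.id);
set have;
by id;
if flag = "Y" then _iorc_ = _N_;
end;
/* second pass: output rows up to that position */
do _N_ = 1 to _N_;
set have;
if _N_ le _iorc_ then output;
end;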
run;

adding space to character variables

when I run the following code, I see that the number in my character variable gets shifted as a value
data test;
input names$ score1 score2;
cards;
A1 80 95
A 2 80 95
;
run;
proc print data=test;
run;
leading to an output like the following
The SAS System
Obs names score1 score2
1 A1 80 95
2 A 2 80
How do I create a variable like "A 2" with a space, so that the 2 doesn't get shifted?
Your problem is you're using space delimited data input. Is it truly space delimited, though, or is it columnar (fixed position)?
data test;
input names $ 1-4 score1 5-12 score2 13-20;
cards;
A1  80      95
A 2 80      95
;
run;
If it's truly delimited and you're just not exactly replicating the data here, you have a few choices. You can use the & modifier to tell SAS to treat two or more consecutive spaces as the delimiter, so that a single space can be part of the value. Your actual data as posted doesn't have the double spaces, but it would look like so:
data test;
input names &$ score1 score2;
cards;
A1  80 95
A 2  80 95
;
run;
Or if you truly have the issue here that you have some single spaces that are delimiters and some single spaces that are not, you'll have to work out some sort of logic to do this. The exact logic depends on your rules, but here's an example - here I look for that space, and assume that if it is there then there is exactly one more character, then I want to move everything down one so that I have a guaranteed double space now. This is probably not a good rule for you, but it is an example of what you might do.
data test;
input @;
if substr(_infile_,2,1)=' ' then do; *if there is a space at spot two specifically;
_infile_ = substr(_infile_,1,3)||' '||substr(_infile_,4); *shift everything after 3 down;
end;
input names &$ score1 score2;
cards;
A1 80 95
A 2 80 95
;
run;
If your input is fixed block, as suggested, and the NAMES field is 12 bytes, as suggested by the data, then you can use formatted input for NAMES.
data test;
length names $ 12 score1 score2 8;
input names $12. score1 score2;
names=trim(left(names));
cards;
A1          80 95
A 2         80 95
;
run;

first and last statements in SAS

I am trying to do a count of the number of births. The data looks like this:
ID date
101 2016-01-01
101 2016-02-01
101 2016-02-01
102 2015-03-02
102 2016-04-01
103 2016-02-08
So now I want to create a count based on the date.
The expected output is:
ID date count
101 2016-01-01 1
101 2016-02-01 2
101 2016-02-01 2
102 2015-03-02 1
102 2016-04-01 2
103 2016-02-08 1
I am trying to do it by first and last and also the count from proc sql but I am missing something here.
data temp;
set temp;
by ID DATE notsorted;
if first.date then c=1;
else c+1;
if first.ID then m=1;
else m+1;
run;
Another solution with your original approach
data x;
input id : 3. date : ddmmyy10.;
FORMAT DATE ddmmyy10.;
datalines;
101 01-01-2016
101 02-01-2016
101 02-01-2016
102 03-02-2015
102 04-01-2016
103 02-08-2016
;
run;
data x;
set x;
by ID DATE notsorted;
if first.ID then c=0; /*reset count every time id changes*/
if first.date then c+1; /*raise count when date changes*/
run;
This produces the requested count: 1, 2, 2 for ID 101; 1, 2 for ID 102; and 1 for ID 103.
Do you absolutely need to use first.?
I would use PROC FREQ to achieve this:
data have;
infile datalines;
input ID $ date : $10.;
datalines;
101 2016-01-01
101 2016-02-01
101 2016-02-01
102 2015-03-02
102 2016-04-01
103 2016-02-08
;run;
proc freq DATA=have NOPRINT;
TABLES ID * date / OUT=want(drop=percent);
run;
creates this:
ID date count
101 2016-01-01 1
101 2016-02-01 2
102 2015-03-02 1
102 2016-04-01 1
103 2016-02-08 1
If you want to reproduce COUNT in the data step, you will have to use the double DOW loop. The dataset is SET twice: the first time to count rows by ID and date, the second time to output all rows.
data out;
do _n_ = 1 by 1 until (last.date);
set test ;
by ID date;
if first.date then count = 1;
else count + 1;
end;
do _n_ = 1 by 1 until (last.date);
set test ;
by ID date;
output;
end;
run;
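You forgot to add a RETAIN statement in your data step. (Strictly, variables used in a sum statement such as c+1 are retained automatically, so the RETAIN here mainly makes the initial value explicit.)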
data temp;
set temp;
retain c m 0;
by ID DATE notsorted;
if first.date then c=1;
else c+1;
if first.ID then m=1;
else m+1;
run;
Okay, I have edited the previous code. Hopefully this will suit your needs. Just make sure your date variable is in numeric or calendar format so that you can sort your table by ID and date first.
data want;
set have;
by id date;
if first.date then count=0;
count+1;
run;

how to read 12 digit numeric in sas

I am new to SAS and am trying to read a CSV file.
Here is a sample of the CSV I am trying to read:
Olive Mathews , 119-574-8639 , 47 Summit Ave , 22186,Portugal
Jami Gonzales , 182-680-4169 , 81521 Chico Hwy , 69148 , Cambodia
Mabel Holland , 561-729-2640 , 87 State Hwy 160 , 32798 , Viet Nam
Alice Barron , 453-687-5745 , 621 State Hwy 171 N , 41322 , Belize
sas code i wrote to read csv
data jul10.second;
infile '/folders/myshortcuts/myfolder/csv/data.csv' dlm=',' firstobs=2 ;
length name$20 phoneno 7 address$40 zipcode 6 country$40 ;
input name$ phoneno address $ zipcode country$;
run;
gives error at phoneno variable (Invalid data for phoneno in line 2 15-26.)
but if i convert phoneno variable into character variable there is no error -
data jul10.second;
infile '/folders/myshortcuts/myfolder/csv/data.csv' dlm=',' firstobs=2 ;
length name$20 phoneno $12 address$40 zipcode 6 country$40 ;
input name$ phoneno $ address $ zipcode country$;
run;
Why is that? Why can't I put a 12-digit number in the numeric variable phoneno?
The 12-digit 'number' isn't a number due to the hyphens. If you wish to convert it to a number without the hyphens, use the compress() function to remove them, and input() to convert to a number...
realphone = input(compress(phoneno,'-'),10.) ;
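Put together, a minimal sketch of the whole step might look like this. It reads the phone number as character and then derives the numeric version. The dataset name `phones`, the in-stream DATALINES (instead of the original file path), and the `$14` length are assumptions for illustration. It also swaps in `compress(phoneno, , 'kd')`, which keeps only the digits, so stray spaces around the commas are removed along with the hyphens.

```sas
data phones;
   /* comma is the only delimiter, so blanks inside names and addresses survive */
   infile datalines dlm=',' truncover;
   length name $20 phoneno $14 address $40 zipcode 8 country $40;
   input name $ phoneno $ address $ zipcode country $;
   /* keep digits only, then convert the 10-digit string to a number */
   realphone = input(compress(phoneno, , 'kd'), 10.);
   format realphone 10.;
datalines;
Olive Mathews , 119-574-8639 , 47 Summit Ave , 22186,Portugal
Jami Gonzales , 182-680-4169 , 81521 Chico Hwy , 69148 , Cambodia
;
run;
```

Unless you need to do arithmetic on it, a phone number is usually better left as a character variable, since it is an identifier rather than a quantity.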

Why informat is not working in SAS

I tried various date formats, but the output does not show any date. What could be the issue?
data c;
input age gender income color$ doj$;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
You are mixing things up a bit.
The date formats are to be applied on numeric data, not on text data.
So you should not read in doj as $ (text), but as a date (so a date informat).
Try DDMMYY10. for doj on your input statement:
data c;
input age gender income color$ doj ddmmyy10.;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;