Why is my informat not working in SAS?

I tried various date formats, but the output does not show any dates. What could be the issue?
data c;
input age gender income color$ doj$;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;

You are mixing things up a bit.
Date formats apply to numeric data, not to character data.
So you should not read doj as $ (character), but as a SAS date, which means using a date informat.
Try DDMMYY10. for doj in your INPUT statement:
data c;
input age gender income color$ doj ddmmyy10.;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
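A minimal variant of the same step, for illustration: the colon modifier (doj :ddmmyy10.) makes SAS read DOJ up to the next blank before applying the informat, which is safer with list input if the values are not always separated by exactly one blank. The PUT statement only shows in the log that DOJ is stored as a number of days since 01JAN1960 and is merely displayed with DATE9.

```sas
data c;
  /* colon modifier: read DOJ up to the next blank, then apply DDMMYY10. */
  input age gender income color $ doj :ddmmyy10.;
  format doj date9.;    /* display only; DOJ itself stays numeric */
  /* log: the formatted date, then the underlying day count */
  put doj= '(days since 01JAN1960: ' doj best12. ')';
  datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
```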


Rearranging data in SAS

I'm pretty new to SAS, so I'm struggling to figure out how to rearrange my data. My data set looks like this:
CPT DATE A B C D etc.
1 date1 20.000 5.000 0 0
1 date2 0 0 0 30.000
1 date3 0 10.000 10.000 0
2 date1 3.000 3.000 0 0
2 date2 0 0 5.000 3.000
etc.
where CPT(i) identifies each counterparty, DATE(i) is the date of the cash flows, and A, B, C, D are the different types of cash flow. Since this dataset has lots of columns, I'd like to rearrange it so that there is one row per cash flow, adding rows whenever a date has more than one cash flow. So the output should look like this:
CPT DATE Cash Flow Type
1 date1 20.000 A
1 date1 5.000 B
1 date2 30.000 D
1 date3 10.000 B
1 date3 10.000 C
2 date1 3.000 A
2 date1 3.000 B
2 date2 5.000 C
2 date2 3.000 D
etc.
Any tips on how to get what I want? Cheers
Here is the data in DATALINES form:
data have;
input CPT DATE$ A B C D;
format a b c d 8.3;
datalines;
1 date1 20.000 5.000 0 0
1 date2 0 0 0 30.000
1 date3 0 10.000 10.000 0
2 date1 3.000 3.000 0 0
2 date2 0 0 5.000 3.000
;
run;
This is a 'wide to long' transpose. It's really easy!
data have;
input CPT DATE $ A B C D ;
datalines;
1 date1 20.000 5.000 0 0
1 date2 0 0 0 30.000
1 date3 0 10.000 10.000 0
2 date1 3.000 3.000 0 0
2 date2 0 0 5.000 3.000
;;;;
run;
proc transpose data=have out=want;
by cpt date;
var a b c d;
run;
If there are more complexities than this, you can also do this in the data step.
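For example, here is a minimal DATA step sketch, assuming the cash-flow columns are exactly A, B, C and D and that a zero (or missing) value means there is no cash flow. An ARRAY walks the columns, VNAME() returns each column's name for the type column, and only nonzero values are written out. The output names type and cash_flow are illustrative choices.

```sas
data have;
  input CPT DATE $ A B C D;
  datalines;
1 date1 20.000 5.000 0 0
1 date2 0 0 0 30.000
1 date3 0 10.000 10.000 0
2 date1 3.000 3.000 0 0
2 date2 0 0 5.000 3.000
;
run;

data want;
  set have;
  array cf[*] A B C D;        /* the cash-flow columns to unpivot */
  length type $32;            /* 32 = longest possible SAS variable name */
  do _i = 1 to dim(cf);
    /* skip zeros and missing values */
    if not missing(cf[_i]) and cf[_i] ne 0 then do;
      type = vname(cf[_i]);   /* name of the current column, e.g. A */
      cash_flow = cf[_i];
      output;                 /* one row per nonzero cash flow */
    end;
  end;
  keep CPT DATE type cash_flow;
run;
```

Unlike PROC TRANSPOSE, this drops the zero cash flows in the same pass and does not require the data to be sorted.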
Use PROC TRANSPOSE. It's the easiest way to transpose data in SAS. By default it names the transposed value columns COL1, COL2, etc., so use the RENAME= output data set option to rename COL1 to cash_flow, and NAME= to name the column that holds the original variable names.
proc transpose data = have
out = want(rename=(COL1 = cash_flow) )
name = type
;
by cpt date;
run;
A more elaborate TRANSPOSE can also label the value column and restrict the output to nonzero cash flows:
proc transpose data=have
out=want(
rename=(_name_=Type col1=cashflow)
where=(cashflow ne 0)
)
;
by cpt date;
var a b c d;
label cashflow='Cash Flow';
run;
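You will have to put up with this log message, because the LABEL statement refers to the renamed output variable, which does not exist in the input data set: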
WARNING: Variable CASHFLOW not found in data set WORK.HAVE.

Matching two datasets with a one-month lag

I am trying to match the maximum daily value within each month to monthly data.
data daily;
input permno $ date ret;
datalines;
1000 19860101 88
1000 19860102 90
1000 19860201 70
1000 19860202 55
1001 19860201 97
1001 19860202 74
1001 19860203 79
1002 19860301 55
1002 19860302 100
1002 19860301 10
;
run;
data monthly;
input permno $ date ret;
datalines;
1000 19860131 1
1000 19860228 2
1000 19860331 5
1001 19860331 3
1002 19860430 4
;
run;
This is the result I want (the daily maximum matched to the monthly data one month later):
1000 19860102 90 1000 19860228 2
1000 19860201 70 1000 19860331 5
1001 19860201 97 1001 19860331 3
1002 19860302 100 1002 19860430 4
Below is what I have tried so far.
I want the maximum ret value within each month, so I created yrmon to give all daily rows from the same month the same yyyymm value:
data a1; set daily;
yrmon=year(date)*100 + month(date);
run;
To choose the maximum value (here, ret) within each yrmon group for the same permno, I used the code below:
proc means data=a1 noprint;
class permno yrmon ;
var ret;
output out= a2 max=maxret;
run;
However, this only gives me permno, yrmon and the maximum ret; the original date is lost.
data a3;
set a2;
new=intnx('month',yrmon,1);
format date new yymmn6.;
run;
But this doesn't work, since yrmon is not a SAS date value.
Thank you in advance.
I am trying to match two different data sets by permno (same company) but with a one-month lag (e.g. daily9 data set yrmon=198601 and monthly2 data set yrmon=198602).
This is difficult for me, because if I just add 1 to yrmon, 198612 + 1 will not be 198701, and I am confused about how to handle this.
Can anyone help?
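A minimal sketch of why adding 1 fails and how INTNX avoids it, assuming yrmon is a plain integer in yyyymm form: convert it to a SAS date with the YYMMN6. informat, shift it with INTNX, and convert back if an integer is still needed. The data set name yrmon_demo is illustrative.

```sas
data yrmon_demo;
  yrmon = 198612;                              /* plain integer, yyyymm */
  naive = yrmon + 1;                           /* 198613: not a valid month */
  d     = input(put(yrmon, 6.), yymmn6.);      /* SAS date: 01DEC1986 */
  next  = intnx('month', d, 1);                /* first day of next month: 01JAN1987 */
  next_yrmon = year(next)*100 + month(next);   /* back to an integer: 198701 */
  format d next date9.;
  put naive= d= next= next_yrmon=;
run;
```

The same applies to the date values in the daily and monthly data: 19860101 is just an integer, so YEAR() and MONTH() do not return 1986 and 1 for it until it is read or converted with an informat such as YYMMDD8.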
1) informat date1/date2 yymmn6. is used to read the date in yyyymm format
2) format date1/date2 yymmn6. is used to view the date in yyyymm format
3) intnx("months",b.date2,-1) is used to join the dates with a lag of 1 month
data data1;
input date1 value1;
informat date1 yymmn6.;
format date1 yymmn6.;
cards;
200101 200
200212 300
200211 400
;
run;
data data2;
input date2 value2;
informat date2 yymmn6.;
format date2 yymmn6.;
cards;
200101 3000000
200102 4000000
200301 2000000
200212 2000000
;
run;
proc sql;
create table result as
select a.*,b.date2,b.value2 from
data1 a
left join
data2 b
on a.date1 = intnx("months",b.date2,-1);
quit;
My Output:
date1 |value1 |date2 |value2
200101 |200 |200102 |4000000
200211 |400 |200212 |2000000
200212 |300 |200301 |2000000
Let me know in case of any queries.
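Applying the same INTNX idea to the original daily and monthly data, a minimal sketch: read the yyyymmdd numbers as SAS dates, keep the maximum ret per permno and month, then join each daily maximum to the monthly row dated at the end of the following month. It assumes the monthly dates are always calendar month-end dates; the table names daily_m, dmax and want and the column names month, mdate and mret are illustrative. If two days in a month tie for the maximum, both rows are kept.

```sas
/* read the yyyymmdd integers as real SAS dates */
data daily;
  input permno $ date :yymmdd8. ret;
  format date yymmdd10.;
  datalines;
1000 19860101 88
1000 19860102 90
1000 19860201 70
1000 19860202 55
1001 19860201 97
1001 19860202 74
1001 19860203 79
1002 19860301 55
1002 19860302 100
1002 19860301 10
;
run;

data monthly;
  input permno $ date :yymmdd8. ret;
  format date yymmdd10.;
  datalines;
1000 19860131 1
1000 19860228 2
1000 19860331 5
1001 19860331 3
1002 19860430 4
;
run;

/* tag each daily row with the first day of its month */
data daily_m;
  set daily;
  month = intnx('month', date, 0, 'b');
  format month yymmn6.;
run;

proc sql;
  /* keep the row(s) with the maximum ret per permno and month;
     SAS remerges MAX() onto the detail rows and notes this in the log */
  create table dmax as
  select permno, date, ret, month
  from daily_m
  group by permno, month
  having ret = max(ret);

  /* match the monthly row dated at the end of the following month */
  create table want as
  select d.permno, d.date, d.ret,
         m.date as mdate, m.ret as mret
  from dmax as d
       inner join monthly as m
       on d.permno = m.permno
      and m.date = intnx('month', d.date, 1, 'e')
  order by d.permno, d.date;
quit;
```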

First and last statements in SAS

I am trying to count the number of births. The data looks like this:
ID date
101 2016-01-01
101 2016-02-01
101 2016-02-01
102 2015-03-02
102 2016-04-01
103 2016-02-08
Now I want to create a count based on the date.
The expected output is:
ID date count
101 2016-01-01 1
101 2016-02-01 2
101 2016-02-01 2
102 2015-03-02 1
102 2016-04-01 2
103 2016-02-08 1
I am trying to do it with FIRST. and LAST. (and also with a count from PROC SQL), but I am missing something here:
data temp;
set temp;
by ID DATE notsorted;
if first.date then c=1;
else c+1;
if first.ID then m=1;
else m+1;
run;
Another solution using your original approach:
data x;
input id : 3. date : ddmmyy10.;
FORMAT DATE ddmmyy10.;
datalines;
101 01-01-2016
101 02-01-2016
101 02-01-2016
102 03-02-2015
102 04-01-2016
103 02-08-2016
;
run;
data x;
set x;
by ID DATE notsorted;
if first.ID then c=0; /*reset count every time id changes*/
if first.date then c+1; /*raise count when date changes*/
run;
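which produces:
id date c
101 01/01/2016 1
101 02/01/2016 2
101 02/01/2016 2
102 03/02/2015 1
102 04/01/2016 2
103 02/08/2016 1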
Do you absolutely need to use FIRST.?
I would use PROC FREQ to achieve this:
data have;
input ID $ date $10. ;
datalines;
101 2016-01-01
101 2016-02-01
101 2016-02-01
102 2015-03-02
102 2016-04-01
103 2016-02-08
;run;
proc freq DATA=have NOPRINT;
TABLES ID * date / OUT=want(drop=percent);
run;
creates this:
ID date count
101 2016-01-01 1
101 2016-02-01 2
102 2015-03-02 1
102 2016-04-01 1
103 2016-02-08 1
If you want to reproduce COUNT in a DATA step, you will have to use the double DOW loop. The data set is SET twice: the first time to count the rows for each ID and date, the second time to output all rows with that count.
data out;
do _n_ = 1 by 1 until (last.date);
set have ;
by ID date;
if first.date then count = 1;
else count + 1;
end;
do _n_ = 1 by 1 until (last.date);
set have ;
by ID date;
output;
end;
run;
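You forgot to add a RETAIN statement to your DATA step. (Strictly speaking, the sum statements c+1 and m+1 already retain c and m implicitly; the explicit RETAIN also initializes both to 0.)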
data temp;
set temp;
retain c m 0;
by ID DATE notsorted;
if first.date then c=1;
else c+1;
if first.ID then m=1;
else m+1;
run;
Okay, I have edited the previous code; hopefully this suits your needs. Just make sure your date variable is numeric (a SAS date value) and that the table is sorted by ID and date first.
data want;
set have;
by id date;
if first.date then count=0;
count+1;
run;
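The question also mentions a count from PROC SQL. A minimal sketch of that route, assuming the dates are read as SAS date values: a correlated subquery counts the distinct dates for the same ID up to and including the current row's date, which gives the expected 1, 2, 2, 1, 2, 1 without any FIRST./LAST. logic.

```sas
data have;
  input ID date :yymmdd10.;
  format date yymmdd10.;
  datalines;
101 2016-01-01
101 2016-02-01
101 2016-02-01
102 2015-03-02
102 2016-04-01
103 2016-02-08
;
run;

proc sql;
  create table want as
  select a.ID, a.date,
         /* distinct dates so far for this ID, including the current one */
         (select count(distinct b.date)
            from have as b
           where b.ID = a.ID and b.date <= a.date) as count
  from have as a
  order by a.ID, a.date;
quit;
```

The subquery runs once per row, so on large tables the sorted DATA step with FIRST.ID and FIRST.DATE is usually the cheaper choice.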

Beginner. Reading data in SAS (Reading date and 100 score issue)

The problem says: The first line is a header line and should not be read (use the INFILE option FIRSTOBS=2). The remaining lines contain an ID number (character), gender (character), date of birth (DOB), and two scores, SCORE1 and SCORE2. Note that some of the scores are missing, and you want to be sure that SAS does not go to a new line to read these values. Write a SAS DATA step to read DOB with DATE9. Here are the lines of data (I put them in my code to save space).
DATA READ;
INFILE DATALINES FIRSTOBS=2;
INPUT ID 1-3
GENDER $ 5
#7 DOB mmddyy10.
# SCORE1 3
# SCORE2 3
;
DATALINES;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 1OO 99
002 F 01/01/1960 89
003 M 05/07/2001 90 98
;
DATA PROB12_8;
SET READ;
FORMAT DOB MMDDYY9.;
RUN;
PROC PRINT DATA=PROB12_8;
RUN;
My output is:
OBS ID GENDER DOB SCORE1 SCORE2
1 1 M . . 99
2 2 F . 89 .
3 3 M . 90 98
I don't understand why the program reads the data this way, since I specified the column positions and used pointers in my program.
Thanks for your help.
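Your pointer control is specified incorrectly: # is line pointer control (it moves to another input line), while column pointer control is @. That is why DOB, SCORE1 and SCORE2 come out wrong. Also notice that 1OO (with the letter O) is not 100. This file can be read easily with list input and the MISSOVER option of the INFILE statement.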
DATA READ;
INFILE DATALINES FIRSTOBS=2 missover;
informat id $3. gender $1. dob mmddyy10.;
input ID GENDER DOB SCORE1 SCORE2;
format dob mmddyy10.;
datalines;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 1OO 99
002 F 01/01/1960 89
003 M 05/07/2001 90 98
;;;;
run;
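If the assignment wants DOB displayed with DATE9. (an assumption about what "read DOB with DATE9." means), a minimal variant of the step above only needs a different FORMAT statement; the colon modifiers let each informat stop at the next blank. The 1OO value still produces an invalid-data note in the log and a missing SCORE1, and MISSOVER leaves SCORE2 missing on the second line instead of reading the next line.

```sas
data read;
  infile datalines firstobs=2 missover;   /* skip the header; never flow to the next line */
  input id :$3. gender :$1. dob :mmddyy10. score1 score2;
  format dob date9.;                      /* displays e.g. 10OCT1976 */
  datalines;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 1OO 99
002 F 01/01/1960 89
003 M 05/07/2001 90 98
;
run;
```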

Flatten Multiple Observations in SAS

I have a data set where a patient can have multiple (and an unknown number of) values for some variables, so that it ends up looking something like this:
ID Var1 Var2 Var3 Var4
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
...
99 Blue Female 14 908
100 Red Male 28 911
I want to pack this data down so that each ID has only a single entry, with indicators for the presence or absence of one of the values in their original slew of entries. So, for example, something like this:
ID YesBlue Var2 Var3 Yes911
1 1 Female 17 1
99 1 Female 14 0
100 0 Male 28 1
Is there a straightforward way to do this in SAS? Or, failing that, in Access (where the data comes from), which I really don't know how to use?
If your data set is called PATIENTS1, maybe something like this:
proc sql noprint;
create table patients2 as
select *
,case(var1)
when "Blue" then 1
else 0
end as ablue
,case(var4)
when 911 then 1
else 0
end as a911
,max(calculated ablue) as yesblue
,max(calculated a911) as yes911
from patients1
group by id
order by id;
quit;
proc sort data=patients2 out=patients3(drop=var1 var4 ablue a911) nodupkey;
by id;
run;
Here's a data step solution. I'm assuming that the values for Var2 and Var3 are always the same for a given ID.
data have;
input ID Var1 $ Var2 $ Var3 Var4;
cards;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
data want (drop=Var1 Var4 _:);
set have;
by ID;
if first.ID then do;
_blue=0;
_911=0;
end;
_blue+(Var1='Blue');
_911+(Var4=911);
if last.ID then do;
YesBlue=(_blue>0);
Yes911=(_911>0);
output;
end;
run;
EDIT: Looks like the same thing Keith said, only written differently.
This should do it:
data test;
input id Var1 $ Var2 $ Var3 Var4;
datalines;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
data flatten(drop=Var1 Var4);
set test;
retain YesBlue;
retain Yes911;
by id;
if first.id then do;
YesBlue = 0;
Yes911 = 0;
end;
if Var1 eq "Blue" then YesBlue = 1;
if Var4 eq 911 then Yes911 = 1;
if last.id then output;
run;
PROC SQL is perfect for things like this. This is similar to DavB's answer, but eliminates the additional sort:
data have;
input ID Var1 $ Var2 $ Var3 Var4;
cards;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
proc sql;
create table want as
select ID
, max(case(var1)
when 'Blue'
then 1
else 0 end) as YesBlue
, max(var2) as Var2
, max(var3) as Var3
, max(case(var4)
when 911
then 1
else 0 end) as Yes911
from have
group by id
order by id;
quit;
It also reduces your original data to one row per ID, but there is a risk of silent errors if the source is not exactly as you describe: if Var2 or Var3 varies within an ID, MAX() simply picks one of the values.
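To check that assumption before relying on MAX(), a small sketch: list any ID whose Var2 or Var3 takes more than one distinct value. The table name inconsistent is hypothetical; an empty result means MAX() loses nothing for these two columns.

```sas
data have;
  input ID Var1 $ Var2 $ Var3 Var4;
  cards;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;

proc sql;
  /* IDs where MAX(Var2) or MAX(Var3) would hide differing values */
  create table inconsistent as
  select ID,
         count(distinct Var2) as n_var2,
         count(distinct Var3) as n_var3
  from have
  group by ID
  having calculated n_var2 > 1 or calculated n_var3 > 1;
quit;
```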