I would like to add a day to a SAS date and save that value as a new variable. I have these data:
data have;
input ID $ date;
datalines;
A 14610
B 13229
C 15644
D 14278
;
run;
And I would to end up with these data, with the new variables labelled for each iteration as below:
data want;
input ID $ date date1 date2 date3;
datalines;
A 14610 14611 14612 14613
B 13229 13230 13231 13232
C 15644 15645 15646 15647
D 14278 14279 14280 14281
;
run;
How can I accomplish this?
Try this
data have;
input ID $ date;
datalines;
A 14610
B 13229
C 15644
D 14278
;
run;
data want;
set have;
array d date1 - date3;
do over d;
d = date + _i_;
end;
run;
Related
How do i stat a count function for the last 3 columns in my dataset, putting into consideration that the name of the last 3 columns always changes
Data test;
Set test1;
Count=count(coulmn12,column13,column14);
Run;
You could also use an ARRAY. And the old if 0 then set.
data have;
retain id x1-x5 z1-z6 . z7-z10 . a ' ' z11-z12 . ;
id+1; z10 = 1; output;
id+1; z11 = 3.14159; output;
id+1; z12 = 42; output;
format _numeric_ 4.;
run;
data want;
if 0 then set have(drop=id /*or other numeric vars as needed*/);
array _v[*] _numeric_;
set have;
nmiss_3 = nmiss(_v[dim(_v)],_v[dim(_v)-1],_v[dim(_v)-2]);
run;
data want; /*move ID and NMISS_3 back to left*/
if 0 then set want(keep=id nmiss_3);
set want;
run;
proc print;
run;
You can query a data set's metadata to get the names of the last three columns.
data have;
retain id x1-x15 z1-z12 . ;
id+1; z10 = 1; output;
id+1; z11 = 3.14159; output;
id+1; z12 = 42; output;
format _numeric_ 4.;
run;
* get data sets metadata;
proc contents noprint data=have out=have_metadata;
run;
* query for names of last 3 columns;
proc sql noprint;
select name
into :last_three_columns separated by ','
from have_metadata
having varnum > max(varnum) - 3
;
%put NOTE: &=last_three_columns;
data want;
attrib last3_nmiss_count length=8;
set have;
last3_nmiss_count = nmiss(&last_three_columns);
run;
dm 'viewtable
want(keep=last3_nmiss_count id z:)';
I'm looking for report using SAS data step :
I have a data set:
Name Company Date
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
Report I'm looking for:
Name Company From To
X A 1997/05 1998/02
D 1998/02 1999/01
Y B 2003/09 2004/05
F 2003/09 2003/09
Z C 2002/10 2005/03
M 2001/09 2001/09
W G 2000/10 2000/10
THANK you,
Tried using proc print but it is not accurate. So looking for a data null solution.
data _null_;
set salesdata;
by name company date;
array x(*) from;
From=lag(date);
if first.name then count=1;
do i=count to dim(x);
x(i)=.;
end;
count+1;
If first.company then do;
from_date1=date;
end;
if last.company then To_date=date;
if from_date1 ="" and to_date="" then delete;
run;
data _null_;
set yourEvents;
by Name Company notsorted;
file print;
If _N_ EQ 1 then put
#01 'Name'
#06 'Company'
#14 'From'
#22 'To'
;
if first.Name then put
#01 Name
#; ** This instructs sas to not start a new line for the next put instruction **;
retain From To;
if first.company then do;
From = 1E9;
To = 0;
end;
if Date LT From then From = Date;
if Date GT To then To = Date;
if last.Company then put
#06 Company
#14 From yymm7.
#22 To yymm7.
;
run;
I have done data step to calculate From_date and To_date
and then proc report to print the report by group.
proc sort data=have ;
by Name Company Date;
run;
data want(drop=prev_date date);
set have;
by Name Company date;
attrib From_Date To_date format=yymms10.;
retain prev_date;
if first.Company then prev_date=date;
if last.Company then do;
To_date=Date;
From_Date=prev_date;
end;
if not(last.company) then delete;
run;
proc sort data=want;
by descending name ;
run;
proc report data=want;
define Name/order order=data;
run;
IMHO, the simplest way is exploiting proc report and its analysis column type as the code below. Note that name and company columns are automatically sorted in alphabetical order (as most of the summary functions or procedures do).
/* your data */
data have;
infile datalines;
input Name $ Company $ Date $;
cards;
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
;
run;
/* convert YYYYMM to date */
data have2(keep=name company date);
set have(rename=(date=date_txt));
name = upcase(name);
y = input(substr(date_txt, 1, 4), 4.);
m = input(substr(date_txt, 5, 2), 2.);
date = mdy(m,1,y);
format date yymms7.;
run;
/****** 1. proc report ******/
proc report data=have2;
columns name company date=date_from date=date_to;
define name / 'Name' group;
define company / 'Company' group;
define date_from / 'From' analysis min;
define date_to / 'To' analysis max;
run;
The html output:
(tested on SAS 9.4 win7 x64)
============================ OFFTOPIC ==============================
One may also consider using proc means or proc tabulate. The basic code forms are shown below. However, you can also see that further adjustments in output formats are required.
/***** 2. proc tabulate *****/
proc tabulate data=have2;
class name company;
var date;
table name*company, date=' '*(min='From' max='To')*format=yymms7.;
run;
proc tabulate output:
/***** 3. proc means (not quite there) *****/
* proc means + ODS -> cannot recognize date formats;
proc means data=have2 nonobs min max;
class name company;
format date yymms7.; * in vain;
var date;
run;
proc means output (cannot output date format, dunno why):
You may leave comments on improving these alternative ways.
If you have multiple datasets (hundreds) with the same variable names and would like to merge them by a key, is there a simple way to control which value of a variable to take for the variables that are not the key? One way to do this would be a rename on the merge statement then write another step to use those renamed variable to calculate the most frequent value with an array...but I'm really wondering if there's an built in way of handling this. For example:
data ds1;
infile datalines dsd delimiter=' ';
input var1 $ var2;
datalines;
a 1
b 2
;
run;
data ds2;
infile datalines dsd delimiter=' ';
input var1 $ var2;
datalines;
a
b 2
;
run;
data ds3;
infile datalines dsd delimiter=' ';
input var1 $ var2;
datalines;
a 1
b
;
run;
data ds123;
merge ds1 ds2 ds3;
by var1;
run;
This code will 'pick' the 'furthest right' var2 i.e. the dataset ds123:
a 1
b
But I may want it to be:
a 1
b 2
as this would match the most frequent values.
Use an SQL join and the coalesce function. Specify the preference order in the coalesce and the first non-missing in that order will be used.
proc sql noprint;
create table ds123 as
select a.var1,
coalesce(a.var2,b.var2,c.var2) as var2
from ds1 as a,
ds2 as b,
ds3 as c
where a.var1 = b.var1
and b.var1 = c.var1;
quit;
Does anyone know how to change a date variable from Date9 to MMDDYY10 format in SAS9.3? I've tried using the put and input functions, but the result is null
Formats are nothing but instructions on how to display a value. Dates are numeric represented as the number of days from 1JAN1960.
data x;
format formated1 date9. formated2 mmddyy10.;
noformated = "01JAN1960"d;
formated1 = noformated;
formated2 = noformated;
run;
proc print data=x;
run;
Obs formated1 formated2 noformated
1 01JAN1960 01/01/1960 0
In short, just change the format on the dataset and the date will be displayed with the new format.
Try both functions:
tmpdate = put(olddate,DATE9.);
newdate = input(tmpdate,MMDDYY10.);
Or maybe even
newdate = input(put(olddate,DATE9.),MMDDYY10.);
For changing the format of variable in a table - PROC SQL or PROC DATASETS:
data WORK.TABLE1;
format DATE1 DATE2 date9.;
DATE1 = today();
DATE2 = DATE1;
run;
proc contents;
run;
proc datasets lib=WORK nodetails nolist;
modify TABLE1;
format DATE1 mmddyy10.;
quit;
proc sql;
alter table WORK.TABLE1
modify DATE2 format=mmddyy10.
;
quit;
proc contents;
run;
I have two SAS data sets. The first is relatively small, and contains unique dates and a corresponding ID:
date dateID
1jan90 10
2jan90 15
3jan90 20
...
The second data set very large, and has two date variables:
dt1 dt2
1jan90 2jan90
3jan90 1jan90
...
I need to match both dt1 and dt2 to dateID, so the output would be:
id1 id2
10 15
20 10
Efficiency is very important here. I know how to use a hash object to do one match, so I could do one data step to do the match for dt1 and then another step for dt2, but I'd like to do both in one data step. How can this be done?
Here's how I would do the match for just dt1:
data tbl3;
if 0 then set tbl1 tbl2;
if _n_=1 then do;
declare hash dts(dataset:'work.tbl2');
dts.DefineKey('date');
dts.DefineData('dateid');
dts.DefineDone();
end;
set tbl1;
if dts.find(key:date)=0 then output;
run;
A format would probably work just as efficiently given the size of your hash table...
data fmt ;
retain fmtname 'DTID' type 'N' ;
set tbl1 ;
start = date ;
label = dateid ;
run ;
proc format cntlin=fmt ; run ;
data tbl3 ;
set tbl2 ;
id1 = put(dt1,DTID.) ;
id2 = put(dt2,DTID.) ;
run ;
Edited version based on below comments...
data fmt ;
retain fmtname 'DTID' type 'I' ;
set tbl1 end=eof ;
start = date ;
label = dateid ;
output ;
if eof then do ;
hlo = 'O' ;
label = . ;
output ;
end ;
run ;
proc format cntlin=fmt ; run ;
data tbl3 ;
set tbl2 ;
id1 = input(dt1,DTID.) ;
id2 = input(dt2,DTID.) ;
run ;
I don't have SAS in front of me right now to test it but the code would look like this:
data tbl3;
if 0 then set tbl1 tbl2;
if _n_=1 then do;
declare hash dts(dataset:'work.tbl2');
dts.DefineKey('date');
dts.DefineData('dateid');
dts.DefineDone();
end;
set tbl1;
date = dt1;
if dts.find()=0 then do;
id1 = dateId;
end;
date = dt2;
if dts.find()=0 then do;
id2 = dateId;
end;
if dt1 or dt2 then do output; * KEEP ONLY RECORDS THAT MATCHED AT LEAST ONE;
drop date dateId;
run;
I agree with the format solution, for one, but if you want to do the hash solution, here it goes. The basic thing here is that you define the key as the variable you're matching, not in the hash itself.
data tbl2;
informat date DATE7.;
input date dateID;
datalines;
01jan90 10
02jan90 15
03jan90 20
;;;;
run;
data tbl1;
informat dt1 dt2 DATE7.;
input dt1 dt2;
datalines;
01jan90 02jan90
03jan90 01jan90
;;;;
run;
data tbl3;
if 0 then set tbl1 tbl2;
if _n_=1 then do;
declare hash dts(dataset:'work.tbl2');
dts.DefineKey('date');
dts.DefineData('dateid');
dts.DefineDone();
end;
set tbl1;
rc1 = dts.find(key:dt1);
if rc1=0 then id1=dateID;
rc2 = dts.find(key:dt2);
if rc2=0 then id2=dateID;
if rc1=0 and rc2=0 then output;
run;