SAS: reading dates with different formats - sas

In SAS, how can I read the following dates, which come in different formats (especially 01/05/2018 and 1/6/2018)?
01/05/2018
1/6/2018
Jan 05 2018
Jan 6 2018
Any help is greatly appreciated. Thanks!

The ANYDTDTM informat will parse most varieties of human-readable date, time, or datetime representations into a SAS datetime value. Applying the DATEPART function to that value returns the corresponding SAS date value.
The ANYDTDTE informat will also parse a variety of date, time or datetime representations and return the date part implicitly. However it fails on some of your data items where ANYDTDTM does not.
data _null_;
input
#1 a_datetime_value anydtdtm.
#1 a_date_value anydtdte.
;
hot_date = datepart(a_datetime_value);
put
'_infile_ ' _infile_
/ 'anydtdtm. ' a_datetime_value datetime16.
/ 'datepart() ' hot_date yymmdd10.
/ 'anydtdte. ' a_date_value yymmdd10.
/;
datalines;
01/05/2018
1/6/2018
Jan 05 2018
Jan 6 2018
run;
==== LOG ====
_infile_ 01/05/2018
anydtdtm. 05JAN18:00:00:00
datepart() 2018-01-05
anydtdte. .
_infile_ 1/6/2018
anydtdtm. 06JAN18:00:00:00
datepart() 2018-01-06
anydtdte. 2018-01-06
_infile_ Jan 05 2018
anydtdtm. 05JAN18:00:00:00
datepart() 2018-01-05
anydtdte. .
_infile_ Jan 6 2018
anydtdtm. 06JAN18:00:00:00
datepart() 2018-01-06
anydtdte. .
Read the SAS documentation and conference papers for a greater exploration of the ANYDT** family of informats.
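A likely reason ANYDTDTE. returns missing values in the log above is its width: used without an explicit width it reads only 9 columns, which cuts off the longer values. A minimal sketch, assuming that is the cause, gives the informat a width that covers the longest value. How an ambiguous value such as 01/05/2018 is interpreted (month first or day first) depends on the DATESTYLE system option, so check that setting if the results look swapped.

```sas
data _null_;
  /* width 11 covers the longest value, "Jan 05 2018" */
  input a_date_value anydtdte11.;
  put a_date_value= yymmdd10. +1 _infile_;
datalines;
01/05/2018
1/6/2018
Jan 05 2018
Jan 6 2018
;
run;
```

With the wider informat there should be no need to go through a datetime value and DATEPART.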

Related

How to remove duplicates in SAS data

I am trying to delete the observations in my data set that are the same across multiple variables.
For example
PIN Start Date End Date
1 Jan 1 2014 Jan 3 2014
1 Jan 1 2014 Jan 3 2015
3 March 2 2014 March 5 2014
4 July 1 2014 July 8 2014
5 July 1 2014 July 8 2014
6 August 9 2014 August 24 2014
I would want to remove those with the same PIN and Start Date.
Translate the string dates into SAS dates first.
data have2;
set have(rename=(start_date = _start_date
end_date = _end_date) );
* width 32 so that long values such as "August 24 2014" are not truncated;
start_date = input(strip(_start_date), anydtdte32.);
end_date = input(strip(_end_date), anydtdte32.);
format start_date end_date date9.;
drop _start_date _end_date;
run;
Then use proc sort nodupkey.
proc sort data=have2 nodupkey;
by pin start_date;
run;

SAS transpose wide format to long format

I have a SAS dataset that I need to transpose from wide format to long format
data that I have:
DATES Year1 Year2 Year3
Jan 100 200 300
Data I want:
DATES Year Income
Jan 1 100
Jan 2 200
Jan 3 300
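In this scenario the syntax for proc transpose is fairly simple. The code below assumes the BY variable is named date, as in the output listing; use dates if that is its name in your dataset.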
proc transpose data=have out=want(rename=(_name_=Year col1=Income));
by date;
var year:; * the ':' is a wildcard character;
run;
The resulting output:
Obs date Year Income
1 Jan year1 100
2 Jan year2 200
3 Jan year3 300
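The question asks for Year as 1, 2, 3 rather than the variable names year1, year2, year3 that proc transpose writes into _name_. One way to get that, sketched here under the assumption that the input dataset is called have and holds the variables DATES and Year1-Year3 as in the question, is a data step with an array:

```sas
data want;
  set have;
  /* treat the three wide columns as one array */
  array yr{*} Year1-Year3;
  do Year = 1 to dim(yr);
    Income = yr{Year};   /* value of the current wide column */
    output;              /* one output row per column */
  end;
  keep dates Year Income;
run;
```

The loop index doubles as the numeric Year, so no string handling of _name_ is needed.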

How to convert character variable to numeric date variable

I have below variable
mhstdtc
-----------
2011-01-01
2015-02-01
2002
2001
2003-03
2003-12
Here is my code I used to convert the variable
ASTDTMC=INPUT(MHSTDTC,is8601da.);
PUT ASTDTMC DATE9.;
It worked only when the variable has yyyy-mm-dd values; the remaining values were returned blank. Please help me convert the yyyy and yyyy-mm values as well.
Thanks in advance.
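One way is to use SUBSTR on the left side of an assignment, overlaying the partial date onto a mask that supplies a default month and day.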
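data _null_;
  input iso :$10.;
  mask = '....-06-15';              * default to mid-year and mid-month;
  substr(mask,1,length(iso))=iso;   * overlay the part of the date that is known;
  ASTDTM=INPUT(mask,is8601da.);
  format astdtm date9.;
  put 'NOTE: ' (_all_)(=);
cards;
2011-01-01
2015-02-01
2002
2001
2003-03
2003-12
;
Log output: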
NOTE: iso=2011-01-01 mask=2011-01-01 ASTDTM=01JAN2011
NOTE: iso=2015-02-01 mask=2015-02-01 ASTDTM=01FEB2015
NOTE: iso=2002 mask=2002-06-15 ASTDTM=15JUN2002
NOTE: iso=2001 mask=2001-06-15 ASTDTM=15JUN2001
NOTE: iso=2003-03 mask=2003-03-15 ASTDTM=15MAR2003
NOTE: iso=2003-12 mask=2003-12-15 ASTDTM=15DEC2003
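If your original dates are in character format, append "-01-01" to each value and then do the conversion. The informat reads only the first 10 characters, so complete dates are unaffected and partial dates default to the first day of the month or year. If your dates are not in character format, convert them to character first, then append "-01-01". Try this code: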
data have ;
input mhstdtc $10.;
cards;
2011-01-01
2015-02-01
2002
2001
2003-03
2003-12
;
data want;
set have;
ASTDTMC2=compress(catx("",MHSTDTC,"-01-01")," ","");
ASTDTMC3=inPUT(ASTDTMC2,yymmdd10.);
run;
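The result is in ASTDTMC3. The values are unformatted SAS dates (days since 01JAN1960); apply a format such as date9. to display them as dates.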
18628
20120
15341
14976
15765
16040

How to delete the row with missing character values using SAS

I have a data set like this:
id     type  time
70657  23E   Nov 4 2002 12:00AM
61651  12R
11603  DQ2
45819        Jul 23 2013 12:00AM
732          Mar 4 2011 12:00AM
22810  231
I want to do two things with missing values.
First, how do I remove the rows where the value of the variable time is blank (" ")?
desired output1
id     type  time
70657  23E   Nov 4 2002 12:00AM
45819        Jul 23 2013 12:00AM
732          Mar 4 2011 12:00AM
Second, how do I remove the rows that have any missing values?
desired output2
id type time
70657 23E Nov 4 2002 12:00AM
SAS code:
data character;
length id type time $ 24;
input id $ 1-5 type $ 8-10 time $ 13-31;
cards;
70657  23E  Nov 4 2002 12:00AM
61651  12R
11603  DQ2
45819       Jul 23 2013 12:00AM
732         Mar 4 2011 12:00AM
22810  231
;
run;
I would be inclined to use proc sql. Something like:
proc sql;
create table newchar as
select *
from character
where id is not null and type is not null and time is not null;
quit;
The SAS data step alternative:
DATA WANT;
SET CHARACTER (WHERE = (TIME ~= "" AND TYPE ~= "" AND ID ~= ""));
RUN;
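Both answers above produce the second desired output (no missing values anywhere). For the first one, where only a blank time should drop the row, a minimal sketch along the same lines follows. The output dataset name want_time is just an illustrative choice:

```sas
data want_time;
  set character;
  /* keep a row only when time is non-blank; id and type may be blank */
  where not missing(time);
run;
```

The MISSING function works for both character and numeric variables, so the same filter still applies if time is later converted to a numeric datetime.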

Find matches by condition between 2 datasets in SAS

I'm trying to improve the processing time of an existing for-loop in a *.jsl file that my classmates and I use in our SAS programming course. My question: does SAS offer a PROC or a sequence of statements that can replicate a search-and-match condition? Or is there a way to go through unsorted files without checking line by line for the matching condition(s)?
Our current script file is below:
if( roadNumber_Fuel[n]==roadNumber_TO[m] &
fuelDate[n]>=tripStart[m] & fuelDate[n]<=TripEnd[m],
newtripID[n] = tripID[m];
);
I have 2 sets of data simplified below.
DATA1:
ID1 Date1
1 May 1, 2012
2 Jun 4, 2013
3 Aug 5, 2013
..
.
&
DATA2:
ID2 Date2 Date3 TRIP_ID
1 Jan 1 2012 Feb 1 2012 9876
2 Sep 5 2013 Nov 3 2013 931
1 Dec 1 2012 Dec 3 2012 236
3 Mar 9 2013 May 3 2013 390
2 Jun 1 2013 Jun 9 2013 811
1 Apr 1 2012 May 5 2012 76
...
..
.
I need to check a lot of iterations but my goal is to have the code
check:
Data1.ID1 = Data2.ID2 AND (Date1 >Date2 and Date1 < Date3)
My desired output dataset would be
ID1 Date1 TRIP_ID
1 May 1, 2012 76
2 Jun 4, 2013 811
Thanks for any insight!
You can do range matches in two ways. First off, you can match using PROC SQL if you're familiar with SQL:
proc sql;
create table C as
select * from A
left join B
on A.id=B.id and A.date > B.date1 and A.date < B.date2
;
quit;
Second, you can create a format. This is usually the faster option if it's possible to do this. This is tricky when you have IDs, but you can do it.
First, create a new variable, ID+date. Dates are numbers around 18,000-20,000, so multiply your ID by 100,000 and you're safe.
Second, create a dataset from the range dataset where START=lower date + id*100,000, END=higher date + id*100,000, and FMTNAME=a string that will become the format name (it must start with A-Z or _, contain only A-Z, _, and digits, and not end in a digit). LABEL is the value you want to retrieve (Trip_ID in the above example).
data b_fmts;
set b;
start=id*100000+date1;
end =id*100000+date2;
label=value_you_want_out;
fmtname='MYDATEF';
run;
Then use PROC FORMAT with the CNTLIN= option to import the format.
proc format cntlin=b_fmts;
quit;
Make sure your date ranges don't overlap - if they do this will fail.
Then you can use it easily:
data a_match;
set a;
trip_id=put(id*100000+date,MYDATEF.);
run;
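The format approach can be sketched end to end with the sample data from the question. The dataset names (a, b) and variable names (id, date, date1, date2, trip_id) are assumptions that follow the answer's placeholders, and the dates are retyped in DATE9 form for easy input. The extra HLO='O' row, which is not part of the answer above, gives unmatched values a fixed label so they can be turned into missing. Note that format ranges include both endpoints, unlike the strict inequalities in the question.

```sas
data a;
  input id date :date9.;
  format date date9.;
datalines;
1 01MAY2012
2 04JUN2013
3 05AUG2013
;

data b;
  input id date1 :date9. date2 :date9. trip_id;
  format date1 date2 date9.;
datalines;
1 01JAN2012 01FEB2012 9876
2 05SEP2013 03NOV2013 931
1 01DEC2012 03DEC2012 236
3 09MAR2013 03MAY2013 390
2 01JUN2013 09JUN2013 811
1 01APR2012 05MAY2012 76
;

/* control dataset: one range per trip, keyed by id*100000 + date */
data b_fmts;
  set b end=last;
  retain fmtname 'MYDATEF' type 'N';
  length label $12 hlo $1;
  start = id*100000 + date1;
  end   = id*100000 + date2;
  label = strip(put(trip_id, 8.));
  hlo   = ' ';
  output;
  if last then do;
    /* catch-all row for values that fall in no range */
    hlo = 'O';
    label = 'NONE';
    output;
  end;
  keep fmtname type start end label hlo;
run;

proc format cntlin=b_fmts;
run;

data a_match;
  set a;
  /* PUT returns character; ?? suppresses the note when the label is NONE */
  trip_id = input(put(id*100000 + date, MYDATEF.), ?? 8.);
run;
```

With this data, the first two rows of a should pick up trip IDs 76 and 811, and the third should end up with a missing trip_id because no range for ID 3 contains its date.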