TRANWRD doesn't replace correctly and truncates the target string - SAS

I'm replacing the month in a target string such as '01-DIC-17' to get '01/12/17', and then converting it to a SAS date with the input function. However, the output is '01/12/': the trailing '17' is truncated.
The SAS data step is:
data test;
input issueDate $10. lastDate $10.;
datalines;
05-DIC-16 04-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 20-ENE-17
05-DIC-16 20-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 20-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 20-ENE-17
;
run;
I have an adjust month table:
DATA ADJUST_MONTH;
INPUT TARGET $ REPLACEMENT $;
DATALINES;
-ENE- /01/
-FEB- /02/
-MAR- /03/
-ABR- /04/
-MAY- /05/
-JUN- /06/
-JUL- /07/
-AGO- /08/
-SEP- /09/
-OCT- /10/
-NOV- /11/
-DIC- /12/
;
RUN;
I then run this code:
proc sql;
update test
set issueDate = TRANWRD(issueDate, substr(issueDate,3,5),
(select replacement from adjust_month where target eq substr(issueDate,3,5)));
quit;
The output: issueDate '05-DIC-16' becomes '05/12/'.
Thanks for your help.

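Your replacement column contains trailing blanks: REPLACEMENT is read as an 8-character variable, so '/12/' is stored padded with four blanks, and the padded result no longer fits in the 10-character issueDate. TRIM or a similar function will help. For example, replace the nested select with trim((select replacement from adjust_month where target eq substr(issueDate,3,5))).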
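The effect of the trailing blanks can be reproduced in isolation. The following is a minimal sketch, not the asker's code: the variable names padded and trimmed are made up for illustration, and the lengths ($10 and $8) are assumed to match the tables in the question.

```sas
/* Sketch: compare TRANWRD with a padded vs. a trimmed replacement value. */
data _null_;
  length issueDate $10 replacement $8 padded trimmed $10;
  issueDate   = '05-DIC-16';
  replacement = '/12/';   /* stored as '/12/' followed by four blanks */

  /* The padded replacement pushes the year past the 10-character limit. */
  padded  = tranwrd(issueDate, '-DIC-', replacement);

  /* TRIM removes the trailing blanks, so the year still fits. */
  trimmed = tranwrd(issueDate, '-DIC-', trim(replacement));

  put padded= trimmed=;
run;
```

In the log, padded should come out without the year and trimmed with it, which mirrors the behaviour described in the question.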

There's no need for all these transformations in your code. You can use the date informat for the relevant language to read the data directly. Your data is in Spanish, so the espdfde9. informat (used with the input function) reads the source value as a date, and you can then do with it as you please.
I would create extra columns to store these dates rather than update the existing columns, because dates are generally better stored as numbers than as character strings. But it's your choice.
To use this informat, your SAS installation has to have European character sets and encoding loaded.
data test;
input issueDate $10. lastDate $10.;
datalines;
05-DIC-16 04-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 20-ENE-17
05-DIC-16 20-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 20-ENE-17
05-DIC-16 04-ENE-17
05-DIC-16 20-ENE-17
;
run;
data want;
set test;
issuedate_new = input(issuedate, espdfde9.);
lastdate_new = input(lastdate, espdfde9.);
format issuedate_new lastdate_new date9.;
run;

Related

Unable to change data format

I need to convert dates in the following format:
30-giu-18
30-nov-20
......
into:
30JUN2018
30NOV2020
.......
I tried:
data Test;
set input;
mydates = input(myolddates, ddmmyy10.);
format mydates ddmmyy10.;
run;
It doesn't work. The variable myolddates is character $9.
Can anyone help me please?
Try this
data have;
input myolddates $9.;
datalines;
30-giu-18
30-nov-20
;
options dflang = Italian ;
data want;
set have;
date = input(myolddates, EURDFDE9.);
format date ddmmyy10.;
run;
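The question asks for output such as 30JUN2018, which corresponds to the DATE9. format rather than DDMMYY10. A minimal sketch of that variant, assuming the same DFLANG setting and EURDFDE9. informat as in the answer above (how the two-digit year is expanded depends on the YEARCUTOFF= system option):

```sas
options dflang=Italian;

data _null_;
  /* Read the Italian month abbreviation into a numeric SAS date. */
  date = input('30-giu-18', eurdfde9.);

  /* DATE9. always displays English month abbreviations. */
  put date= date9.;
run;
```

In the data step from the answer, the equivalent change would be to use format date date9.; instead of ddmmyy10.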

specifying data informat using do loops in sas

I have a large data file with data in the following format: country, datatype, year1month1 to year2018month7.
Reading the data using proc import did not work for all data fields. I ended up modifying the SAS datastep code to ensure data format was correct.
However, I am having trouble simplifying the code; namely, I would like a do loop to go through all the years and months. This way, I could use the current date to work out the range of dates for the file, and the code that creates the year/month variables would not have to repeat 100 times in the file.
data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Country_Name $34. ;
do i = 1940 to 2018;
do j = 1 to 12;
informat _(i)M(j) best32.;
end;
end;
informat Base_Year $1. ;
format Country_Name $34. ;
do i = 1940 to 2018;
do j = 1 to 12;
format _(i)M(j) best12.;
end;
end;
format Base_Year $1. ;
input
Country_Name $
do i = 1940 to 2018;
do j = 1 to 12;
_(i)M(j) $;
end;
end;
Base_Year $;
run;
There are a few approaches here that could work. The most directly translatable to your approach is to use the macro language.
You need to translate those two loops to something like this:
%do i = 1940 %to 2018;
%do j = 1 %to 12;
informat _&i.M&j. best32.;
%end;
%end;
Notice the % there. This also has to be in a macro; you can't do this in normal datastep code.
I would rewrite it to use a macro like so:
%macro make_ym(startyear=, endyear=, separator=);
%local i j;
%do i = &startyear. %to &endyear.;
%do j = 1 %to 12;
_&i.&separator.&j.
%end;
%end;
%mend make_ym;
data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Country_Name $34. ;
informat %make_ym(startyear=1940,endyear=2018,separator=M) best32.;
informat Base_Year $1. ;
format %make_ym(startyear=1940,endyear=2018,separator=M) best12.;
format Base_Year $1. ;
input
Country_Name $
%make_ym(startyear=1940,endyear=2018,separator=M)
Base_Year $;
run;
I took out the $ after the yMm bits in the input since you declared them as numeric.
Don't model your data step after the code generated by PROC IMPORT. It does a lot of useless things, like attaching formats and informats to variables that don't need them.
For your problem you just need a simple program like this:
data test;
infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
input Country_Name :$34. Y1940M01 .... Y2018M08 Base_Year :$1. ;
run;
Now the only tricky part is building that list of numerical variables. If the list is small enough you could just put it into a macro variable. Fortunately that is not a problem in this case since using 8 character names (YyyyyMmm) there is room for over 300 years worth in a data step character variable. A variable of length 10,800 bytes should have room for 100 years of month names.
So just run this data step first.
data _null_;
length names $10800 ;
basedate = mdy(1,1,1940);
lastdate = today();
do i=0 to intck('month',basedate,lastdate);
date=intnx('month',basedate,i);
names=catx(' ',names,cats('Y',year(date),'M',put(month(date),Z2.)));
end;
call symputx('names',names);
run;
Now you can use the macro variable in your INPUT statement.
data test;
infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
input Country_Name :$34. &names Base_Year :$1. ;
run;

SAS: using first. and last. to process a date range

I am trying to go through a list of dates and keep only the date range for dates that have 5 or more occurrences, deleting all others. The example I have is:
data test;
input dt dt2;
format dt dt2 date9.;
datalines;
20000 20001
20000 20002
20000 20003
21000 21001
21000 21002
21000 21003
21000 21004
21000 21005
;
run;
proc sort data = test;
by dt dt2;
run;
data check;
set test;
by dt dt2;
format dt dt2 date9.;
if last.dt = first.dt then
if abs(last.dt2 - first.dt) < 5 then delete;
run;
What I want returned is just one entry, if possible, but I would be happy with the entire appropriate range as well.
The one entry would be a table that has:
start_dt end_dt
21000 21005
The appropriate range is:
21000 21001
21000 21002
21000 21003
21000 21004
21000 21005
My code doesn't work as desired, and I am not sure what changes I need to make.
last.dt2 and first.dt are flags that can only take the values 0 and 1, so the condition abs(last.dt2 - first.dt) < 5 is always true.
Use a counter variable to count the records in each group instead:
data check(drop= count);
length count 8;
count=0;
do until(last.dt);
set test;
by dt dt2;
format dt dt2 date9.;
count = count+1;
if last.dt and count>=5 then output;
end;
run;
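The step above keeps only the last row of each qualifying group. If the whole range is wanted instead, one possible variation is a double DO UNTIL loop that reads each group twice. This is a sketch under the assumption that test is sorted by dt as in the question; the output name check_range is made up for illustration.

```sas
data check_range(drop=count);
  /* Pass 1: count the rows in the current dt group. */
  count = 0;
  do until (last.dt);
    set test;
    by dt;
    count = count + 1;
  end;

  /* Pass 2: re-read the same group and keep its rows only if the
     group has at least 5 records. */
  do until (last.dt);
    set test;
    by dt;
    if count >= 5 then output;
  end;

  format dt dt2 date9.;
run;
```

Each SET statement keeps its own read position, so the second loop starts again at the first row of the group that the first loop just counted.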
I'm not sure why you are using last.dt2 and first.dt in your delete condition, so I have turned it around to create your desired output:
data check2;
set test;
by dt ;
format dt dt2 date9.;
if last.dt then do;
if abs(dt2 - dt) >= 5 then output;
end;
run;
Of course, this will only work if your file is sorted on dt and dt2.
Hope this helps.

Report using data _Null_

I'm looking for a report produced with a SAS data step.
I have a data set:
Name Company Date
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
Report I'm looking for:
Name  Company  From     To
X     A        1997/05  1998/02
      D        1998/02  1999/01
Y     B        2003/09  2004/05
      F        2003/09  2003/09
Z     C        2002/10  2005/03
      M        2001/09  2001/09
W     G        2000/10  2000/10
I tried using proc print, but the result is not accurate, so I'm looking for a data _null_ solution. This is what I have so far:
Thank you.
data _null_;
set salesdata;
by name company date;
array x(*) from;
From=lag(date);
if first.name then count=1;
do i=count to dim(x);
x(i)=.;
end;
count+1;
If first.company then do;
from_date1=date;
end;
if last.company then To_date=date;
if from_date1 ="" and to_date="" then delete;
run;
data _null_;
set yourEvents;
by Name Company notsorted;
file print;
If _N_ EQ 1 then put
@01 'Name'
@06 'Company'
@14 'From'
@22 'To'
;
if first.Name then put
@01 Name
@; ** The trailing @ instructs sas to not start a new line for the next put instruction **;
retain From To;
if first.company then do;
From = 1E9;
To = 0;
end;
if Date LT From then From = Date;
if Date GT To then To = Date;
if last.Company then put
@06 Company
@14 From yymm7.
@22 To yymm7.
;
run;
I used a data step to calculate From_date and To_date,
and then proc report to print the report by group.
proc sort data=have ;
by Name Company Date;
run;
data want(drop=prev_date date);
set have;
by Name Company date;
attrib From_Date To_date format=yymms10.;
retain prev_date;
if first.Company then prev_date=date;
if last.Company then do;
To_date=Date;
From_Date=prev_date;
end;
if not(last.company) then delete;
run;
proc sort data=want;
by descending name ;
run;
proc report data=want;
define Name/order order=data;
run;
IMHO, the simplest way is to exploit proc report and its analysis column type, as in the code below. Note that the name and company columns are automatically sorted in alphabetical order (as most summary functions and procedures do).
/* your data */
data have;
infile datalines;
input Name $ Company $ Date $;
cards;
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
;
run;
/* convert YYYYMM to date */
data have2(keep=name company date);
set have(rename=(date=date_txt));
name = upcase(name);
y = input(substr(date_txt, 1, 4), 4.);
m = input(substr(date_txt, 5, 2), 2.);
date = mdy(m,1,y);
format date yymms7.;
run;
/****** 1. proc report ******/
proc report data=have2;
columns name company date=date_from date=date_to;
define name / 'Name' group;
define company / 'Company' group;
define date_from / 'From' analysis min;
define date_to / 'To' analysis max;
run;
The html output:
(tested on SAS 9.4 win7 x64)
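The YYYYMM conversion step could probably be shortened by reading the text with a date informat instead of splitting it with substr. A minimal sketch, assuming the have data set defined above and that the YYMMN6. informat is suitable for these six-digit values:

```sas
/* Sketch: convert YYYYMM text to a SAS date in one step. */
data have2(keep=name company date);
  set have(rename=(date=date_txt));
  name = upcase(name);

  /* YYMMN6. reads yyyymm with no separator and returns the
     first day of that month as a SAS date. */
  date = input(date_txt, yymmn6.);

  format date yymms7.;
run;
```

The resulting have2 has the same columns as before, so the proc report step can be used unchanged.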
============================ OFFTOPIC ==============================
One may also consider using proc means or proc tabulate. The basic code forms are shown below. However, you can also see that further adjustments in output formats are required.
/***** 2. proc tabulate *****/
proc tabulate data=have2;
class name company;
var date;
table name*company, date=' '*(min='From' max='To')*format=yymms7.;
run;
proc tabulate output:
/***** 3. proc means (not quite there) *****/
* proc means + ODS -> cannot recognize date formats;
proc means data=have2 nonobs min max;
class name company;
format date yymms7.; * in vain;
var date;
run;
proc means output (the statistics are not displayed with the date format; I don't know why):
You may leave comments on improving these alternative ways.

Changing date format in SAS9.3

Does anyone know how to change a date variable from the DATE9. format to the MMDDYY10. format in SAS 9.3? I've tried using the put and input functions, but the result is null.
Formats are nothing but instructions on how to display a value. Dates are numeric values, stored as the number of days since 01JAN1960.
data x;
format formated1 date9. formated2 mmddyy10.;
noformated = "01JAN1960"d;
formated1 = noformated;
formated2 = noformated;
run;
proc print data=x;
run;
Obs formated1 formated2 noformated
1 01JAN1960 01/01/1960 0
In short, just change the format on the dataset and the date will be displayed with the new format.
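A format can also be applied for a single procedure without touching the stored data set. A minimal sketch, assuming the data set x created above:

```sas
/* The FORMAT statement inside a procedure only affects that
   procedure's output; the format stored in data set x is unchanged. */
proc print data=x;
  format formated1 mmddyy10.;
run;
```

This is convenient when the same date has to be displayed differently in different reports.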
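If olddate is a character string such as 01JAN1960, read it with the informat that matches its layout, then attach the new format:
newdate = input(olddate,DATE9.);
format newdate MMDDYY10.;
If olddate is already a numeric date and you need the formatted text:
newtext = put(olddate,MMDDYY10.);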
To change the format of a variable in an existing table, use PROC DATASETS or PROC SQL:
data WORK.TABLE1;
format DATE1 DATE2 date9.;
DATE1 = today();
DATE2 = DATE1;
run;
proc contents;
run;
proc datasets lib=WORK nodetails nolist;
modify TABLE1;
format DATE1 mmddyy10.;
quit;
proc sql;
alter table WORK.TABLE1
modify DATE2 format=mmddyy10.
;
quit;
proc contents;
run;