Reading the file folder on UNIX using SAS - sas

I am trying to list the zip files in a folder using a pipe command, but I get an error saying the ls command is not recognized. There are actually 2 zip files (ABC_*.zip) in the folder /PROD/.
Can anybody help me with this?
%let extl_dir=/PROD/ ;
filename zl pipe "ls &extl_dir.ABC_*.zip";
data ziplist_a;
infile zl end=last;
length path $200 zipnm $50 filedt $15;
input path $;
zipnm=scan(path,-1,"/");
filedt=scan(scan(path,-1,"_"),1,".");
call symput('zip'||left(_n_), zipnm);
call symput('path'||left(_n_), path);
call symput('filedt'||left(_n_),filedt);
if last then call symput('num_zip',_n_);
*call symput('flenm',filenm);
run;

SAS has published a convenient macro to list files within a directory that does not rely upon running external commands. It can be found here. I prefer this approach as it does not introduce external sources of possible error such as user permissions, pipe permissions etc.
The macro uses DATA step functions (through %sysfunc), and the same functions can be called directly from a DATA step. Below is an example which extracts file information.
%let dir = /some/folder;
%let fType = csv;
data want (drop = _:);
  /* Assign a fileref to the directory and open it */
  _rc = filename("dRef", "&dir.");
  _id = dopen("dRef");
  _n = dnum(_id);
  do _i = 1 to _n;
    name = dread(_id, _i);
    /* Keep only files with the requested extension */
    if upcase(scan(name, -1, ".")) = upcase("&fType.") then do;
      _rc = filename("fRef", "&dir./" || strip(name));
      _fid = fopen("fRef");
      size = finfo(_fid, "File Size (bytes)");
      /* "Create Time" is a Windows information item; on UNIX hosts it may come back blank */
      dateCreate = finfo(_fid, "Create Time");
      dateModify = finfo(_fid, "Last Modified");
      _rc = fclose(_fid);
      output;
    end;
  end;
  _rc = dclose(_id);
run;
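Applied to the original question, the same functions can rebuild the zip1..zipN macro variables without calling ls. This is only a sketch: the fileref name zdir and the counter _n are made up for illustration, and the pattern assumes the files are named ABC_*.zip and sit directly under &extl_dir.

```sas
%let extl_dir = /PROD/;

data ziplist_a (drop = _:);
  length path $200 zipnm $50 filedt $15;
  _rc = filename("zdir", "&extl_dir.");   /* zdir is a hypothetical fileref name */
  _id = dopen("zdir");
  if _id > 0 then do;
    do _i = 1 to dnum(_id);
      zipnm = dread(_id, _i);
      /* keep only ABC_*.zip; \s* absorbs the trailing blanks SAS pads with */
      if prxmatch('/^ABC_.*\.zip\s*$/', zipnm) then do;
        _n + 1;                            /* running count of matching files */
        path   = cats("&extl_dir.", zipnm);
        filedt = scan(scan(zipnm, -1, "_"), 1, ".");
        call symputx(cats('zip', _n), zipnm);
        call symputx(cats('path', _n), path);
        call symputx(cats('filedt', _n), filedt);
        output;
      end;
    end;
    _rc = dclose(_id);
  end;
  call symputx('num_zip', _n);             /* 0 when nothing matched */
  _rc = filename("zdir");                  /* release the fileref */
run;
```

call symputx strips leading and trailing blanks, so the macro variables come out clean without the left() calls used in the question.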

Related

SAS Append datasets only if they exist

I have one dataset per month, all with the same name except for a month suffix. For instance, the datasets whose names I am building with this code:
TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
are called "TEMPCAAD.LIFT_MODEL_V1_202021", "TEMPCAAD.LIFT_MODEL_V1_202022" and so on...
I am trying to append all the datasets, but some of them don't exist, so when I run the following code I get the error
Dataset "TEMPCAAD.LIFT_MODEL_V1_202022" does not exist.
%let currentmonth = &anomes_scores;
%let previousyearmonth = &anomes_x12;
data _null_;
length string $1000;
cur_month = input("&previousyearmonth.01",yymmdd8.);
do until (cur_month > input("&currentmonth.01",yymmdd8.));
string = catx(' ',trim(string),'TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
cur_month = intnx('month',cur_month,1,'b');
end;
call symput('mydatasets',trim(string));
%put &mydatasets;
run;
data WORK.LIFTS_U6M;
set &mydatasets.;
run;
How can I append only existing datasets?
Instead of looping over every dataset name to see whether it exists or not, why don't you just extract all the dataset names from dictionary.tables?
libname TEMPCAAD "/home/kermit/TEMPCAAD";
data tempcaad.lift_model_v1_202110 tempcaad.lift_model_v1_202111 tempcaad.lift_model_v1_202112;
id = 1;
output tempcaad.lift_model_v1_202110;
id = 2;
output tempcaad.lift_model_v1_202111;
id = 3;
output tempcaad.lift_model_v1_202112;
run;
%let nome_modelo = MODEL;
%let versao_modelo = V1;
proc sql;
select strip("TEMPCAAD."||memname) into :dataset separated by " "
from dictionary.tables
where libname="TEMPCAAD" and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO.%";
quit;
data want;
set &dataset.;
run;
You can easily tweak the WHERE clause to extract only the datasets that you wish to append. Just remember to use double quotes if you reference a macro variable in it, since macro variables do not resolve inside single quotes.
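For example, the WHERE clause could be narrowed to the month range from the question. This is a sketch that assumes &previousyearmonth and &currentmonth hold six-digit YYYYMM values, that &nome_modelo and &versao_modelo are already in upper case (memname is stored in upper case), and that the month is always the last underscore-delimited part of the dataset name.

```sas
proc sql noprint;
  select catx('.', libname, memname) into :dataset separated by ' '
  from dictionary.tables
  where libname = "TEMPCAAD"
    and memname like "LIFT_&nome_modelo._&versao_modelo._%"
    /* YYYYMM strings sort chronologically, so a character range test is enough */
    and scan(memname, -1, '_') between "&previousyearmonth." and "&currentmonth.";
quit;

/* SQLOBS holds the number of rows selected by the last PROC SQL statement */
%put NOTE: &sqlobs dataset(s) matched.;
```

If nothing matches, &dataset is never created, so it is worth checking &sqlobs before running the SET statement.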

Spit out data sources, and libraries used in opened program

Is there a way to extract all the data sources and libraries used in an opened program in SAS EG?
I found the following, which I am able to adapt, but unfortunately I am unable to use Filename because of how SAS is set up here, and I am unable to use Filename FTP either. Any ideas?
Filename inp "<path and name of the sas code/>";
data temp;
infile inp;
input rec $100.;
if index(lowcase(rec), "libname") ne 0 then output; /* check libraries */
else if index(lowcase(rec),"filename") ne 0 then output; /* check input files */
else if index(lowcase(rec),"data") ne 0 and index(rec,".") ne 0 then output; /* check permanent datasets */
else if index(lowcase(rec),"merge") ne 0 and index(rec,".") ne 0 then output; /* check permanent datasets used in merge */
run;

Parse file name with SAS

I have a directory in which a new file is created every week. The names look like this:
file_w1.csv
file_w2.csv
file_w3.csv
What I need to do is pick up the latest file (based on modified date), then parse the 2 characters just before the file extension.
So in this case, I want 'w3' because I want to use this to know which week I am reporting for.
How can I do this in SAS?
An operating-system-independent technique would use SAS External File functions such as dopen, fopen and finfo to obtain information about a folder and its items.
Consider this sample code that does a 'full dump' of available information whilst parsing C:\Temp on a Windows machine:
data _null_;
length dfileref fileref $8 folder $200;
rc = filename (dfileref, 'C:\Temp');
did = dopen(dfileref);
if did then do;
do index = 1 to doptnum(did);
featurename = doptname(did,index);
featurevalue = dinfo(did,featurename);
put index= featurename= featurevalue=;
if featurename = 'Directory' then folder = featurevalue;
end;
do dindex = 1 to dnum(did);
entryname = dread(did,dindex);
put dindex= entryname=;
rc = filename(fileref, cats(folder, '/', entryname));
fid = fopen (fileref); * if entry is another folder fid will be 0;
if fid then do;
do findex = 1 to foptnum(fid);
featurename = foptname(fid, findex);
featurevalue = finfo(fid, featurename);
put +2 findex= featurename= featurevalue=;
end;
fid = fclose(fid);
end;
rc = filename(fileref);
end;
did = dclose(did);
end;
rc = filename (dfileref);
run;
After examining the log you can pare down the code needed to gather specific desired information into a data set. You can then use SQL queries to further act upon the data:
data csv_files(keep=fullname lastmod where=(fullname like '%.csv'));
length dfileref fileref $8 folder $200;
folder = 'C:\Temp';
rc = filename (dfileref, folder);
did = dopen(dfileref);
if did then do;
do dindex = 1 to dnum(did);
entryname = dread(did,dindex);
rc = filename(fileref, cats(folder, '/', entryname));
fid = fopen (fileref);
if fid then do;
fullname = finfo(fid,'Filename');
lastmod = input(finfo(fid,'Last Modified'), datetime18.); format lastmod datetime18.;
output;
fid = fclose(fid);
end;
rc = filename(fileref);
end;
did = dclose(did);
end;
rc = filename (dfileref);
run;
proc sql;
create table csv_newest as
select *, scan(scan(fullname,-1,'_'),1,'.') as tag
from csv_files
/* \s* allows for the trailing blanks that pad fullname */
where prxmatch ('/_.+\.csv\s*$/', fullname)
having lastmod = max(lastmod)
;
quit;
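To use the week tag elsewhere in the report, it could be copied from csv_newest into a macro variable. A small sketch, where week_tag is a name made up for illustration and the TRIMMED keyword assumes SAS 9.3 or later:

```sas
proc sql noprint;
  /* if several files tie on lastmod, only the first row's tag is stored */
  select tag into :week_tag trimmed
  from csv_newest;
quit;

%put NOTE: Reporting week tag is &week_tag.;
```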

How to import multiple .dbf files in SAS

%let dirname = C:\Users\data;
filename DIRLIST pipe 'dir/B &dirname\*.dbf';
/* Create a data set with one observation for each file name */
data dirlist;
length fname $8.;
infile dirlist length=reclen;
input fname $8.;
run;
data all_text (drop=fname);
set dirlist;
filepath = "&dirname\"||fname||".dbf";
infile dummy filevar = filepath length=reclen end=done missover;
do while(not done);
INPUT
F1 : 2.
F2 : 2.
F3 : 2.
F4 : 10.
F5 : 4.;
output;
end;
run;
The problem is that it only reads the first line of each file rather than the whole file before moving on to the next. Also, variable F1 is shown as missing.
Suggestions are welcome.
So a standard proc import would be:
proc import out=sample1 datafile="path to dbf file.dbf" dbms=DBF replace;
run;
The problem now is how to generate this code for every file in your file list. Using CALL EXECUTE, as in the answer from @Tom, is your best bet. You can also create a small macro and call it for each filename, using CALL EXECUTE. If you're new to SAS this can be easier to understand.
/* Create a macro that imports one DBF file.
   INPUT is expected to be a quoted path; OUTPUT is the dataset name. */
%macro import_dbf(input= , output=);
  proc import out=&output datafile=&input dbms=DBF replace;
  run;
%mend;
Then call the macro once per row of the file list. I'm naming the datasets DBF0001, DBF0002 etc.
%let dirname=C:\_localdata;
data dirlist;
  informat fname $20.;
  input fname;
  cards;
data1.dbf
data2.dbf
data3.dbf
data4.dbf
;
run;
data out;
  set dirlist;
  /* Build one macro call per file and queue it to run after this step */
  str=catt('%import_dbf(input="', "&dirname", '\', fname, '", output=dbf',
           put(_n_, z4.), ');');
  call execute(str);
run;
proc print data=out;
run;
Import them one by one and then combine them.
%let dirname = C:\Users\data;
data filelist ;
infile "dir /b &dirname\*.dbf" pipe truncover end=eof;
fileno + 1;
input fname $256. ;
tempname = 'temp'||put(fileno,z4.);
/* dir /b returns bare file names, so prepend the folder to each one */
call execute(catx(' ','proc import replace dbms=dbf'
,'out=',tempname,'datafile=',quote(cats("&dirname\",fname)),';run;'
));
if eof then call symputx('lastname',tempname);
run;
data want ;
set temp0001-&lastname;
run;

Deletion of 3 year old files using sas

I am trying to delete files that are more than 3 years old from a folder. But when I run the code it also deletes other files whose names are not in the format I am targeting. The file names look like SFRE_BIL_SIT_20160812_134317_PAM_FILES1.zip. My code is below.
options mlogic;
%macro delete_year_files_in_folder(folder);
filename filelist "&folder";
data _null_;
dir_id = dopen('filelist');
total_members = dnum(dir_id);
do i = 1 to total_members;
member_name = dread(dir_id,i);
datestring = scan(member_name,4,'_');
month = input(substr(datestring,5,2),best.);
day = input(substr(datestring,5,2),best.);
year = input(substr(datestring,1,4),best.);
date = mdy(month, day, year);
if intnx('year', today(),-3,'S') > date %put _all_;
then do;
file_id = mopen(dir_id,member_name,'i',0);
if file_id > 0 then do;
freadrc = fread(file_id);
rc = fclose(file_id);
rc = filename('delete',member_name,,,'filelist');
rc = fdelete('delete');
end; %put _all_;
rc = fclose(file_id);
end;
end;
rc = dclose(dir_id);
run;
%mend;
I can see at least one bug in your code that might be causing unexpected behaviour:
month = input(substr(datestring,5,2),best.);
day = input(substr(datestring,5,2),best.);
I think you meant to type:
day = input(substr(datestring,7,2),best.);
I wouldn't do it this way, though. It's simpler to read the whole date in one go with a date informat:
date = input(datestring,yymmdd8.);
However, I think the bigger problem is with this line:
if intnx('year', today(),-3,'S') > date then do; /*Deletion logic follows*/
If you have a file that you don't want to delete and whose name isn't in the same format, it's likely that date will be missing, as there won't be a date in the place where you're looking and the input functions earlier on will return missing values. In SAS, numeric missing values are less than any non-missing numeric value, so the condition is true for every such file. The only files it spares are those whose names match the expected format and are less than 3 years old.
You can avoid missing values fairly easily by tweaking your code like so:
if intnx('year', today(),-3,'S') > date and not(missing(date)) then do;
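Before deleting anything, it can help to dry-run the date logic on a few names. The sketch below only flags files and deletes nothing; the second and third file names and the variable names cutoff and delete_flag are made up for illustration. The ?? modifier suppresses the invalid-data notes for names that don't contain a date.

```sas
data check;
  length member_name $100;
  input member_name $;
  datestring = scan(member_name, 4, '_');
  /* ?? returns missing quietly when the text is not a valid yyyymmdd date */
  date = input(datestring, ?? yymmdd8.);
  cutoff = intnx('year', today(), -3, 'S');
  /* flag only names that carry a real date older than the cutoff */
  delete_flag = (not missing(date) and date < cutoff);
  format date cutoff date9.;
  datalines;
SFRE_BIL_SIT_20160812_134317_PAM_FILES1.zip
readme.txt
some_other_file_name.zip
;
run;

proc print data=check;
run;
```

Only the first name should be flagged: the other two yield a missing date and are left alone.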