Parse file name with SAS

I have a directory in which a new file is created every week. The names look like this:
file_w1.csv
file_w2.csv
file_w3.csv
What I need to do is pick up the latest file (based on modified date), then parse the 2 characters just before the file extension.
So in this case, I want 'w3' because I want to use this to know which week I am reporting for.
How can I do this in SAS?

An operating-system-independent technique would use SAS External File functions such as dopen, fopen and finfo to obtain information about a folder and its items.
Consider this sample code that does a 'full dump' of available information whilst parsing C:\Temp on a Windows machine:
data _null_;
length dfileref fileref $8 folder $200;
rc = filename (dfileref, 'C:\Temp');
did = dopen(dfileref);
if did then do;
do index = 1 to doptnum(did);
featurename = doptname(did,index);
featurevalue = dinfo(did,featurename);
put index= featurename= featurevalue=;
if featurename = 'Directory' then folder = featurevalue;
end;
do dindex = 1 to dnum(did);
entryname = dread(did,dindex);
put dindex= entryname=;
rc = filename(fileref, cats(folder, '/', entryname));
fid = fopen (fileref); * if entry is another folder fid will be 0;
if fid then do;
do findex = 1 to foptnum(fid);
featurename = foptname(fid, findex);
featurevalue = finfo(fid, featurename);
put +2 findex= featurename= featurevalue=;
end;
fid = fclose(fid);
end;
rc = filename(fileref);
end;
did = dclose(did);
end;
rc = filename (dfileref);
run;
After examining the log you can pare down the code needed to gather specific desired information into a data set. You can then use SQL queries to further act upon the data:
data csv_files(keep=fullname lastmod where=(fullname like '%.csv'));
length dfileref fileref $8 folder $200;
folder = 'C:\Temp';
rc = filename (dfileref, folder);
did = dopen(dfileref);
if did then do;
do dindex = 1 to dnum(did);
entryname = dread(did,dindex);
rc = filename(fileref, cats(folder, '/', entryname));
fid = fopen (fileref);
if fid then do;
fullname = finfo(fid,'Filename');
lastmod = input(finfo(fid,'Last Modified'), datetime18.); format lastmod datetime18.;
output;
fid = fclose(fid);
end;
rc = filename(fileref);
end;
did = dclose(did);
end;
rc = filename (dfileref);
run;
proc sql;
create table csv_newest as
select *, scan(scan(fullname,-1,'_'),1,'.') as tag
from csv_files
where prxmatch ('/_.+\.csv$/', fullname)
having lastmod = max(lastmod)
;
quit;
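The nested scan() call in the query above can be tried on its own. This is a minimal sketch, assuming hypothetical sample paths and a macro variable named week (neither is from the original answer):

```sas
data _null_;
  length fullname $200 tag $8;
  /* Hypothetical sample paths, for illustration only */
  do fullname = 'C:\Temp\file_w1.csv', 'C:\Temp\file_w12.csv';
    /* Take the piece after the last underscore, then drop the extension */
    tag = scan(scan(fullname, -1, '_'), 1, '.');
    put fullname= tag=;
  end;
  /* Store the last tag in a macro variable; the name "week" is an assumption */
  call symputx('week', tag);
run;
%put week=&week;
```

Because scan() splits on delimiters rather than fixed positions, the tag does not have to be exactly 2 characters long, which should help once the week number reaches two digits.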

Related

SAS Append datasets only if they exist

I have one dataset per month, all with the same name except for a suffix that identifies the month. For instance, the datasets that I reference with this code:
TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
are called "TEMPCAAD.LIFT_MODEL_V1_202021", "TEMPCAAD.LIFT_MODEL_V1_202022" and so on...
I am trying to append all the datasets, but some of them don't exist, so when I run the following code I get the error
Dataset "TEMPCAAD.LIFT_MODEL_V1_202022" does not exist.
%let currentmonth = &anomes_scores;
%let previousyearmonth = &anomes_x12;
data _null_;
length string $1000;
cur_month = input("&previousyearmonth.01",yymmdd8.);
do until (cur_month > input("&currentmonth.01",yymmdd8.));
string = catx(' ',trim(string),'TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
cur_month = intnx('month',cur_month,1,'b');
end;
call symput('mydatasets',trim(string));
%put &mydatasets;
run;
data WORK.LIFTS_U6M;
set &mydatasets.;
run;
How can I append only existing datasets?
Instead of looping over every dataset to check whether it exists, why don't you just extract all the dataset names from dictionary.tables?
libname TEMPCAAD "/home/kermit/TEMPCAAD";
data tempcaad.lift_model_v1_202110 tempcaad.lift_model_v1_202111 tempcaad.lift_model_v1_202112;
id = 1;
output tempcaad.lift_model_v1_202110;
id = 2;
output tempcaad.lift_model_v1_202111;
id = 3;
output tempcaad.lift_model_v1_202112;
run;
%let nome_modelo = MODEL;
%let versao_modelo = V1;
proc sql;
select strip("TEMPCAAD."||memname) into :dataset separated by " "
from dictionary.tables
where libname="TEMPCAAD" and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO.%";
quit;
data want;
set &dataset.;
run;
You can easily tweak the where clause to extract only the datasets that you wish to append. Just remember to use double quotes around any string that contains a macro variable, otherwise it will not resolve.
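One way to tweak the where clause is to restrict the month suffix to a range. This is a sketch only: the demo tables live in WORK, and the macro variables first_month and last_month are hypothetical names, not part of the original code.

```sas
/* Hypothetical demo tables in WORK */
data work.lift_model_v1_202110 work.lift_model_v1_202111 work.lift_model_v1_202112;
  id = 1;
run;

%let first_month = 202111;  /* assumed lower bound, yyyymm */
%let last_month  = 202112;  /* assumed upper bound, yyyymm */

proc sql noprint;
  select catx('.', libname, memname) into :dataset separated by ' '
  from dictionary.tables
  where libname = 'WORK'
    and memname like 'LIFT_MODEL_V1_%'
    /* the suffix is a 6-character yyyymm string, so a character comparison orders it correctly */
    and scan(memname, -1, '_') between "&first_month" and "&last_month";
quit;
%put dataset=&dataset;

data want;
  set &dataset;
run;
```

If no table matches, the macro variable dataset is not created, so it is worth checking &sqlobs before the final data step.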

Deletion of 3 year old files using sas

I am trying to delete files that are more than 3 years old from a folder. But when I run the code, it also deletes other files whose names are not in the format I am targeting. The file names look like SFRE_BIL_SIT_20160812_134317_PAM_FILES1.zip. My code is below:
options mlogic;
%macro delete_year_files_in_folder(folder);
filename filelist "&folder";
data _null_;
dir_id = dopen('filelist');
total_members = dnum(dir_id);
do i = 1 to total_members;
member_name = dread(dir_id,i);
datestring = scan(member_name,4,'_');
month = input(substr(datestring,5,2),best.);
day = input(substr(datestring,5,2),best.);
year = input(substr(datestring,1,4),best.);
date = mdy(month, day, year);
if intnx('year', today(),-3,'S') > date %put _all_;
then do;
file_id = mopen(dir_id,member_name,'i',0);
if file_id > 0 then do;
freadrc = fread(file_id);
rc = fclose(file_id);
rc = filename('delete',member_name,,,'filelist');
rc = fdelete('delete');
end; %put _all_;
rc = fclose(file_id);
end;
end;
rc = dclose(dir_id);
run;
%mend;
I can see at least one bug in your code that might be causing unexpected behaviour:
month = input(substr(datestring,5,2),best.);
day = input(substr(datestring,5,2),best.);
I think you meant to type:
day = input(substr(datestring,7,2),best.);
I wouldn't do this, though - it's quicker to use date informats to do this:
date = input(datestring,yymmdd8.);
However, I think the bigger problem is with this line:
if intnx('year', today(),-3,'S') > date then do; /*Deletion logic follows*/
If a file you want to keep isn't named in that format, date will most likely be missing: there is no date where the code looks for one, so the earlier input functions return missing values. In SAS, a numeric missing value is less than any non-missing numeric value, so the condition is true for every file except those that match the naming format and are less than 3 years old.
You can avoid missing values fairly easily by tweaking your code like so:
if intnx('year', today(),-3,'S') > date and not(missing(date)) then do;
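The effect of the missing-value guard can be checked without deleting anything. This is a minimal sketch, assuming two made-up member names; the ?? modifier only suppresses the invalid-data notes in the log:

```sas
data _null_;
  length member_name $100;
  /* One name in the expected format and one hypothetical name that is not */
  do member_name = 'SFRE_BIL_SIT_20160812_134317_PAM_FILES1.zip', 'readme.txt';
    datestring = scan(member_name, 4, '_');
    /* The informat returns a missing value when no valid date is found */
    date = input(datestring, ?? yymmdd8.);
    cutoff = intnx('year', today(), -3, 'S');
    /* Flag for deletion only when a real date was parsed and it is older than the cutoff */
    delete_flag = (not missing(date) and date < cutoff);
    put member_name= date= date9. delete_flag=;
  end;
run;
```

Reviewing the flags in the log first, and only then wiring in fdelete, is a cheap way to confirm that unrelated files are left alone.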

SAS: attempting to build a loop for uploading multiple files

I'm attempting to build a loop in SAS to upload several files, and am running into a few issues to work through. Current code:
%Macro Weatherupload(File=, output=);
proc import datafile = &File;
out = &output;
dbms=dlm replace;
delimiter= ",";
getnames=yes;
guessingrows = 1000;
run;
%Mend Weatherupload;
%Macro WeatherPrepare(input=, output=);
data &output (keep=Wban_Number _YearMonthDay DewPoint Temp _Avg_Dew_Pt _Avg_Temp year month day);
set &input;
DewPoint = Input(compress(_Avg_Dew_Pt,"*"), 3.);
Temp = Input(compress(_Avg_Temp,"*"), 3.);
year = (_yearmonthday - mod(_yearmonthday, 10000))/10000;
month = ((_yearmonthday - mod(_yearmonthday, 100)) - (_yearmonthday - mod(_yearmonthday,10000)))/100;
day = mod(_yearmonthday, 100);
drop _Avg_Dew_Pt _Avg_Temp _YearMonthDay;
run;
%Mend WeatherPrepare;
data temperatures;
do i = 1999 to 2015;
do j = 1 to 12;
name = 'C:\Users\DILLON.SAXE\Documents\'||i||j||'.tar'||' \'||i||j||'daily.txt';
output = i||j||'weather';
final = i||j||'final';
%Weatherupload(File=name, output=output)
%WeatherPrepare(input=output, output=final)
end;
end;
run;
The goal is to run through several files, in several folders, listed in month + day + rest of title, and (at the moment) upload two variables of data from them. Later I will want to add in merging the files, and doing some more data work, but for the moment it's the macro issues and uploading that are holding it up.
Is there a way to either use proc upload in a loop, or use another data step in the loop?
I get the error "more positional variables than (something)" (I forget exact error, but it lists positional variables). I've tried adding and removing commas in the macros, but have not been able to get rid of this error. Any ideas?
I don't think you can call macros the way you have in your data step. I think you're intending to use Call Execute.
data temperatures;
do i = 1999 to 2015;
do j = 1 to 12;
name = 'C:\Users\DILLON.SAXE\Documents\'||i||j||'.tar'||' \'||i||j||'daily.txt';
output = i||j||'weather';
final = i||j||'final';
call execute('%Weatherupload(File='||name||', output='||output||')');
call execute('%WeatherPrepare(input='||output||', output='||final||')');
end;
end;
run;
Alternatively, assuming you're trying to read all files in a folder, I think you should create a list of file names in a data set and then use a data step with the filename option to input all files at once. Here's a brief method on how to do it if all the files were in a single folder: https://communities.sas.com/docs/DOC-10426
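A rough sketch of the list-of-files idea follows. It uses the INFILE statement's FILEVAR= option, which is a swapped-in technique and not necessarily what the linked article does. The folder, the naming pattern, and the single header line per file are all assumptions.

```sas
/* Build the list of candidate files; folder and naming pattern are hypothetical */
data file_list;
  length path $260;
  do year = 1999 to 2015;
    do month = 1 to 12;
      path = cats('C:\weather\', year, put(month, z2.), 'daily.txt');
      if fileexist(path) then output;
    end;
  end;
run;

/* Read every listed file in one step; FILEVAR= switches the input file for each row */
data all_lines;
  set file_list;
  length source $260 line $500;
  source = path;  /* FILEVAR= variables are not written to the output data set */
  /* firstobs=2 assumes each file starts with one header line */
  infile dummy filevar=path truncover firstobs=2 end=done;
  do while (not done);
    input line $char500.;  /* raw line; parse the fields afterwards */
    output;
  end;
run;
```

Keeping each record as a raw line avoids guessing at the column layout here; a real version would replace the input statement with the actual fields.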
Here is a page that has code to get a list of files into a data set
http://www.sascommunity.org/wiki/Making_Lists
Since your macros have neither conditionals (%if) nor loops (%do),
I suggest you use them as parameterized %includes.
Here is a tool to read the list-of-files data set and call a program
http://www.sascommunity.org/wiki/Call_Execute_Parameterized_Include
note: in proc import always set guessingrows to the max value;
in v9.3 that is 2147483647;
Got it sorted out, based on the first answer. Eventual code:
%Macro Weatherupload(File=, output=);
proc import datafile = "&File"
out = &output
dbms=dlm replace;
delimiter= ",";
getnames=yes;
guessingrows = 1000;
run;
%Mend Weatherupload;
%Macro WeatherPrepare(input=, output=);
data &output;
set &input;
DewPoint = Input(compress(_Avg_Dew_Pt,"*"), 3.);
Temp = Input(compress(_Avg_Temp,"*"), 3.);
year = (_yearmonthday - mod(_yearmonthday, 10000))/10000;
month = ((_yearmonthday - mod(_yearmonthday, 100)) - (_yearmonthday - mod(_yearmonthday,10000)))/100;
day = mod(_yearmonthday, 100);
keep Wban_Number DewPoint Temp year month day;
run;
%Mend WeatherPrepare;
%Macro WeatherPrepare2(input=, output=);
data &output;
set &input;
DewPoint = Input(DewPoint, 3.);
Temp = Input(compress(_Avg_Temp,"*"), 3.);
year = (_yearmonthday - mod(_yearmonthday, 10000))/10000;
month = ((_yearmonthday - mod(_yearmonthday, 100)) - (_yearmonthday - mod(_yearmonthday,10000)))/100;
day = mod(_yearmonthday, 100);
Wban_Number = Wban;
keep Wban_Number DewPoint Temp year month day;
run;
%Mend WeatherPrepare2;
%Macro Append(merge=);
data temperatures;
set temperatures &merge;
run;
%Mend Append;
data temperatures;
do i = 1999 to 2015;
do j = 1 to 12;
jzero = put(j, z2.);
name = compress('C:\Users\DILLON.SAXE\Documents\'||i||jzero||'.tar'||'\'||i||jzero||'daily.txt');
name2 = compress('C:\Users\DILLON.SAXE\Documents\'||'QCLCD'||i||jzero||'\'||i||jzero||'daily.txt');
output = compress('weather'||i||j);
final = compress('final'||i||j);
if 100*i+j < 200708 then /* 100*i+j encodes year and month as yyyymm */
do;
call execute('%Weatherupload(File='||name||', output='||output||')');
call execute('%WeatherPrepare(input='||output||', output='||final||')');
end;
else
do;
call execute('%Weatherupload(File='||name2||', output='||output||')');
call execute('%WeatherPrepare2(input='||output||', output='||final||')');
end;
call execute('%Append(merge='||final||')');
end;
end;
drop i j jzero name name2 output final;
run;

Reading the file folder on UNIX using SAS

I am trying to read a folder of zip files using a pipe command, but I get an error saying the ls command is not recognized. There are actually 2 zip files (ABC_*.zip) in the folder /PROD/.
Can anybody help me with this?
%let extl_dir=/PROD/ ;
filename zl pipe "ls &extl_dir.ABC_*.zip";
data ziplist_a;
infile zl end=last;
length path $200 zipnm $50 filedt $15;
input path $;
zipnm=scan(path,-1,"/");
filedt=scan(scan(path,-1,"_"),1,".");
call symput('zip'||left(_n_), zipnm);
call symput('path'||left(_n_), path);
call symput('filedt'||left(_n_),filedt);
if last then call symput('num_zip',_n_);
*call symput('flenm',filenm);
run;
SAS has published a convenient macro to list files within a directory that does not rely upon running external commands. It can be found here. I prefer this approach as it does not introduce external sources of possible error such as user permissions, pipe permissions etc.
The macro uses data step functions (through %sysfunc), and the same functions can be called directly from a data step. Below is an example which extracts file information.
%let dir = /some/folder;
%let fType = csv;
data want (drop = _:);
_rc = filename("dRef", "&dir.");
_id = dopen("dRef");
_n = dnum(_id);
do _i = 1 to _n;
name = dread(_id, _i);
if upcase(scan(name, -1, ".")) = upcase("&fType.") then do;
_rc = filename("fRef", "&dir./" || strip(name));
_fid = fopen("fRef");
size = finfo(_fid, "File Size (bytes)");
dateCreate = finfo(_fid, "Create Time");
dateModify = finfo(_fid, "Last Modified");
_rc = fclose(_fid);
output;
end;
end;
_rc = dclose(_id);
run;
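The same functions can stand in for the ls pipe in the question. This is a sketch under assumptions: the fileref zdir and the regular expression for ABC_*.zip are mine, while the macro variable names (zip1.., filedt1.., num_zip) follow the question.

```sas
%let extl_dir = /PROD/;

data ziplist_a;
  length zipnm $256 filedt $15;
  rc = filename('zdir', "&extl_dir");   /* "zdir" is a hypothetical fileref */
  did = dopen('zdir');
  if did > 0 then do;
    do i = 1 to dnum(did);
      zipnm = dread(did, i);
      /* Keep only members that look like ABC_*.zip */
      if prxmatch('/^ABC_.*\.zip$/', strip(zipnm)) then do;
        filedt = scan(scan(zipnm, -1, '_'), 1, '.');
        n + 1;                           /* running count of matches */
        call symputx(cats('zip', n), zipnm);
        call symputx(cats('filedt', n), filedt);
        output;
      end;
    end;
    rc = dclose(did);
  end;
  call symputx('num_zip', n);            /* 0 when nothing matched */
  rc = filename('zdir');                 /* release the fileref */
  keep zipnm filedt;
run;
```

Since dread returns only the member name, the full path can be rebuilt as "&extl_dir" followed by the name if it is needed later.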

Reading text file in SAS with delimiter in wrong places

I am reading a .txt file into SAS that uses "|" as the delimiter. The issue is that one column also uses "|" as a word separator rather than as a delimiter, and its words need to end up in a single column.
For example the txt file looks like:
apple|fruit|Healthy|choices|of|food|12|2012|chart
needs to look like this in the SAS dataset:
apple | fruit | Healthy choices of Food | 12 | 2012 | chart
How do I eliminate "|" between "Healthy choices of Food"?
I think this will do what you want:
data tmp1;
length tmp $100;
input tmp $;
cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
apple|fruit|Healthy|choices|of|food|and|lots|of|other|stuff|12|2012|chart
;
run;
data tmp2;
set tmp1;
num_delims=length(tmp)-length(compress(tmp,"|"));
expected_delims=5;
extra_delims=num_delims-expected_delims;
length new_var $100;
i=1;
do while(scan(tmp,i,"|") ne "");
if i<=2 or (extra_delims+2)<i<=num_delims then new_var=trim(new_var)||scan(tmp,i,"|")||"|";
else new_var=trim(new_var)||scan(tmp,i,"|")||"#";
i+1;
end;
new_var=left(tranwrd(new_var,"#"," "));
run;
This isn't particularly elegant, but it will work:
data tmp;
input tmp $50.;
cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
;
run;
data tmp;
set tmp;
var1 = scan(tmp,1,'|');
var2 = scan(tmp,2,'|');
var4 = scan(tmp,-3,'|');
var5 = scan(tmp,-2,'|');
var6 = scan(tmp,-1,'|');
var3 = tranwrd(tmp,trim(var1)||"|"||trim(var2),"");
var3 = tranwrd(var3,trim(var4)||"|"||trim(var5)||"|"||trim(var6),"");
var3 = tranwrd(var3,"|"," ");
run;
Expanding a little on Itzy's answer, here is another possible solution:
data want;
/* Define variables */
attrib item length=$10 label='Item';
attrib class length=$10 label='Family';
attrib desc length=$80 label='Item Description';
attrib count length=8 label='Some number';
attrib year length=$4 label='Year';
attrib somevar length=$10 label='Some variable';
length countc $8; /* A temp variable */
infile 'c:\temp\delimited_temp.txt' lrecl=1000 truncover;
input;
item = scan(_infile_,1,'|','mo');
class = scan(_infile_,2,'|','mo');
countc = scan(_infile_,-3,'|','mo'); /* Temp var for numeric field */
count = inputn(countc,'8.'); /* Re-read the numeric field */
year = scan(_infile_,-2,'|','mo');
somevar = scan(_infile_,-1,'|','mo');
desc = tranwrd(
substr(_infile_
,length(item)+length(class)+3
,length(_infile_)
- ( length(item)+length(class)+length(countc)
+length(year)+length(somevar)+5))
,'|',' ');
drop countc;
run;
The key in this case is to read your file directly and handle the delimiters yourself. This can be tricky and requires that your data file is exactly as described. A much better solution would be to go back to whoever gave you this data and ask them to deliver it in a more appropriate form. Good luck!
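A shorter variant of the same idea counts the words with countw() and addresses the trailing fields by position instead of by substring arithmetic. This is only a sketch: it assumes exactly two fixed fields before the free-text column and three after it, and it reads the sample line inline.

```sas
data want;
  infile cards truncover;
  length item class $10 desc $80 count 8 year $4 somevar $10;
  input;                                  /* load the raw line into _infile_ */
  n = countw(_infile_, '|');              /* total number of pipe-separated words */
  item    = scan(_infile_, 1, '|');
  class   = scan(_infile_, 2, '|');
  count   = input(scan(_infile_, n - 2, '|'), ?? 8.);
  year    = scan(_infile_, n - 1, '|');
  somevar = scan(_infile_, n, '|');
  /* Everything between the fixed leading and trailing fields is the description */
  do i = 3 to n - 3;
    desc = catx(' ', desc, scan(_infile_, i, '|'));
  end;
  drop n i;
  cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
;
run;
```

Like the answer above, this breaks down if any other field can contain a stray "|" or be empty, so it is a workaround and not a general parser.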
Another possible workaround.
data tmp;
infile '/path/to/textfile';
input tmp :$100.;
array varlst (*) $30 v1-v6;
a=countw(tmp,'|');
do i=1 to dim(varlst);
if i<=2 then
varlst(i) = scan(tmp,i,'|');
else if i>=4 then
varlst(i) = scan(tmp,a-(dim(varlst)-i),'|');
else do j=3 to a-(dim(varlst)-i); /* words 3 through a-3 make up the free-text field */
varlst(i)=catx(' ', varlst(i),scan(tmp,j,'|'));
end;
end;
drop tmp a i j;
run;