Reading and extracting files from a directory using SAS - sas

I have a directory /run/return/files/archives/prep/share/ that contains both .txt and .csv files, for example IA_PROD.txt and retour_PROD.csv.
I want to read both types of files and extract only their names (IA_PROD and retour_PROD) to store in an Excel file named FILE_NAMES.xlsx. The code below extracts the .txt and .csv files into two separate data sets (file_list1 and file_list2), which I then concatenate and export to an Excel sheet. I would like to optimise it into a single data step that reads both the .csv and .txt files and extracts their names together.
Thanks for your generous help.
%let REP_BLOCTEL_ALLER = /run/return/files/archives/prep/share/;
filename result pipe "ls &rep_bloctel_aller./*txt";
filename result2 pipe "ls &rep_bloctel_aller./*csv";
data file_list1;
infile result lrecl=200 truncover;
input rep $120.;
file_name = tranwrd(substr(rep, length("&rep_bloctel_aller./")+1),'.txt','');
call symput(compress('txt_'!!put(_n_,2.)),file_name);
call symput('n_obs',put(_n_,2.));
run;
data file_list2;
infile result2 lrecl=200 truncover;
input rep $120.;
file_name = tranwrd(substr(rep, length("&rep_bloctel_aller./")+1),'.csv','');
call symput(compress('csv_'!!put(_n_,2.)),file_name);
call symput('n_obs',put(_n_,2.));
run;
DATA file_list;
SET file_list1 file_list2;
RUN;
proc export data = file_list
(keep = file_name)
outfile="&rep_bloctel_aller./FILE_NAME_BLOCTEL.xlsx"
dbms=xlsx
replace;
sheet= "FILE_NAME_BLOCTEL";
run;

Not sure if it answers your question, but I think this solves your problem :)
What I propose is to filter the output of ls:
%let REP_BLOCTEL_ALLER = /run/return/files/archives/prep/share/;
filename result pipe "ls &REP_BLOCTEL_ALLER. | egrep -i '\.csv$|\.txt$'";
data file_list;
infile result lrecl=200 truncover;
input filename $120.;
run;
I don't think we can make it shorter!
Some explanations:
ls lists the contents of your directory
the pipe | forwards the output of ls to the egrep command
egrep searches the output of ls for lines that match a pattern
the -i option makes the match case insensitive (it will detect TXT, txt, TxT files and so on)
the pattern '\.csv$|\.txt$' matches either '.csv' or '.txt' at the end of a line ($), which corresponds to the file extension
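The data step above keeps the extension in each name, whereas the question asks for the bare names (IA_PROD, retour_PROD). A minimal sketch of one way to strip it in the same step, assuming no file name contains a line break; the variable names rep and file_name are taken from the question, and the use of prxchange is my choice rather than part of the original answer:

```sas
%let REP_BLOCTEL_ALLER = /run/return/files/archives/prep/share/;
filename result pipe "ls &REP_BLOCTEL_ALLER. | egrep -i '\.(csv|txt)$'";

data file_list;
  infile result lrecl=200 truncover;
  input rep $120.;
  length file_name $120;
  /* Remove a trailing .csv or .txt (any case); \s* absorbs the blank padding of rep */
  file_name = prxchange('s/\.(csv|txt)\s*$//i', 1, rep);
run;
```

The resulting file_list data set can then be exported with the proc export step from the question, keeping only file_name.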

If the goal is to just generate the list into an XLSX file then you just need:
libname out xlsx "&rep_bloctel_aller./FILE_NAME_BLOCTEL.xlsx";
data out.FILE_NAME_BLOCTEL;
infile "cd &rep_bloctel_aller.; ls *.txt *.csv" pipe truncover;
input file_name $256.;
run;

Related

how does pipe read all information of a page using SAS?

I have a folder that contains some tables. When I open it, it shows the table name, date modified, type and size.
I am trying to read all of this information (table name, date modified, type and size) using SAS, so I tried a pipe first:
filename tbl pipe "dir /abc/sales";
data new;
infile tbl pad;
input all $500.;
run;
The result only has the table name, but not the date modified, type or size.
How can I fix this?
An example folder 'sales' below:
table name  size  date modified           type
sales1      490k  10/28/2020 9:32:50 am   sas7bdat
sales2      85k   11/12/2020 4:28:23 pm   sas7bdat
sales3      307k  12/17/2020 1:55:09 pm   sas7bdat
From your path it looks like SAS is running on Unix. Not sure what the command dir does on your flavor of Unix, but ls -l should get the file details on any flavor of Unix.
data new;
infile "ls -l /abc/sales/" pipe truncover ;
input all $500.;
run;
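If separate columns are wanted rather than one long string, the ls -l line can be split with list input. This is only a sketch: it assumes the common ls -l layout of nine whitespace-separated fields, a first line of the form "total N", and file names without spaces. All variable names (perms, links, owner, grp, size, month, day, time_or_year, name) are hypothetical, and the date columns depend on the locale and the age of the file.

```sas
data new;
  /* firstobs=2 skips the "total N" line that ls -l prints first */
  infile "ls -l /abc/sales/" pipe truncover firstobs=2;
  length perms $11 owner grp $32 month $3 time_or_year $5 name $256;
  /* links, size and day are read as numeric; the rest as character */
  input perms links owner grp size month day time_or_year name;
run;
```

Here size is in bytes, and time_or_year holds a time such as 09:32 for recent files or a year such as 2020 for older ones.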

how to read files from a folder that were created before a date

I am trying to use SAS to read multiple files from a directory, but only those that were created before a given date.
I have used the code below to read all the files, and it works perfectly. Now I have found out that I only need the files that were created before a certain date. I think that could be done either with FILENAME PIPE dir options or with INFILE statement options, but I cannot find the answer.
code source:
http://support.sas.com/kb/41/880.html
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
data dirlist ;
infile dirlist lrecl=200 truncover;
input file_name $100.;
run;
data _null_;
set dirlist end=end;
count+1;
call symputx('read'||put(count,4.-l),cats('c:\_today\',file_name));
call symputx('dset'||put(count,4.-l),scan(file_name,1,'.'));
if end then call symputx('max',count);
run;
options mprint symbolgen;
%macro readin;
%do i=1 %to &max;
data &&dset&i;
infile "&&read&i" lrecl=1000 truncover dsd;
input var1 $ var2 $ var3 $;
run;
%end;
%mend readin;
%readin;
Currently you are reading in just the file names using the dir command. The existing /b modifier is saying print just the file name and nothing else. You want to change it to read both the file name and the CREATED date of the file. In order to do that it gets a little messy. You will need to change that pipe command from:
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
...to this... :
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /tc ';
The output will change from something like this:
file1.csv
file2.csv
...
...to something like this... :
Volume in drive C has no label.
Volume Serial Number is 90ED-A122
Directory of C:\_today
01/13/2017 09:14 AM 1,991 file1.csv
01/11/2017 11:43 AM 169 file2.csv
...
...
...
01/11/2017 11:43 AM 169 file99.csv
99 File(s) 6,449 bytes
0 Dir(s) 57,999,806,464 bytes free
So you will then need to modify your data step that creates dirlist to clean up the results returned by the new dir statement. You will need to ignore the header and footer and read in the date and time etc. Once you have that date and time in the appropriate SAS format, you can then just use a SAS where clause to keep the rows you are interested in. I will leave this as an exercise for you to do. If you have trouble with it you can always open a new question.
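As a starting point for that exercise, here is a rough sketch. It assumes the dir output uses the US-style mm/dd/yyyy layout shown above and that file names contain no spaces; the variable names line and created and the cutoff date 12JAN2017 are hypothetical.

```sas
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /tc';

data dirlist;
  infile dirlist lrecl=200 truncover;
  input line $200.;
  /* Keep only lines that start with a date; this drops the header and footer */
  if not prxmatch('/^\s*\d{2}\/\d{2}\/\d{4}\s/', line) then delete;
  length file_name $100;
  created = input(scan(line, 1, ' '), mmddyy10.);
  file_name = scan(line, -1, ' ');   /* last blank-delimited word on the line */
  format created date9.;
  /* Hypothetical cutoff: keep files created before 12 January 2017 */
  if created < '12JAN2017'd;
  keep file_name created;
run;
```

The rest of the original program (the data _null_ step and the %readin macro) can then run unchanged against this filtered dirlist.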
If you need more information on the dir command, you can open up a command prompt (Start Menu->Run->"cmd"), and then type in dir /? to see a list of available switches for the dir command. You may find a slightly different combination of switches for it that better suits your task than what I listed above.
You can use powershell to leverage the features of the operating system.
filename get_them pipe
" powershell -command
""
dir c:\temp
| where {$_.LastWriteTime -gt '3/19/2019'}
| select -property name
| ft -hidetableheader
""
";
data _null_;
infile get_them;
input;
putlog _infile_;
run;

How to get user properties like created by in sas

How can I fetch user details for a .sas file, or file properties for all files stored in a directory? I am trying to get all available attributes, such as modified date, modified by and created by, for use in a macro.
data dir_meta(drop=rc dir_ref file_ref dir_id fid i);
  length dir_ref file_ref $8 file_name $256 create_date modified_date $40;
  /* &dir (folder) and &extn (extension) are macro variables set beforehand */
  rc = filename(dir_ref, "&dir");
  dir_id = dopen(dir_ref);
  if dir_id = 0 then do;
    put "ERROR: could not open directory &dir";
    stop;
  end;
  do i = 1 to dnum(dir_id);
    file_name = dread(dir_id, i);
    if upcase(scan(file_name, -1, '.')) = upcase("&extn") then do;
      rc = filename(file_ref, cats("&dir", '\', file_name));
      fid = fopen(file_ref);
      if fid > 0 then do;
        /* item names as used in the question; availability depends on the host */
        create_date = finfo(fid, 'Create Time');
        modified_date = finfo(fid, 'Last Modified');
        rc = fclose(fid);
      end;
      output;
    end;
  end;
  rc = dclose(dir_id);
run;
Sweta,
Presuming you are using SAS on a recent version of Windows and the session has the X command allowed, you can pipe the results of a powershell command to a data step to read in whatever information you want.
In powershell use this command to see the kinds of information about a file that can be selected
PS > DIR | GET-MEMBER
Once you decide on the members to select a data step can read the powershell output. For example:
filename fileinfo pipe 'powershell -command "dir | select Fullname, Length, @{E={$_.LastWriteTime.ToString(''yyyy-MM-ddTHH:mm:ss.ffffffzzz'')}} | convertTo-csv"';
* powershell datetime formatting tips: https://technet.microsoft.com/en-us/library/ee692801.aspx?f=255&MSPPError=-2147217396;
data mydata;
infile fileinfo missover firstobs=4 dsd dlm=',';
attrib
filename length=$250
size length=8 format=comma12.
lastwrite length=8 format=datetime20. informat=E8601DZ32.6
;
input filename size lastwrite;
run;

filevar infile statement to read in multiple txt file

From this link I learnt how to read multiple txt files.
Problem: is it possible to create a macro variable to read in all the txt files in a folder, say C:\Users\Desktop\ (given that all files are in txt format and named datasetyyyymmdd)?
I have dataset20150101.txt to dataset20150806.txt and I do not want to type all of those paths into the datalines manually.
data whole2;
infile datalines;
length fil2read $256;
input fil2read $;
infile dummy filevar=fil2read end=done dsd;
do while (not done);
input name$ value1 value2;
output;
end;
datalines;
C:\Users\Desktop\dataset20150501.txt
C:\Users\Desktop\dataset20150502.txt
run;
Ask the operating system which files are present:
filename DataIn pipe "dir C:\Users\Desktop\dataset*.txt /S /B";
data whole2;
infile DataIn truncover;
length fil2read $256;
input fil2read $;
infile dummy filevar=fil2read end=done dsd;
do while (not done);
input name$ value1 value2;
output;
end;
run;
The bare option /B removes unneeded information such as the last access date.
I added the sub-folder option /S because the dir command then returns full path names. It also means that dataset*.txt files in subfolders of C:\Users\Desktop\ are read. If that does not suit you, remove the /S, build the full path yourself, and use that variable in filevar=:
path2Read = "C:\Users\Desktop\"||fil2read;

Read in Files with pattern match into one SAS dataset

I would like to read a number of .csv files into a single SAS dataset using a pattern match. For example if in the directory /home/datasets there are 5 files:
/home/datasets
~/output_group1a.csv
~/output_group1b.csv
~/output_group1c.csv
~/output_group2a.csv
~/output_group2b.csv
All with known and identical structures and data types. I would like to read in only those files corresponding to group 1 without having to explicitly specify the filenames.
You can use a wildcard in your infile statement. If you have headers in each file you'll need to account for that. Here's a bit more of an example.
https://gist.github.com/statgeek/4c27ea9a7ed6d3528835
data try01;
length filename txt_file_name $256;
retain txt_file_name;
infile "Path\*.txt" eov=eov filename=filename truncover;
input @;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -2, ".\");
eov=0;
end;
else input
/* place your input variables and informats here */
;
run;