I would like to iterate through files in a particular folder, and extract substrings of their files names. Is there an easy way to do this?
lib dir '.../folder';
#iterate through all files of dir and extract first five letters of file name;
#open files and do some processesing, aka data steps proc steps;
Firstly, I'd point out that "dir" appears to be a misspelt libref, not a folder. If you are looking for files in a folder, you can use:
%macro get_filenames(location);
filename _dir_ "%bquote(&location.)";
data filenames(keep=fname);
handle=dopen( '_dir_' );
if handle > 0 then do;
count=dnum(handle);
do i=1 to count;
fname=subpad(dread(handle,i),1,5);/* extract first five letters */
output filenames;
end;
end;
rc=dclose(handle);
run;
filename _dir_ clear;
%mend;
%get_filenames("c:\temp\");
If you are looking for datasets in a library, you can use:
proc sql;
create table datasets as
select substr(memname,1,5) as dataset
from dictionary.tables
where libname='LIB'; /* must be uppercase */
either approach will produce a dataset of 'files' which can be subsequently 'stepped through'..
Related
I have the daily trading data (TAQ data) for a month. I am trying to unzip each of them.
The folder's name is EQY_US_ALL_TRADE_202107.
It has several zipped (GZ files) files for each trading day named as
EQY_US_ALL_TRADE_202210701
EQY_US_ALL_TRADE_202210702
EQY_US_ALL_TRADE_202210703 ...
EQY_US_ALL_TRADE_202210729
I want to create a SAS dataset for each individual day and save them to a file. As far as I understand, I need a do loop to go through a month of daily TAQ data and calculate the trade duration and then just save the relevant data to a file so that each saved data set would be small, and then I have to aggregate them all up. For calculating trade duration, I am just taking the difference of the "DATETIME" variable, (ex. dif(datetime))
Until now, I have been working by making one working directory (D:\MainDataset) and doing calculations in it starting with unzipping files. But it is taking too much time and disk space. I need to create separate datasets for each trading day and save it to a file.
data "D:\MainDataset" (keep= filename time exchange symbol saleCondition tradeVolume tradePrice tradeStopStock
tradeCorrection sequenceNumber tradeId sourceOfTrade tradeReportingFacility
participantTimeStamp tradeReportingFacilityTimeStamp);
length folderef $8 time $15. exchange $1. symbol $17. saleCondition $4. tradeStopStock $1.
sourceOfTrade $1. tradeReportingFacility $1.
participantTimeStamp $15. tradeReportingFacilityTimestamp $15.;
rc=filename(folderef,"D:\EQY_US_ALL_TRADE_202107");
did = dopen(folderef);
putlog did=;
/* do k = 1 to dnum(did); Use this to run the loop over all files in the folder */
do k = 1 to 3;
filename = dread(did,k);
putlog filename=;
if scan(filename,-1,'.') ne 'gz' then continue;
fullname = pathname(folderef) || '\' || filename;
putlog fullname=;
do while(1);
infile archive zip filevar=fullname gzip dlm='|' firstobs=2 obs=5000000 dsd truncover eof=nextfile;
input time exchange symbol saleCondition tradeVolume tradePrice tradeStopStock
tradeCorrection sequenceNumber tradeId sourceOfTrade tradeReportingFacility
participantTimeStamp tradeReportingFacilityTimeStamp;
output;
end;
nextfile:
end;
stop;
run;
Proc contents data = "D:\MainDataset";
run;
proc print data ="D:\MainDataset" (obs = 110);
run;
Create code to process one file. Probably coded as a macro that takes as input the name of the file to read and the name of the dataset to create.
%macro taq(infile,dataset);
data &dataset;
infile "&infile" zip gzip dsd dlm='|' truncover firstobs=2;
....
run;
%mend taq;
Then generate a dataset with the names of the files to read and the dataset names you want to create from them. So perhaps something like this:
%let ym=202107;
%let folder=D:\EQY_US_ALL_TRADE_&ym;
data taq_files;
length dataset $32 filename $256 ;
keep dataset filename;
rc=filename('folder',"&folder");
did=dopen('folder');
do fnum=1 to dnum(did);
filename=catx('\',"&folder",dread(did,fnum));
dataset=scan(filename,-2,'./\');
if 'gz'=lowcase(scan(filename,-1,'.')) then output;
end;
did=dclose(did);
rc=filename('folder');
run;
Now that you have the list of files you can use it to call the macro once for each file.
data _null_;
set taq_files;
call execute(cats('%nrstr(%taq)(',filename,',',dataset,')'));
run;
The body of the macro can include the code to both read the values from the delimited files and calculate any new variables you want. There should not be any need to do that in multiple steps based on what you have shown so far.
Your logic for converting the timestamp strings into time values seems overly convoluted. Just use informats that match the style of the strings in the file. For example if the strings start with 6 digits that represent HHMMSS then read that using the HHMMSS6. informat. If the filenames has digit strings in the style YYYYMMDD then read that using the YYMMDD8. informat.
Note that a text file that is compressed to 2Gbytes will generate a dataset that is possibly 10 to 30 times that large. You might want to define the individual datasets as views instead to avoid having the use that space by changing the DATA statement:
data &dataset / view=&dataset ;
I was running the following code:
%LET TIME_INTERVAL='MINUTE15';
/*
* Get the file names of a specific location
*/
%MACRO get_filenames(location,filenames);
filename _dir_ "%bquote(&location.)";
data &filenames(keep=fname);
handle=dopen( '_dir_' );
if handle > 0 then do;
count=dnum(handle);
do i=1 to count;
fname=dread(handle,i);
output &filenames;
end;
end;
rc=dclose(handle);
run;
filename _dir_ clear;
%MEND;
%MACRO NBBO (fname);
DATA TICKERS_NBBO;
INFILE &fname;
INPUT SYMBOL $;
RUN;
%mend;
%MACRO CALCU(DATE_VAR);
%get_filenames('./groups',filenames);
data _null_;
set filenames;
by fname;
if fname =: "&TIME_INTERVAL";
%NBBO(fname);
run;
%mend;
However, I got the error: ERROR: No logical assign for filename FNAME.
I was wondering what is the reason that caused this?
There are lots of csv files in the folder groups. I was trying to run the NBBO macro on each of the files and load each of the files into a dataset with infile statement.
You're mixing up data step code and macro code in a manner that's not permitted. You can't extract the contents of the variable fname and use it to populate a macro variable in this way. Instead you're passing the text fname to that macro variable, which then doesn't work.
You also can't run another data step inside a data step in this manner. You need to use one of the various methods for executing it after the data step terminates or during execution.
Finally, it's entirely unclear to me what you're really doing here: because you're going to end up with just one of those files in the dataset.
I think you may want something like this, which is much easier. I use the FILEVAR option in infile, which uses the value of a variable; I have to make some changes to how you calculate fname (I think you have to do this anyway, it has no directory name in it).
%MACRO get_filenames(location,filenames);
filename _dir_ "%bquote(&location.)";
data &filenames(keep=fname);
length fname $512;
handle=dopen( '_dir_' );
if handle > 0 then do;
count=dnum(handle);
do i=1 to count;
fname=catx('\',"%bquote(&location.)",dread(handle,i));
output &filenames;
end;
end;
rc=dclose(handle);
run;
filename _dir_ clear;
%MEND;
%get_filenames(location=c:\temp\cars, filenames=fnam);
data tickers_nbbo;
set fnam;
infile a filevar=fname dlm=',';
input symbol $;
run;
If you really do need to call a data step separately, you either need to use CALL EXECUTE, DOSUBL, or construct macro calls in another way (PROC SQL SELECT INTO, %INCLUDE, etc.).
this is my code to read multiple multisheet excel in sas its giving an error which i am attaching in last.i am only reading first sheet named summary of every excel in that particular folder
%macro sks2sas01(input=d:\excels,out=work.tt);
/* read files in directory */
%let dir=%str(%'dir %")&input.%str(\%" /A-D/B/ON%');
filename myfiles pipe %unquote(&dir);
data list1; length fname $256.;
infile myfiles truncover;
input myfiles $100.;
/* put _infile_;*/
fname=quote(upcase(cats("&input",'\',myfiles)));
out="&out";
drop myfiles;
call execute('
PROC IMPORT DBMS=EXCEL2002 OUT= _1
DATAFILE= '||fname||' REPLACE ;
sheet="summary";
RUN;
proc append data=_1 base='||out||' force; run;
proc delete data=_1; run;
');
run;
filename myfiles clear;
%mend sks2sas01;
%sks2sas01(input=c:\sasupload\excels,out=work.tt);
hereby i am attaching error i am getting:
GOPTIONS ACCESSIBLE;
%macro sks2sas01(input=d:\excels,out=work.tt);
/* read files in directory */
%let dir=%str(%'dir %")&input.%str(\%" /A-D/B/ON%');
filename myfiles pipe %unquote(&dir);
data list1; length fname $256.;
infile myfiles truncover;
input myfiles $100.;
/* put _infile_;*/
fname=quote(upcase(cats("&input",'\',myfiles)));
out="&out";
drop myfiles;
call execute('
PROC IMPORT DBMS=EXCEL2002 OUT= _1
DATAFILE= '||fname||' REPLACE ;
sheet="summary";
RUN;
proc append data=_1 base='||out||' force; run;
proc delete data=_1; run;
');
run;
filename myfiles clear;
%mend sks2sas01;
%sks2sas01(input=c:\sasupload\excels,out=work.tt);
ERROR: Insufficient authorization to access PIPE.
ERROR: Error in the FILENAME statement.
I had this exact same problem the other day. I had no authorization for the pipe command, and dir using sftp wasn't working for me either. This alternative solution worked great for me. In a nutshell, you're going to use some oldschool SAS directory commands to read every file within that directory, and save only the ones that end in .xlsx.
You can consider the name of your Excel files .-delimited, scan the filename of each one backwards, and look at just the first word to obtain only those Excel files. For example:
File name.xlsx
Backwards:
Delimiter
v
xlsx.name File
^^^^
First word
Step 1: Read all XLSX files in the directory, and create a dataset of them
filename dir 'C:\MyDirectory';
data XLSX_Files;
DirID = dopen("dir");
do i = 1 to dnum(DirID);
file_name = dread(DirID, i);
if(upcase(scan(file_name, 1, '.', 'b') ) = 'XLSX') then output;
end;
rc=dclose(DirID);
drop i rc DirID;
run;
Step 2: Read all of those names into a pipe-delimited macro variable
proc sql noprint;
select file_name, count(file_name)
into :XLSX_Files separated by '|',
:tot_files
from XLSX_Files;
quit;
Step 3: Import them all in a macro loop
%macro import_all;
%do i = 1 %to &tot_files;
proc import file="C:\MyDirectory\%scan(&XLSX_Files,&i,|)"
out=XLSX_&i
dbms=xlsx replace;
run;
%end;
%mend;
%import_all;
You can then stack or merge them as you need.
data Master_File;
set XLSX_1-XLSX_&tot_files;
run;
OR
data Master_File;
merge XLSX_1-XLSX_&tot_files;
by key;
run;
i m new to sas and studying different ways to do subject line task.
Here is two ways i knew at the moment
Method1: file statement in data step
*DATA _NULL_ / FILE / PUT ;
data _null_;
set engappeal;
file 'C:\Users\1502911\Desktop\exportdata.txt' dlm=',';
put id $ name $ semester scoreEng;
run;
Method2: Proc Export
proc export
data = engappeal
outfile = 'C:\Users\1502911\Desktop\exportdata2.txt'
dbms = dlm;
delimiter = ',';
run;
Question:
1, Is there any alternative way to export raw data files
2, Is it possible to export the header also using the data step method 1
You can also make use of ODS
ods listing file="C:\Users\1502911\Desktop\exportdata3.txt";
proc print data=engappeal noobs;
run;
ods listing close;
You need to use the DSD option on the FILE statement to make sure that delimiters are properly quoted and missing values are not represented by spaces. Make sure you set your record length long enough, including delimiters and inserted quotes. Don't worry about setting it too long as the lines are variable length.
You can use CALL VNEXT to find and output the names. The LINK statement is so the loop is later in the data step to prevent __NAME__ from being included in the (_ALL_) variable list.
data _null_;
set sashelp.class ;
file 'class.csv' dsd dlm=',' lrecl=1000000 ;
if _n_ eq 1 then link names;
put (_all_) (:);
return;
names:
length __name__ $32;
do while(1);
call vnext(__name__);
if upcase(__name__) eq '__NAME__' then leave;
put __name__ #;
end;
put;
return;
run;
I'm trying to use a double pipe delimiter "||" when I export a file from SAS to txt. Unfortunately, it only seems to correctly delimit the header row and uses the single version for the data.
The code is:
proc export data=notes3 outfile='/file_location/notes3.txt'
dbms = dlm;
delimiter = '||';
run;
Which results in:
ID||VAR1||VAR2
1|0|STRING1
2|1|STRING2
3|1|STRING3
If you want to use a two character delimiter, you need to use dlmstr instead of dlm in the file statement in data step file creation. You can't use proc export, unfortunately, as that doesn't support dlmstr.
You can create your own proc export fairly easily, by using dictionary.columns or sashelp.vcolumn to construct the put statement. Feel free to ask more specific questions on that side if you need help with it, but search around for data driven output and you'll most likely find what you need.
The reason proc export won't use a double pipe is because it generates a data step to do the export, which uses a file statement. This is a known limitation - quoting the help file:
Restriction: Even though a character string or character variable is
accepted, only the first character of the string or variable is used
as the output delimiter. This differs from INFILE DELIMITER=
processing.
The header row || works because SAS constructs it as a string constant rather than using a file statement.
So I don't think you can fix the proc export code, but here's a quick and dirty data step that will transform the output into the desired format, provided that your dataset has no missing values and doesn't contain any pipe characters:
/*Export as before to temporary file, using non-printing TAB character as delimiter*/
proc export
data=sashelp.class
outfile="%sysfunc(pathname(work))\temp.txt"
dbms = dlm;
delimiter = '09'x;
run;
/*Replace TAB with double pipe for all rows beyond the 1st*/
data _null_;
infile "%sysfunc(pathname(work))\temp.txt" lrecl = 32767;
file "%sysfunc(pathname(work))\class.txt";
input;
length text $32767;
text = _infile_;
if _n_ > 1 then text = tranwrd(text,'09'x,'||');
put text;
run;
/*View the resulting file in the log*/
data _null_;
infile "%sysfunc(pathname(work))\class.txt";
input;
put _infile_;
run;
As Joe suggested, you could alternatively write your own delimiter logic in a dynamically generated data step, e.g.
/*More efficient option - write your own delimiter logic in a data step*/
proc sql noprint;
select name into :VNAMES separated by ','
from sashelp.vcolumn
where libname = "SASHELP" and memname = "CLASS";
quit;
data _null_;
file "%sysfunc(pathname(work))\class.txt";
set sashelp.class;
length text $32767;
text = catx('||',&VNAMES);
put text;
run;