How to get last modified time of file while using infile statement? - sas

I am analyzing a batch of SAS program files and I am stuck on getting the last modified time of the program files. I thought about the X command, but it was too inefficient.
I just found that when I use an INFILE statement:
data test;
infile 'D:\test.txt' truncover;
input ;
run;
The log shows the last modified time:
NOTE: The infile 'D:\test.txt' is:
Filename=D:\test.txt,
RECFM=V,LRECL=32767,File Size (bytes)=7,
Last Modified=2021/1/26 15:25:48,
Create Time=2021/1/26 15:25:42
As you can see, the log window shows the file information as a NOTE. However, what I want is a variable filled with the Last Modified time.
Is there an option to get it while using the INFILE statement?
Of course, other efficient approaches are welcome, too.

Use functions FOPEN and FINFO
Example:
Show all available information items and their values for a sample data file.
filename datafile 'c:\temp\datafile.txt';
data _null_;
file datafile;
put 'Me data';
run;
data _null_;
fid = fopen('datafile');
if fid then do;
do index = 1 to foptnum(fid);
info_name = foptname(fid,index);
info_value = finfo(fid, info_name);
put index= info_name= #40 info_value=;
end;
rc = fclose(fid);
end;
run;
Will log information such as
index=1 info_name=Filename info_value=c:\temp\datafile.txt
index=2 info_name=RECFM info_value=V
index=3 info_name=LRECL info_value=32767
index=4 info_name=File Size (bytes) info_value=9
index=5 info_name=Last Modified info_value=26Jan2021:06:29:47
index=6 info_name=Create Time info_value=26Jan2021:06:28:23
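If all you want is the Last Modified value in a dataset variable, you can ask FINFO for that single item. A minimal sketch along the same lines (the item name 'Last Modified' matches the output above, but information item names can vary by operating system, and the ANYDTDTM. conversion assumes the string is in a layout that informat recognizes):
filename pgm 'D:\test.txt';
data file_times;
length last_modified $64;
fid = fopen('pgm');
if fid then do;
/* Pull just the one information item into a character variable */
last_modified = finfo(fid, 'Last Modified');
/* Optionally convert the string to a SAS datetime value */
last_modified_dt = input(last_modified, anydtdtm32.);
rc = fclose(fid);
end;
format last_modified_dt datetime20.;
keep last_modified last_modified_dt;
run;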

Related

How to create a SAS dataset for each individual trading day (TAQ) data and save them to a file

I have the daily trading data (TAQ data) for a month. I am trying to unzip each of them.
The folder's name is EQY_US_ALL_TRADE_202107.
It has several zipped files (GZ files) for each trading day, named as
EQY_US_ALL_TRADE_202210701
EQY_US_ALL_TRADE_202210702
EQY_US_ALL_TRADE_202210703 ...
EQY_US_ALL_TRADE_202210729
I want to create a SAS dataset for each individual day and save them to a file. As far as I understand, I need a do loop to go through a month of daily TAQ data, calculate the trade duration, and then save just the relevant data to a file so that each saved data set is small; then I have to aggregate them all. For calculating trade duration, I am just taking the difference of the "DATETIME" variable (e.g., dif(datetime)).
Until now, I have been working by making one working directory (D:\MainDataset) and doing the calculations in it, starting with unzipping the files. But that takes too much time and disk space. I need to create separate datasets for each trading day and save them to a file.
data "D:\MainDataset" (keep= filename time exchange symbol saleCondition tradeVolume tradePrice tradeStopStock
tradeCorrection sequenceNumber tradeId sourceOfTrade tradeReportingFacility
participantTimeStamp tradeReportingFacilityTimeStamp);
length folderef $8 time $15. exchange $1. symbol $17. saleCondition $4. tradeStopStock $1.
sourceOfTrade $1. tradeReportingFacility $1.
participantTimeStamp $15. tradeReportingFacilityTimestamp $15.;
rc=filename(folderef,"D:\EQY_US_ALL_TRADE_202107");
did = dopen(folderef);
putlog did=;
/* do k = 1 to dnum(did); Use this to run the loop over all files in the folder */
do k = 1 to 3;
filename = dread(did,k);
putlog filename=;
if scan(filename,-1,'.') ne 'gz' then continue;
fullname = pathname(folderef) || '\' || filename;
putlog fullname=;
do while(1);
infile archive zip filevar=fullname gzip dlm='|' firstobs=2 obs=5000000 dsd truncover eof=nextfile;
input time exchange symbol saleCondition tradeVolume tradePrice tradeStopStock
tradeCorrection sequenceNumber tradeId sourceOfTrade tradeReportingFacility
participantTimeStamp tradeReportingFacilityTimeStamp;
output;
end;
nextfile:
end;
stop;
run;
Proc contents data = "D:\MainDataset";
run;
proc print data ="D:\MainDataset" (obs = 110);
run;
Create code to process one file. Probably coded as a macro that takes as input the name of the file to read and the name of the dataset to create.
%macro taq(infile,dataset);
data &dataset;
infile "&infile" zip gzip dsd dlm='|' truncover firstobs=2;
....
run;
%mend taq;
Then generate a dataset with the names of the files to read and the dataset names you want to create from them. So perhaps something like this:
%let ym=202107;
%let folder=D:\EQY_US_ALL_TRADE_&ym;
data taq_files;
length dataset $32 filename $256 ;
keep dataset filename;
rc=filename('folder',"&folder");
did=dopen('folder');
do fnum=1 to dnum(did);
filename=catx('\',"&folder",dread(did,fnum));
dataset=scan(filename,-2,'./\');
if 'gz'=lowcase(scan(filename,-1,'.')) then output;
end;
did=dclose(did);
rc=filename('folder');
run;
Now that you have the list of files you can use it to call the macro once for each file.
data _null_;
set taq_files;
call execute(cats('%nrstr(%taq)(',filename,',',dataset,')'));
run;
The body of the macro can include the code to both read the values from the delimited files and calculate any new variables you want. There should not be any need to do that in multiple steps based on what you have shown so far.
Your logic for converting the timestamp strings into time values seems overly convoluted. Just use informats that match the style of the strings in the file. For example, if the strings start with 6 digits that represent HHMMSS, read them using the HHMMSS6. informat. If the filenames have digit strings in the style YYYYMMDD, read those using the YYMMDD8. informat.
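For instance, a hypothetical sketch (the variable names, the pipe delimiter, and the HHMMSS6/YYYYMMDD layouts are assumptions chosen to illustrate the informats; match them to your actual fields):
/* Hypothetical example: read a 6-digit HHMMSS field directly as a SAS time value */
data timing_demo;
infile datalines dlm='|' dsd truncover;
input tradetime :hhmmss6. symbol :$17.;
/* A YYYYMMDD digit string pulled from a filename converts the same way */
tradedate = input('20210701', yymmdd8.);
format tradetime time8. tradedate date9.;
datalines;
093001|IBM
160000|MSFT
;
run;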
Note that a text file that is compressed to 2 Gbytes will generate a dataset that is possibly 10 to 30 times that large. You might want to define the individual datasets as views instead, to avoid having to use that space, by changing the DATA statement:
data &dataset / view=&dataset ;

Get total number of observations (rows) in SAS dataset (.sas7bdat file)

I'm looking to get the total number of rows (count) from a SAS dataset file using SAS code.
I tried this code:
data _null_;
infile "C:\myfiles\sample.sas7bdat" end=eof;
input;
if eof then put "Lines read=====:" ;
run;
This is the output I get (it does not show the number of lines); obviously, I did not get the actual number of lines in the file:
Lines read=====:
NOTE: 1 record was read from the infile
"C:\myfiles\sample.sas7bdat".
However, I know the number of lines in that sample.sas7bdat file is more than 1.
Please help!
The INFILE statement is for reading a file as raw TEXT. If you have a SAS dataset then you can just SET the dataset to read it into a data step.
So the equivalent for your attempted method would be something like:
data _null_;
set "C:\myfiles\sample.sas7bdat" end=eof;
if eof then put "Observations read=====:" _n_ ;
run;
One cool thing about sas7bdat files is the amount of metadata stored with them. The row count of that file is already known by SAS as an attribute. You can use proc contents to read it. Observations is the number of rows in the table.
libname files "C:\myfiles";
proc contents data=files.sample;
run;
A more advanced way is to open the file directly using macro functions.
%let dsid = %sysfunc(open(files.sample) ); /* Open the file */
%let nobs = %sysfunc(attrn(&dsid, nlobs) ); /* Get the number of observations */
%let rc = %sysfunc(close(&dsid) ); /* Close the file */
%put Total observations: &nobs;
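If you would rather stay inside a DATA step, another common idiom is the NOBS= option on the SET statement. It is populated at compile time, so the step never has to read any rows. A minimal sketch reusing the files.sample reference from above:
data _null_;
/* NOBS= is filled at compile time, so the SET statement never needs to execute */
if 0 then set files.sample nobs=n;
call symputx('nobs', n);
put 'Total observations: ' n;
stop;
run;
%put Total observations: &nobs;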

Read a file line by line for every observation in a dataset

I'm trying to create a program that takes a text file, replaces any macro references within it, and appends it to a single output file. The macro references are generated as I iterate over the observations in a dataset.
I'm having trouble trying to get it to read the entire text file for each observation in my source table. I think there's an implicit stop instruction related to my use of the end= option on the infile statement that is preventing my set statement from iterating over each record.
I've simplified the template and code, examples below:
Here is the template that I'm trying to populate:
INSERT INTO some_table (name,age)
VALUES (&name,&age);
Here is the SAS code:
filename dest "%sysfunc(pathname(work))\backfill.sql";
data _null_;
attrib line length=$1000;
set sashelp.class;
file dest;
infile "sql_template.sas" end=template_eof;
call symput('name', quote(cats(name)));
call symput('age' , cats(age));
do while (not template_eof);
input;
line = resolve(_infile_);
put line;
end;
run;
Running the above code produces the desired output file but only for the first observation in the dataset.
You cannot do it that way since after the first observation you are already at the end of the input text file. So your DO WHILE loop only runs for the first observation.
Here is a trick that I learned a long time ago on SAS-L. Toggle between two input files so that you can start at the top of the input file again.
First let's create your example template program and an empty dummy file.
filename template temp;
filename dummy temp;
data _null_;
file template;
put 'INSERT INTO some_table (name,age)'
/ ' VALUES (&name,&age)'
/ ';'
;
file dummy ;
run;
Now let's write a data step to read the input data and use RESOLVE() function to convert the text.
filename result temp;
data _null_;
length filename $256 ;
file result ;
set sashelp.class;
call symputx('name', catq('1at',name));
call symputx('age' , age);
do filename=pathname('template'),pathname('dummy');
infile in filevar=filename end=eof ;
do while (not eof);
input;
_infile_ = resolve(_infile_);
put _infile_;
end;
end;
run;
The resulting file will look like this:
INSERT INTO some_table (name,age)
VALUES ('Alfred',14)
;
INSERT INTO some_table (name,age)
VALUES ('Alice',13)
;
...

Explain usage of put statement

data _null_;
%let _EFIRR_=0;
%let _EFIREC_=0;
file '/home/abc/demo/sale.csv' delimiter=',' DSD;
put country=;
run;
I wrote this code but couldn't find anything in the log. Shouldn't I be getting country=xyz in the log?
The FILE statement is used to write out to files. I believe you were attempting to read country values from the file instead.
You need the INFILE statement:
data _null_;
%let _EFIRR_=0;
%let _EFIREC_=0;
/* infile statement points to the file which is being read */
infile '/home/abc/demo/sale.csv' delimiter=',' DSD;
/* Input statement specifies which columns to populate from the file */
input country $;
/* A put statement in a data step without an associated */
/* file statement will output lines in the log */
put country=;
run;

Do loop following filevar option in SAS

data &state.&sheet.;
set di;
retain &header.;
infile in filevar= path end=done missover;
do until(done);
if _N_ =1 then
input &headerlength.;
input &allvar.;
output;
end;run;
The variable path is in the di data set.
I want to read multiple txt files into one SAS data set. In each txt file the first row is a header, and I want to retain this header for each observation, so I used if _N_ = 1 to input the header and then input the second row of other variables for analysis.
The output is very strange: only the first row contains the header, and the other rows are not correct observations.
Could someone help me a little bit? Thank you so much.
I like Shenglin Chen's answer, but here's another option: reset the row counter to 1 each time the data step starts importing a new file.
data &state.&sheet.;
set di;
retain &header.;
infile in filevar= path end=done missover;
do _N_ = 1 by 1 until(done);
if _N_ = 1 then input &headerlength.;
input &allvar.;
output;
end;
run;
This generalises more easily in case you ever want to do something different with every nth row within each file.
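For example, a hypothetical tweak inside the same loop that flags every 10th row of each file (every_10th is an illustrative variable name, not part of the original code):
do _N_ = 1 by 1 until(done);
if _N_ = 1 then input &headerlength.;
input &allvar.;
/* Hypothetical: _N_ restarts at 1 for each file, so per-file row logic is straightforward */
every_10th = (mod(_N_, 10) = 0);
output;
end;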
Try:
data &state.&sheet.;
set di;
retain &header.;
infile in filevar= path end=done missover dlm='09'x;
input &headerlength.;
do until(done);
input &allvar.;
output;
end;
run;
You should use WHILE (NOT DONE) instead of UNTIL (DONE) to prevent reading past the end of the file (and stopping the data step) when the file is empty, or, for some of the answers, when the file only has the header row.
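A sketch of the looped version with that suggestion applied (reusing the same macro variables as the answers above):
data &state.&sheet.;
set di;
retain &header.;
infile in filevar=path end=done missover;
/* WHILE(NOT DONE) tests before reading, so an empty file simply skips the loop */
do _N_ = 1 by 1 while(not done);
if _N_ = 1 then input &headerlength.;
input &allvar.;
output;
end;
run;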