I have a folder that contains some tables. When I open it, it shows the table name, date modified, type and size.
I am trying to read all of that information (table name, date modified, type and size) using SAS, so I tried a pipe first:
filename tbl pipe "dir /abc/sales";
data new;
  infile tbl pad;
  input all $500.;
run;
The result only has the table names, but no date modified, type or size, so I'm wondering how to fix it.
An example folder 'sales' below:
table name    size    date modified           type
sales1        490k    10/28/2020 9:32:50 am   sas7bdat
sales2        85k     11/12/2020 4:28:23 pm   sas7bdat
sales3        307k    12/17/2020 1:55:09 pm   sas7bdat
From your path it looks like SAS is running on Unix. I'm not sure what the command dir does on your flavor of Unix, but ls -l will get the file details on any flavor of Unix.
data new;
  infile "ls -l /abc/sales/" pipe truncover;
  input all $500.;
run;
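Parsing those details out of the ls -l text is then just a matter of splitting each line. Here is a rough sketch assuming the common nine-column ls -l layout (column positions and date formats vary by system and locale, and the variable names are only for illustration):
data new;
  length table_name $200 modified $32 type $16;
  infile "ls -l /abc/sales" pipe truncover;
  input line $500.;
  /* ls -l prints a "total n" summary line first; drop it */
  if scan(line, 1, ' ') = 'total' then delete;
  /* typical columns: perms links owner group size month day time-or-year name */
  size_bytes = input(scan(line, 5, ' '), ?? comma15.);
  modified   = catx(' ', scan(line, 6, ' '), scan(line, 7, ' '), scan(line, 8, ' '));
  table_name = scan(line, 9, ' ');   /* assumes file names contain no spaces */
  type       = scan(table_name, -1, '.');
  keep table_name size_bytes modified type;
run;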
Related
I have a directory /run/return/files/archives/prep/share/ that contains both .txt and .csv files, for example IA_PROD.txt and retour_PROD.csv.
I want to read both types of files and extract only their names (IA_PROD and retour_PROD) to store in an Excel file named FILE_NAMES.xlsx. The code below extracts the .txt and .csv files through two separate data sets (file_list1 and file_list2), which I finally concatenate and export to an Excel sheet. I would like to optimise this into a single data step that reads both the csv and txt files and extracts the names together.
Thanks for your generous help.
%let REP_BLOCTEL_ALLER = /run/return/files/archives/prep/share/;
filename result pipe "ls &rep_bloctel_aller./*txt";
filename result2 pipe "ls &rep_bloctel_aller./*csv";
data file_list1;
infile result lrecl=200 truncover;
input rep $120.;
file_name = tranwrd(substr(rep, length("&rep_bloctel_aller./")+1),'.txt','');
call symput(compress('txt_'!!put(_n_,2.)),file_name);
call symput('n_obs',put(_n_,2.));
run;
data file_list2;
infile result2 lrecl=200 truncover;
input rep $120.;
file_name = tranwrd(substr(rep, length("&rep_bloctel_aller./")+1),'.csv','');
call symput(compress('csv_'!!put(_n_,2.)),file_name);
call symput('n_obs',put(_n_,2.));
run;
DATA file_list;
SET file_list1 file_list2;
RUN;
proc export data = file_list
(keep = file_name)
outfile="&rep_bloctel_aller./FILE_NAME_BLOCTEL.xlsx"
dbms=xlsx
replace;
sheet= "FILE_NAME_BLOCTEL";
run;
Not sure if it answers your question, but I think this answers your problem :)
What I propose is to filter your input files with ls:
%let REP_BLOCTEL_ALLER = /run/return/files/archives/prep/share/;
filename result pipe "ls &file_path. | egrep -i '\.csv$|\.txt$'";
data file_list;
  infile result lrecl=200 truncover;
  input filename $120.;
run;
I don't think we can make it shorter!
Some explanations:
ls lists your directory
the pipe | forwards the result of ls to the egrep command
egrep is used to search for a text value in the result sent by ls
the -i option indicates to egrep that text lookup must be case insensitive (will detect TXT, txt, TxT files and so on)
The '\.csv$|\.txt$' indicates to search for either '.csv' or '.txt' at the end of a line ($), which corresponds to the file extension (the dot is escaped because it is a regex metacharacter)
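If you also want the extension stripped (to get IA_PROD and retour_PROD as in the question), that fits in the same data step. A minimal sketch, assuming the file names contain no other dots:
data file_list;
  infile result lrecl=200 truncover;
  input filename $120.;
  /* keep only the base name; scan stops at the first dot */
  file_name = scan(filename, 1, '.');
run;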
If the goal is to just generate the list into an XLSX file then you just need:
libname out xlsx "&rep_bloctel_aller./FILE_NAME_BLOCTEL.xlsx";
data out.FILE_NAME_BLOCTEL;
  infile "cd &rep_bloctel_aller.; ls *.txt *.csv" pipe truncover;
  input file_name $256.;
run;
I have access to a Linux directory that contains multiple folders with various names.
Eg.
01312019
19990131
europe_1
johncena
Based on the 4 samples above, only the first and second lines are in a valid date format (MMDDYYYY & YYYYMMDD).
What I want to achieve is to identify and flag the folders whose names are in the date format I want, for example MMDDYYYY. Once identified, I will write a set of rules to further process them.
My script below is already able to scan the directory.
data allfilenames;
  length fref $8 fname $200;
  did = filename(fref, "&ROOT./&Directory.");
  did = dopen(fref);
  do i = 1 to dnum(did);
    fname = dread(did, i);
    output;
  end;
  did = dclose(did);
  did = filename(fref);
  keep fname;
run;
data folderonly;
  set allfilenames;
  if count(fname, '.') > 0 then delete;
run;
However, now I am stuck on how to check the folder names for their date format. Again, folder names might not contain a valid date format at all, or might contain a different date format (YYYYMMDD or MMDDYYYY).
Is there any guide that I can follow?
data folderonly;
  input #1 fname $20.;
  datalines;
01122021
12312021
20210901
blahblah
;
run;

data check;
  set folderonly;
  if input(fname, ?? mmddyy8.) then valid_mmddyy = 1;
run;
The ?? before the informat suppresses any errors due to data which doesn't match the informat.
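Since the folder names may be in either MMDDYYYY or YYYYMMDD form, you can test both informats the same way. A small sketch; testing with MISSING is safer than a plain IF, because a valid date of 01JAN1960 is the value 0, which IF would treat as false:
data check;
  set folderonly;
  /* flag names that read successfully as MMDDYYYY and/or YYYYMMDD */
  valid_mmddyy = not missing(input(fname, ?? mmddyy8.));
  valid_yymmdd = not missing(input(fname, ?? yymmdd8.));
run;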
Here is the scenario: at the beginning I import a csv file; then in PROC SQL I insert the records of the temporary data set into the database. Here is my difficulty:
For the sake of audit, I want to update one record in a table in the database to record this insert operation:
update table1
set inserted_record=&SQLOBS, insert_date=today()
where filename=&csv_file_name;
But the length of the filename is more than 32 characters. What should I do? Thanks!
My SAS code is like the following:
DATA Temp1;
  File_name="kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv";
run;
Data work.temptable;
length
Product_ID $36
Worth_USD $9;
Format
Product_ID Char36.
Worth_USD Char9.;
Informat
Infile
input
Run;
Libname lib1 Teradata user=userid Password=xxxxxx;
proc SQL;
  insert into lib1.table1(col1,col2)
  select product_id, worth_usd from work.temptable;
  update lib1.import_summary set inserted_record=&sqlobs, operated_date=today() where file_name='&file_name';
Run;
According to the log, the insert operation succeeds while the update does not (the log shows "No rows were updated"). I checked the import_summary table: there is already a record whose file_name is "kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv", so it should be updated. Can anyone provide comments? Thanks!
From the code shown, the 32-character limit shouldn't affect anything here: that limit applies only to data set names, which this is not, and a file name has no 32-character limit. You do, however, need quotes around the file name, since it is likely a character field:
update table1
  set inserted_record=&SQLOBS, insert_date=today()
  where filename="&csv_file_name";
EDIT:
This needs double quotes, not single quotes:
where file_name='&file_name';
I am trying to use SAS to read multiple files from a directory, keeping only those created before a certain date.
I have used the code below to read all the files, and it works perfectly. Now I have found that I only need the files created before a certain date. I think that could be done either through the FILENAME PIPE dir options or through INFILE statement options, but I cannot find the answer.
code source:
http://support.sas.com/kb/41/880.html
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
data dirlist ;
infile dirlist lrecl=200 truncover;
input file_name $100.;
run;
data _null_;
set dirlist end=end;
count+1;
call symputx('read'||put(count,4.-l),cats('c:\_today\',file_name));
call symputx('dset'||put(count,4.-l),scan(file_name,1,'.'));
if end then call symputx('max',count);
run;
options mprint symbolgen;
%macro readin;
%do i=1 %to &max;
data &&dset&i;
infile "&&read&i" lrecl=1000 truncover dsd;
input var1 $ var2 $ var3 $;
run;
%end;
%mend readin;
%readin;
Currently you are reading in just the file names using the dir command; the existing /b switch says to print just the file name and nothing else. You want to change it to read both the file name and the CREATED date of the file, which gets a little messy. You will need to change the pipe command from:
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
...to this... :
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /tc ';
The output will change from something like this:
file1.csv
file2.csv
...
...to something like this... :
Volume in drive C has no label.
Volume Serial Number is 90ED-A122
Directory of C:\_today
01/13/2017 09:14 AM 1,991 file1.csv
01/11/2017 11:43 AM 169 file2.csv
...
...
...
01/11/2017 11:43 AM 169 file99.csv
99 File(s) 6,449 bytes
0 Dir(s) 57,999,806,464 bytes free
So you will then need to modify the data step that creates dirlist to clean up the results returned by the new dir command: ignore the header and footer, and read in the date, time, and file name. Once you have that date and time in an appropriate SAS format, you can use a WHERE clause (or subsetting IF) to keep the rows you are interested in. I will leave the details as an exercise; if you have trouble with it you can always open a new question.
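That said, here is a rough sketch of the shape of that cleanup step, assuming the US-style dir output shown above (adjust the date informat, the token positions, and the cutoff to your locale and needs):
data dirlist;
  infile dirlist lrecl=200 truncover;
  input line $200.;
  /* keep only detail rows, which start with a date such as 01/13/2017 */
  if prxmatch('/^\d{2}\/\d{2}\/\d{4}/', strip(line)) = 0 then delete;
  created = input(scan(line, 1, ' '), mmddyy10.);
  format created date9.;
  file_name = scan(line, -1, ' ');   /* assumes file names contain no spaces */
  if created < '19JAN2017'd;         /* hypothetical cutoff date */
  keep file_name created;
run;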
If you need more information on the dir command, you can open up a command prompt (Start Menu->Run->"cmd"), and then type in dir /? to see a list of available switches for the dir command. You may find a slightly different combination of switches for it that better suits your task than what I listed above.
You can use powershell to leverage the features of the operating system.
filename get_them pipe
" powershell -command
""
dir c:\temp
| where {$_.LastWriteTime -gt '3/19/2019'}
| select -property name
| ft -hidetableheader
""
";
data _null_;
infile get_them;
input;
putlog _infile_;
run;
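One caveat: LastWriteTime is the last-modified timestamp. Since the requirement is files created before a date, the CreationTime property with the -lt (less than) comparison may be closer to what you need; the filter line would become:
| where {$_.CreationTime -lt '3/19/2019'}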
My program makes a web-service call and receives a response in XML format which I store as output.txt. When opened in notepad, the file looks like this
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
I am unable to read this file into SAS using proc IMPORT. My SAS code is below
proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
delimiter='<>"=';
getnames=yes;
run;
My log is
1 %_eg_hidenotesandsource;
5 %_eg_hidenotesandsource;
28
29 proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
30 delimiter='<>"=';
31 getnames=yes;
32 run;
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Unable to sample external file, no data in first 5 records.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
33
34 %_eg_hidenotesandsource;
46
47
48 %_eg_hidenotesandsource;
51
My ultimate goal is to mine the employee first name (Gerald), last name (Harris) and employee number (108181) from the above file and store them in a dataset (and then do this over and over in a loop, appending to the same dataset). Help with importing either the entire file or just the information I need directly would be appreciated.
If you only need these three fields then a single input statement using @'string' pointer controls is perfectly viable, and arguably preferable to parsing xml with regex:
data want;
  infile xmlfile dsd dlm=' /';
  input @"Employee" @"firstName=" firstName :$32. @"lastName=" lastName :$32. @"emplnbr=" emplnbr :8.;
run;
This uses the input file constructed in Richard's answer. The initial @"Employee" is optional, but it reduces the risk of picking up fields with the same names as the desired ones that are subfields of a different top-level element.
Bonus: the same approach can also be used to import json files if you're in a similar situation.
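For example, given a hypothetical one-line json payload such as {"firstName": "Gerald", "lastName": "Harris", "emplnbr": "108181"} in a fileref jsonfile, a sketch of the same trick:
data want;
  infile jsonfile dsd dlm=' ,{}';
  input @'"firstName":' firstName :$32. @'"lastName":' lastName :$32. @'"emplnbr":' emplnbr :8.;
run;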
Since you are unable to use the preferred methods of reading xml data, and you are processing a single record result from a service query the git'er done approach seems warranted.
One idea that did not pan out was to use named input.
input #'Employee' lastname= firstname= emplnbr=;
The results could not be made to strip the quotes with the $QUOTE. informat, nor to honor infile dlm=' /'.
An approach that did work was to read the single line and parse the value out using a regular expression with capture groups. PRXPARSE is used to compile a pattern, PRXMATCH to test for a match and PRXPOSN to retrieve the capture group.
* create a file to read from (represents the file from the service call capture);
options ls=max;
filename xmlfile "%sysfunc(pathname(WORK))\1-service-call-record.xml";
data have;
input;
file xmlfile;
put _infile_;
datalines;
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
run;
* read the entire line from the file and parse out the values using Perl regular expression;
data want;
  infile xmlfile;
  input;
  rx_employee = prxparse('/employee\s+firstname="([^"]+)"\s+lastname="([^"]+)"\s+emplnbr="([^"]+)"/i');
  if prxmatch(rx_employee, _infile_) then do;
    firstname = prxposn(rx_employee, 1, _infile_);
    lastname  = prxposn(rx_employee, 2, _infile_);
    emplnbr   = prxposn(rx_employee, 3, _infile_);
  end;
  keep firstname lastname emplnbr;
run;