I need to delete all text files from a directory. The following program works fine for a single named file (e.g. file.txt), but when I try to use *.txt it doesn't work. Am I missing something, or is there a better way to delete all .txt files in a directory?
data _null_;
fname = "_files";
rc = filename(fname,"&path\file.txt");
if rc = 0 and fexist(fname) then
rc = fdelete(fname);
rc = filename(fname);
run;
If you are a fan of macros, the code below should do the same.
options mlogic;
%macro delete_all_txt_files_in_folder(folder);
filename filelist "&folder";
data _null_;
dir_id = dopen('filelist');
total_members = dnum(dir_id);
do i = 1 to total_members;
member_name = dread(dir_id,i);
if scan(lowcase(member_name),2,'.')='txt' then do;
file_id = mopen(dir_id,member_name,'i',0);
if file_id > 0 then do;
freadrc = fread(file_id);
rc = fclose(file_id);
rc = filename('delete',member_name,,,'filelist');
rc = fdelete('delete');
end;
rc = fclose(file_id);
end;
end;
rc = dclose(dir_id);
run;
%mend;
%delete_all_txt_files_in_folder(C:\try)
You can't use a wildcard with fdelete. You either need to loop over all of the files in the directory, or you can use an x command:
x "del &path.\*.txt";
or similar depending on your OS (this approach is OS dependent and requires XCMD permission; the double quotes are needed so that &path resolves).
Here's the loop:
%let path=d:\temp;
filename filrf "&path.";
data _null_;
did = dopen('filrf');
memcount = dnum(did);
do while (memcount>0);
fname = dread(did,memcount);
if scan(lowcase(fname),2,'.')='txt' then do;
rcref = filename('fref',catx('\',"&path.",fname));
rcdel = fdelete('fref');
end;
memcount+-1;
end;
stop;
run;
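A variant of the same loop, as a minimal sketch: it takes the extension from the text after the last dot, so a name such as report.v2.txt also matches, and it closes the directory when done. The fileref names dirref and delref are hypothetical, and &path is set to the WORK folder here only so the sketch is safe to run; point it at your own folder.

```sas
/* Assumption: &path is the folder to clean; WORK is used only to keep the sketch harmless */
%let path = %sysfunc(pathname(work));

filename dirref "&path.";

data _null_;
  length fname $256;
  did = dopen('dirref');
  if did = 0 then do;            /* directory could not be opened */
    msg = sysmsg();
    put msg;
    stop;
  end;
  /* walk the listing backwards, as in the loop above */
  do i = dnum(did) to 1 by -1;
    fname = dread(did, i);
    /* scan(..., -1, '.') returns the text after the last dot */
    if index(fname, '.') > 0 and lowcase(scan(fname, -1, '.')) = 'txt' then do;
      rc = filename('delref', fname, , , 'dirref');  /* member of the open directory */
      if rc = 0 then rc = fdelete('delref');
      if rc ne 0 then put 'Could not delete ' fname;
      rc = filename('delref');                       /* clear the temporary fileref */
    end;
  end;
  rc = dclose(did);
run;

filename dirref clear;
```

The five-argument FILENAME call is the same form used in the macro version earlier in this thread; building the full path with CATX, as in the loop above, should work equally well.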
I am having an issue with my code: it works when I hard-code the value in the IF statement (see the commented-out line), but when I use the macro variable instead, the copy and delete steps do nothing and no errors are generated. Below is the code being used:
%let pathscr = //files/FEB_P000/Reporting_FS;
%let pathdes = //files/FEB_P000/Reporting_FS/Accounting log/2021;
%let fn = LFNPAccounting;
%let dt = %sysfunc(inputn(&acc_date, yymmddn8.),yymmddn8.); /* 20211209 */
%let Var = &fn&dt;/* LFNPAccounting20211209 */
data _null_;
length fref $8 fname $256;
did = filename(fref,'\\files\FEB_P000\Reporting_FS');
did = dopen(fref);
do i = 1 to dnum(did);
fname = dread(did,i);
newfn = SUBSTR(fname,1,22);
if newfn = &Var then do;
/*if newfn = 'LFNPAccounting20211209' then do;*/
rc1=filename('src',catx('/',"&pathscr",fname));
rc2=filename('des',catx('/',"&pathdes",fname));
rc3=fcopy('src','des');
rc4= fdelete('src');
end;
end;
run;
Could anyone help please?
Thanks
Hans
I am guessing you want to look in a specified folder (pathscr) and, if a file name matches a certain string (SUBSTR(fname,1,22)), copy that file to the Logs folder (pathdes) and delete it from the source.
libname report "/home/kermit/temp/Reporting/";
data report.have20211210
report.have20211209
report.have20211208;
id = 1;
output;
run;
%let pathscr = /home/kermit/temp/Reporting/;
%let pathdes = /home/kermit/temp/Logs/;
%let fn = have; /* Name of the file */
%let type = .sas7bdat; /* File extension */
%let dt = %sysfunc(today(), yymmddn8.); /* today's date as yyyymmdd */
%let file = &fn&dt&type.;
%put &=file;
data _null_;
drop rc did;
rc=filename("mydir", "&pathscr.");
did=dopen("mydir");
if did > 0 then do; /* check that the directory can be opened */
do i=1 to dnum(did); /* use dnum() to determine the highest possible member number */
fname=dread(did, i); /* get the name of the file */
if fname = "&file." then do; /* if the name of the file match: */
rc=filename('src', "&pathscr&file.");
rc=filename('des', "&pathdes&file.");
rc=fcopy('src', 'des'); /* copy from source to destination */
rc=fdelete('src'); /* delete from source */
end;
end;
end;
else do; /* if the directory cannot be opened, write the error message to the log */
msg=sysmsg();
put msg;
end;
run;
Logs:
FILE=have20211210.sas7bdat
DOPEN opens a directory and returns a directory identifier value (a number greater than 0) that is used to identify the open directory in other SAS external file access functions. If the directory cannot be opened, DOPEN returns 0, and you can obtain the error message by calling the SYSMSG function.
I used today() for the dt macro variable for convenience, but you will have to change it to whatever date you are searching for.
Note that with the code above, if the file is already in the Logs folder, it will not be overwritten. You also do not need the CATX function if you put a trailing / at the end of each specified path.
Macro variables are not resolved when enclosed in single quotes. They are resolved within double quotes.
Try
did = filename(fref,"&pathscr");
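A small sketch of the quoting rule, reusing the pathscr value from the question; the variable names single and double are only for illustration.

```sas
%let pathscr = //files/FEB_P000/Reporting_FS;

data _null_;
  length single double $64;
  single = '&pathscr';   /* single quotes: the macro reference is left as literal text */
  double = "&pathscr";   /* double quotes: resolved before the step compiles */
  put single=;
  put double=;
run;
```

Comparing the two lines in the log shows which literal the DATA step actually received.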
You set VAR to a value like:
%let Var = LFNPAccounting20211209 ;
Then you use it to generate a SAS statement:
if newfn = &Var then do;
Which will resolve to
if newfn = LFNPAccounting20211209 then do;
Since I did not see you creating any variable named LFNPAccounting20211209 it is most likely that you want to use this statement instead:
if newfn = "&Var" then do;
So that the SAS code you generate will compare the value of NEWFN to a string literal instead of another variable.
Note: Since it looks like you are using a Windows file system, you should make the comparison case insensitive.
if upcase(newfn) = %upcase("&Var") then do;
Problem Statement:
I am unable to read data from a PDF file using SAS.
What worked well:
I am able to download the PDF from the website and save it.
Not working (Need Help):
I am not able to read the data from a PDF file using SAS. The source content structure is expected to remain the same always. Expected Output is attached as a jpg image.
It would be a great help if someone could show me how to tackle this scenario with a SAS program.
I tried something like this:
/*Proxy address*/
%let proxy_host=xxx.com;
%let port=123;
/*Output location*/
filename output "/desktop/Response.pdf";
/*Download the source file and save it in the desired location*/
proc http
url="https://cdn.nar.realtor/sites/default/files/documents/ehs-10-2020-overview-2020-11-19_0.pdf"
method="get"
proxyhost="&proxy_host."
proxyport=&port
out=output;
run;
%let lineSize = 2000;
data base;
format text_line $&lineSize..;
infile output lrecl=&lineSize;
input text_line $;
run;
DATA _NULL_ ;
X "PS2ASCII /desktop/Response.pdf
/desktop/flatfile.txt";
RUN;
You can use the Apache PDFBox® library, an open source Java tool for working with PDF documents. The library can be used from within SAS Proc GROOVY with Java code that strips text and its position on the page from a PDF document.
Example:
You will have to write more code to make a data set from the stripped text.
filename overview "overview.pdf";
filename ov_text "overview.txt";
* download a pdf document;
proc http
url="https://cdn.nar.realtor/sites/default/files/documents/ehs-10-2020-overview-2020-11-19_0.pdf"
method="get"
/*proxyhost="&proxy_host." */
/*proxyport=&port */
out=overview;
run;
* download the Apache PDFBox library (a .jar file);
filename jar 'pdfbox.jar';
%if %sysfunc(FEXIST(jar)) ne 1 %then %do;
proc http
url='https://www.apache.org/dyn/closer.lua?filename=pdfbox/2.0.21/pdfbox-app-2.0.21.jar&action=download'
out=jar;
run;
%end;
* Use GROOVY to read the PDF, strip out the text and position, and write that
* parse to a text file which SAS can read;
proc groovy classpath="pdfbox.jar";
submit
"%sysfunc(pathname(overview))" /* the input, a pdf file */
"%sysfunc(pathname(ov_text))" /* the output, a text file */
;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.io.FileWriter;
import java.io.PrintWriter;
public class GetLinesFromPDF extends PDFTextStripper {
static List<String> lines = new ArrayList<String>();
public GetLinesFromPDF() throws IOException {
}
/**
* @throws IOException If there is an error parsing the document.
*/
public static void main( String[] args ) throws IOException {
PDDocument document = null;
PrintWriter out = null;
String inPdf = args[0];
String outTxt = args[1];
try {
document = PDDocument.load( new File(inPdf) );
PDFTextStripper stripper = new GetLinesFromPDF();
stripper.setSortByPosition( true );
stripper.setStartPage( 0 );
stripper.setEndPage( document.getNumberOfPages() );
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
out = new PrintWriter(new FileWriter(outTxt));
// print lines to text file
for(String line:lines){
out.println(line);
}
}
finally {
if( document != null ) {
document.close();
}
if( out != null ) {
out.close();
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*/
@Override
protected void writeString(String str, List<TextPosition> textPositions) throws IOException {
String places = "";
for(TextPosition tp:textPositions){
places += "(" + tp.getX() + "," + tp.getY() + ") ";
}
lines.add(str + " found @ " + places);
}
}
endsubmit;
quit;
* preview the stripped text that was saved;
data _null_;
infile ov_text;
input;
putlog _infile_;
run;
/*
* additional SAS code will be needed to input the text as data
* and construct a data set that matches the original tabular content layout
*/
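As a starting point for that additional code, here is a minimal sketch that reads the stripped text into a data set, one row per string. It relies only on the line layout written by the Groovy step (the string, then " found ", a marker character, and "(x,y)" pairs). The data set name work.pdf_lines and the variables text, x, y and line_no are hypothetical, and only the first coordinate pair of each line is kept.

```sas
filename ov_text "overview.txt";   /* same fileref as above */

data work.pdf_lines;
  infile ov_text truncover lrecl=32767;
  length raw $32767 text $2000 rest $32767;
  input raw $char32767.;
  line_no = _n_;
  /* search from the right so " found " inside the PDF text itself is not picked up */
  pos = find(raw, ' found ', -length(raw));
  if pos > 1 then do;
    text = substr(raw, 1, pos - 1);      /* the stripped string */
    rest = substr(raw, pos + 7);         /* marker character and coordinate pairs */
    /* token 1 is the marker, tokens 2 and 3 are the first x and y */
    x = input(scan(rest, 2, '(,) '), ?? best32.);
    y = input(scan(rest, 3, '(,) '), ?? best32.);
  end;
  else text = raw;
  drop raw rest pos;
run;
```

Rows that share roughly the same y value belong to the same printed line, so sorting by y and x is one possible way to begin rebuilding the table columns.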
I'm trying to turn my hash object into a macro so that I can do a match on a number of different analysis variables. Here is the part of the macro with the hash object. I feel that my issue must be with how I am calling/quoting the macro variables in the hash, because a different version of this hash works without the macro. Thoughts?
The errors I am getting are: ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase. ERROR 557-185: Variable data is not an object. And then, later on: ERROR: File DATA.TEST_BANK_ACCOUNT_ALL_REGS.DATA does not exist.
data data.test_&match_field._all_regs;
if _N_ = 1 then do;
if 0 then set = data.test_&match_field._match_srt;
declare hash contractors(dataset:"data.test_&match_field._match_srt", multidata: 'yes');
contractors.defineKey("&match_var.");
contractors.defineData('fpds_duns',
'xxx_dod_contractor',
"&match_flag.",
'xxx_small_contractor',
'xxx_medium_contractor',
'xxx_large_contractor',
'xxx_reported_relationship',
'xxx_joint_venture_flag');
contractors.defineDone();
end;
set data.test_xxx_200;
rc = contractors.find(key:"&match_var.");
do while (rc=0);
if xxxx_duns = xxx_hq_parent_duns_number or
xxxx_duns = xxx_hq_parent_duns_number or
xxxx_duns = xxx_global_parent_duns_number then xxx_reported_relationship = 'Y';
else xxx_reported_relationship = 'N';
output data.test_&match_field._all_regs;
rc = contractors.find_next(key:"&match_var.");
end;
run;
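One likely cause, offered as a sketch rather than a confirmed diagnosis: in "if 0 then set = data.test_...", the equals sign turns the statement into an assignment to a variable named set, and data.test_... is then parsed as object-dot syntax, which matches "Variable data is not an object". The second error would then just follow from the step never compiling. Separately, key:"&match_var." passes the variable's name as a string literal rather than its value. The toy example below uses hypothetical WORK data sets and a hypothetical key variable acct to show both points.

```sas
%let match_var = acct;   /* hypothetical key variable */

data work.lookup;        /* hypothetical lookup table with a duplicate key */
  input acct $ contractor $;
  datalines;
A1 alpha
A1 beta
B2 gamma
;
run;

data work.base;          /* hypothetical base table */
  input acct $ amount;
  datalines;
A1 10
B2 20
C3 30
;
run;

data work.matched;
  if _n_ = 1 then do;
    if 0 then set work.lookup;          /* no "=" after SET: only defines host variables */
    declare hash h(dataset:"work.lookup", multidata:'yes');
    h.defineKey("&match_var.");         /* variable name as a quoted string */
    h.defineData('contractor');
    h.defineDone();
  end;
  set work.base;
  rc = h.find(key:&match_var.);         /* unquoted: pass the variable's current value */
  do while (rc = 0);
    output;
    rc = h.find_next(key:&match_var.);
  end;
  drop rc;
run;
```

With this toy data, each base row is written once per matching lookup row and rows with no match are dropped, which mirrors the do-while loop in the question.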
How can I display a range of times with no colon?
data alltime;
do hr = '00:00:00't to '23:59:59't;
output; end; format hr time8.; run;
02:23:30 => 022330
How about the B8601TM6. format?
data _null_;
do hr = '00:00:00't to '23:59:59't by 65*60+23 ;
put hr time8. '->' hr b8601tm6. ;
end;
run;
Results:
0:00:00->000000
1:05:23->010523
2:10:46->021046
3:16:09->031609
4:21:32->042132
5:26:55->052655
6:32:18->063218
7:37:41->073741
8:43:04->084304
9:48:27->094827
10:53:50->105350
11:59:13->115913
13:04:36->130436
14:09:59->140959
15:15:22->151522
16:20:45->162045
17:26:08->172608
18:31:31->183131
19:36:54->193654
20:42:17->204217
21:47:40->214740
22:53:03->225303
23:58:26->235826
It worked out like this:
data alltime;
do hr = '00:00:00't to '23:59:59't;
output; end; format hr b8601tm8.; run;
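If the value is needed as a character string rather than only displayed, a minimal sketch with PUT follows. The variable names hhmmss_a and hhmmss_b are hypothetical; the second line is an alternative that assumes the TOD8. format followed by COMPRESS to drop the colons.

```sas
data _null_;
  length hhmmss_a hhmmss_b $6;
  hr = '02:23:30't;
  hhmmss_a = put(hr, b8601tm6.);              /* format suggested above */
  hhmmss_b = compress(put(hr, tod8.), ':');   /* TOD8. keeps the leading zero; COMPRESS removes ':' */
  put hhmmss_a= hhmmss_b=;
run;
```

The FORMAT statement only changes how hr is displayed, while PUT stores the formatted text in a new variable, which is useful when the string is needed for file names or joins.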
I am working in a configuration that uses the IOM to connect to the metadata server - hence there are no automatic macro variables in my environment to determine the user id (we are using a pooled workspace server with a generic host account).
Is there a short piece of code which can be used to query the metadata server for the SAS user id?
The following is quite long-winded and could probably be shortened, but it does the job!
data _null_;
call symput('login_id',''); /* initialise to missing */
n = 1;
length loginUri person $ 512;
nobj = metadata_getnobj("omsobj:Login?*",n, loginUri);
if (nobj>0) then do;
length __uri __objName __porig personUri $256;
__porig = loginUri;
__uri = '';
__objName = '';
__n = 1;
__objectFound = 0;
personUri = "";
__numObjs = metadata_getnasn(__porig ,"AssociatedIdentity", 1, __uri);
do until(__n > __numObjs | __objectFound );
__rc = metadata_getattr(__uri, "PublicType", __objName);
if __objName="User" then do;
__rc=metadata_getattr(__uri, "Name", __objName);
__objectFound = 1;
personUri = __uri;
end;
else do;
__n = __n+1;
rc = metadata_getnasn(__porig, "AssociatedIdentity", __n, __uri);
end;
end;
if upcase("N")="Y" and not __objectFound then do;
put "*ERROR*: Object with PublicType=" "User" " not found for parent " loginUri " under AssociatedIdentity association";
stop;
end;
;
rc = metadata_getattr(personUri, "Name", person);
call symput("login_id", trim(person));
end;
run;
%put &login_id;