SAS mainframe data filename older record deletion [duplicate] - sas

This question already has an answer here:
sas mainframe deletion file between two dates [closed]
(1 answer)
Closed 2 years ago.
DATA ML1;
INFILE CARDS;
INPUT #1 FILENAME $CHAR10.
#12 REFDT YYMMDD8.;
CARDS;
LOAN_CREA 20/09/20
LOAN_UPDT 18/09/20
LOAN_MAIN 19/09/20
;
RUN;
DATA DEL;
SET ML1;
PUT #1 ' DELETE ' FILENAME;
RUN;
Hi all, please look at the code above. I am facing one issue: I want to delete the older filenames, not the most recent one. My required output is:
LOAN_UPDT
LOAN_MAIN

It's not entirely clear what you are trying to do. This will PUT a DELETE line for every file except the newest:
* Sort file to make the newest record first;
proc sort data=ml1;
  by descending refdt;
run;

data _null_;
  set ml1;
  if _n_ = 1 then return; * Ignore the first record (the newest);
  put #1 ' DELETE ' filename;
run;
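With the sample data in the question, this should write DELETE lines for LOAN_MAIN and LOAN_UPDT only, which matches the requested output; LOAN_CREA (dated 20/09/20) is the newest and is kept.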

Related

Extract 2nd and 3rd character in a string - SAS

I have a variable DRG in my dataset and I would like to create a new variable with the second and third characters in the DRG string. For example, if DRG value is A23B I would like to extract 23 as a new variable.
Can someone please help me with the SAS code. Thanks a lot in advance.
Sample code
data example;
input DRG $4.;
cards;
A23B
A13A
A45C
B82B
B82C
B34A
C01A
C25B
C46B
;
run;
Thanks for the help.
I was able to work out the answer by following this webpage https://www.listendata.com/2017/03/extract-last-4-characters-digits-in-sas.html
Here is my code:
data example2;
set example;
want = substr(DRG,length(DRG)-2,2);
run;
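As a side note, the LENGTH(DRG)-2 starting position only lands on the second character because every DRG value here is exactly four characters long. Since the goal is the second and third characters, a minimal alternative sketch is to address those positions directly with SUBSTR:
data example3;
  set example;
  /* take two characters starting at position 2 */
  want = substr(DRG, 2, 2);
run;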

In SAS compute an md5 hash for whole file given you have an md5 hash for each record

This is a follow-up to my recent question on calculating an md5 hash in SAS and Python. I'm using SAS v9.2, which has an md5 function that takes a string and returns a hash. What I'd really like, though, is a way to compute the hash for the file as a whole. Given that I have a hash for each record, is there any way to do this so that the file hash matches the value obtained by using, say, Python code? Taking the sashelp.shoes dataset as an example, I exported it to a CSV file and manually removed the double quotes, dollar signs, and commas from the currency fields. I then computed the hash for the file as a whole using this Python code:
filename = "f:/test/shoes.csv"
md5_hash = hashlib.md5()
with open(filename,"rb") as f:
# Read and update hash string value in blocks of 4K
for byte_block in iter(lambda: f.read(1024*1024),b''):
md5_hash.update(byte_block.replace(b'\r', b'').replace(b'\n', b''))
print(md5_hash.hexdigest())
And got this hash back as output:
f7f205b5b844bf57f5f51685969e0df0
If anyone can replicate this final hash value in SAS for that dataset that would be great.
PS I'm on SAS V9.2
You have two options:
Implement the MD5 algorithm in SAS. I'm aware of existing implementations for SHA and CRC but I'm not sure about MD5.
Call an external utility from SAS to calculate the md5 hash for the file. There is an example here.
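A minimal sketch of the second option, assuming a host where an md5 command-line utility is available (md5sum here; CertUtil -hashfile is the Windows equivalent) and where the session is allowed to pipe OS commands (XCMD); the file path is carried over from the Python example. Note that a plain hash of the file will include the line terminators that the Python code strips out, so the values will only match if the file is prepared the same way.
filename md5pipe pipe 'md5sum "f:/test/shoes.csv"';

data _null_;
  infile md5pipe truncover;
  input hashline $200.;
  /* md5sum prints "<hash>  <filename>"; keep only the hash */
  file_hash = scan(hashline, 1, ' ');
  put file_hash=;
run;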
My earlier note on limitations applies only when working with DS1. There is no way around the length restriction in DS1. You could try this and you will get an error:
data test;
length x $30000;
x = repeat('-', 30000);
run;
data _null_;
set test;
format m $hex32.;
m = md5(catx(',', x, x));
put m=;
run;
But Robert Pendridge is correct to point out that DS2 can solve this issue.
%let reclen = 201; /* Length of each record */
%let records = 2000; /* Number of records */
%let totlen = %eval(&reclen * &records);
proc ds2;
data _null_;
retain m;
dcl char(&totlen) m;
method run();
dcl char(200) c;
set shoes;
c = catx(',',&varstr2);
m = strip(m)|| strip(c);
end;
method term();
dcl char(32) hh;
hh = put(md5(m), $hex32.);
put hh=;
end;
enddata;
run;
quit;
This is essentially doing what the Python code does: it simply concatenates the record strings and applies the hash at the end. You may have to tighten this up a little to remove any extraneous spaces, but it should work.
Unfortunately you cannot do this in DS1. The reason is that the maximum character variable length SAS allows there is 32,767 bytes. You could split the data across multiple variables, but when you try to concatenate them (even directly inside the call to the md5 function) the result will still be truncated. Your best bet is writing the output to an external text file (as shown below, based on your previous example) and generating an md5sum on it. This is actually just one little extra step: you could use the X command to do that from within SAS itself (provided your session is configured to allow it).
filename ff "contents.txt" TERMSTR=CR;
data _null_;
set shoes end = lastrec;
newvar2 = catx(',',&varstr2);
file ff;
put newvar2;
run;
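A hypothetical follow-on step for that last suggestion, assuming the XCMD option is enabled and an md5sum utility exists on the host (the file name matches the fileref above); on Windows, CertUtil -hashfile contents.txt MD5 does the same job. As noted earlier, the hash of the raw file will include the line terminators, unlike the Python code, which strips them.
x 'md5sum "contents.txt" > contents.md5';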

SAS format char

First I have created this table:
data rmlib.tableXML;
input XMLCol1 $ 1-10 XMLCol2 $ 11-20 XMLCol3 $ 21-30 XMLCol4 $ 31-40 XMLCol5 $ 41-50 XMLCol6 $ 51-60;
datalines;
| AAAAA A||AABAAAAA|| BAAAAA|| AAAAAA||AAAAAAA ||AAAA |
;
run;
I want to clean, concatenate, and export. I have written the following code:
data rmlib.tableXML_LARGO;
file CleanXML lrecl=90000;
set rmlib.tableXML;
array XMLCol{6} ;
array bits{6};
array sqlvars{6};
do i = 1 to 6;
*bits{i}=%largo(XMLCol{i})-2;
%let bits =input(%largo(XMLCol{i})-2,comma16.5);
sqlvars{i} = substr(XMLCol{i},2,&bits.);
put sqlvars{i} &char10.. #;
end;
run;
The macro largo counts how many characters I have:
%macro largo(num);
length(put(&num.,32500.))
%mend;
What I need is: instead of having char10, I would like that number (10) to be the length of each string, so as to have something like
put sqlvars{i} &char&bits.. #;
I don't know if it is possible, but I can't get it to work.
I would like to see something like
AAAAA AAABAAAAA BAAAAA AAAAAAAAAAAAA AAAA
It is important to me to keep the spaces (this is only an example of an XML extract). In addition, I will change (for example) "B" to "XPM", so the size will change after cleaning the text; that is why I need the char format to be flexible.
Thank you for your time
Julen
I'm still not quite sure what you want to achieve, but if you want to combine the text from multiple variables into one variable, then you could do something along these lines:
proc sql;
select name into :names separated by '||'
from dictionary.columns
where 1=1
and upcase(libname)='YOURLIBNAME'
and upcase(memname)='YOURTABLENAME';
quit;
data work.testing;
length resultvar $ 32000;
set YOURLIBNAME.YOURTABLENAME;
resultvar = &names;
resultvar2 = compress(resultvar,'|');
run;
I wasn't able to test this, but it should work if you replace YOURLIBNAME and YOURTABLENAME with your own library and table. I'm not 100% sure if the COMPRESS will preserve the spaces in the text, but I think it should, since it only strips the characters you list ('|' here).
The format $VARYING. <length-variable> is a good candidate for solving this output problem.
This presumes you have a number of variables whose values are bounded by vertical bars, and you want to output the concatenation of the values to a file without the bounding bars.
data have;
file "c:\temp\want.txt" lrecl=9000;
length xmlcol1-xmlcol6 $100;
array values xmlcol1-xmlcol6 ;
xmlcol1 = '| A |';
xmlcol2 = '|A BB|';
xmlcol3 = '|A BB|';
xmlcol4 = '|A BBXC|';
xmlcol5 = '|DD |';
xmlcol6 = '| ZZZ |';
do index = 1 to dim(values);
value = substr(values[index], 2); * ignore presumed opening vertical bar;
value_length = length(value)-1; * length with still presumed closing vertical bar excluded;
put value $varying100. value_length @; * write exactly value_length characters, excluding the presumed closing vertical bar, and hold the line;
end;
run;
You have some coding errors that make it difficult to understand what you want to do.
Your %largo() macro doesn't make any sense. There is no 32500. format. The only reason it runs in your code is that you are applying the format to a character variable instead of a number, so SAS automatically converts it to use $32500. instead.
The %LET statement that you have hidden in the middle of your data step will execute BEFORE the data step runs. So it would be less confusing to move it before the data step.
So, after replacing the call to %largo(), your macro variable BITS will contain this text:
%let bits =input(length(put(XMLCol{i},32500.))-2,comma16.5);
Which you then use inside a line of code. So that line will end up being this SAS code.
sqlvars{i} = substr(XMLCol{i},2,input(length(put(XMLCol{i},$32500.))-2,comma16.5));
Which seems to me to be a really roundabout way to do this:
sqlvars{i} = substr(XMLCol{i},2,length(XMLCol{i})-2);
Since SAS stores character variables as fixed length, it will pad the value stored. So what you need to do is to remember the length so that you can use it later when you write out the value. So perhaps you should just create another array of numeric variables where you can store the lengths.
sqllen{i} = length(XMLCol{i})-2;
sqlvars{i} = substr(XMLCol{i},2,sqllen{i});
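A minimal sketch putting those two lines together with the $VARYING. output idea from the earlier answer; the $100 working length, the CleanXML fileref, and the trailing @ (to keep everything on one output line) are assumptions carried over from the question.
data _null_;
  file CleanXML lrecl=90000;
  set rmlib.tableXML;
  array XMLCol{6};
  length val $100;
  do i = 1 to 6;
    len = length(XMLCol{i}) - 2;     /* length without the two bounding bars */
    val = substr(XMLCol{i}, 2, len); /* value without the bounding bars */
    put val $varying100. len @;      /* write exactly len characters, hold the line */
  end;
  put;                               /* release the assembled line */
run;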

How to write a batch file in SAS that automates the opening of files?

I was assigned a task and I don't know where to start. Here's the context:
There’s a variable in the data, say VAR1, indicating the directory to a bunch of image files. So for observation 1, VAR1 may look like D:\Project\Data\Images\Image1.tiff and so on. Of course, those image files exist in the computer.
What I need to do is figure out the SAS program(s) and later run them automatically using a batch file. When the batch file runs, it will, in some way, open the image files one by one. By "one by one", I mean it first opens one image file and, upon closing that file, it opens the next image file, until the end of the list.
Better yet, the batch file will make a copy of the original image files and put them in some folder (e.g., D:\Project\Data\Temp images) before opening them. That is to make sure the original data is left untouched.
Do you know how I can write such a program in SAS? I was given the following SPSS file for reference, which does the job nicely as described. I don't know enough SPSS to understand every detail of how it works. The two variables dir5 and tiff5 specify the location of the image files, and the variables SCQID and ohhscqid are just ID variables.
string out2 (a200).
compute out2=concat('copy "', ltrim(rtrim(dir5)),"\", tiff5, '"',' "c:\temp\temp.tiff"').
write outfile='E:\Data\Outcome.bat'/'#echo SCQ ID ' ohhscqid .
write outfile='E:\Data\Outcome.bat'/out2.
write outfile='E:\Data\Outcome.bat'/'#"C:\Program Files\Microsoft Office\Office14\OIS.exe" "c:\temp\temp.tiff"'.
execute.
I did the homework and figured out one way that works as I want it to. Not the most elegant way programmatically, but the idea is like this:
data batwide;set have;
echo = '#echo SCQ ID '||ohhscqid;
predir = 'copy '||'"'||strip(dir5)||strip('\')||strip(tiff5)||strip('"');
preexec = '#'||strip('"')||strip('C:\Program Files\Microsoft Office\Office14\OIS.exe')||strip('"');
temp = '"'||strip('c:\temp\temp.tiff')||strip('"');
run;
data batwide; set batwide;
dir = catx(' ',predir,temp);
exec = catx(' ',preexec,temp);
run;
data batlong;set batwide;
format bat $200.;
bat = echo;output;
bat = dir;output;
bat = exec;output;
keep bat;
run;
data _null_;
set batlong;
file "E:\SAS codes and files\batchfile.bat";
put bat;
run;
Sounds like you are asking how to generate a series of OS commands into a text file? You can use a DATA step for that.
If you want to test if the specified files exist then use the FILEEXIST() function.
So if you have a SAS dataset named HAVE with a variable named VAR1 that contains the filename, then you probably want a program like this:
data _null_;
set have ;
file 'E:\Data\Outcome.bat';
if fileexist(VAR1) then do;
target=catx('\','D:\Project\Data\Temp images',scan(VAR1,-1,'\'));
put 'copy ' VAR1 :$quote. target :$quote. ;
put '"C:\Program Files\Microsoft Office\Office14\OIS.exe" ' target :$quote.;
end;
else putlog 'WARNING: File not found. ' VAR1=;
run;
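With VAR1 = D:\Project\Data\Images\Image1.tiff, for example, this should generate a pair of batch lines along the lines of:
copy "D:\Project\Data\Images\Image1.tiff" "D:\Project\Data\Temp images\Image1.tiff"
"C:\Program Files\Microsoft Office\Office14\OIS.exe" "D:\Project\Data\Temp images\Image1.tiff"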
I don't know SPSS, but I will give you an example using Unix commands; you can change them to Windows commands and probably do what you described.
In this example I'll only copy some files, but the logic to "open the files one by one" is the same. You will have to play with the code and adjust it to Windows.
First of all, we are looking for CSV files inside the /home/user directory. Again, adjust the command for Windows.
This will create a SAS dataset with all the file names:
filename dirlist pipe "find /home/user/ | grep csv";
data dirlist ;
infile dirlist lrecl=200 truncover;
input line $200.;
file_name = strip(line);
keep file_name;
run;
Then I'll create a macro variable with the file count; I'll call it cntfiles:
proc sql noprint;
select count(*) into: cntfiles from dirlist;
quit;
%let cntfiles=&cntfiles;
%put cntfiles=&cntfiles;
The last thing I do is loop, getting the file names one by one and copying the files to the path held in a new macro variable called &copyto.
This DATA _NULL_ step only copies the files; if you want to do something else with them, you'll have to write the code for it.
%macro process_files;
%let copyto = /home/des/33889897/copyto;
%do i=1 %to &cntfiles;
data _null_;
set dirlist (firstobs=&i.);
put file_name=;
call system("cp -f " || file_name || " &copyto");
stop;
run;
%end;
%mend process_files;
%process_files;
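A hypothetical Windows-flavoured sketch of the same idea, assuming the XCMD option is enabled; the search path and staging folder are placeholders, and the macro loop is not needed here because each row can issue its own copy command.
filename dirlist pipe 'dir /b /s "C:\Users\user\*.csv"';

data _null_;
  infile dirlist lrecl=200 truncover;
  input file_name $200.;
  /* copy each listed file to a staging folder (placeholder path) */
  call system('copy /Y "' || strip(file_name) || '" "C:\temp\copyto"');
run;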
Take a look at this link, maybe it can help you.
Here is sample code I use frequently to parse the list of files in a directory and extract metadata from the file names. It often feeds a step that generates a sequence of macro variables to use in a macro loop to process each file in turn. Just add any substringing of the file names to extract structured content, as with the datetxt and date assignment statements in the example, where the file name has a datestamp in it that I want to use.
%let extension=txt;
filename infiles "c:\a\b\c";
Data List_of_files
Not_ext
;
Length
path $255
filename $255
extension $10
;
d_open=dopen("infiles") ;
path=pathname("infiles") ;
nfiles=dnum(d_open) ;
do i=1 to nfiles;
filename =dread(d_open,i) ;
extension=scan(filename,-1,'.') ;
datetxt=scan(filename,2,"_");
date=input(scan(filename,2,"_"),date9.);
if upcase(extension) ne "%upcase(&extension)"
then output Not_ext ;
else output list_of_files ;
end;
d_open=dclose(d_open) ;
keep
filename
path
extension
;
Run ;
filename infiles clear;
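A hypothetical sketch of the follow-on step mentioned above: turning the file list into numbered macro variables that a driver macro could loop over. The upper bound of 9999 and the macro variable names are arbitrary.
proc sql noprint;
  select filename
    into :file1-:file9999
    from list_of_files;
quit;
%let nfiles = &sqlobs; /* number of files (and macro variables) actually created */
%put NOTE: &nfiles files found.;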

Extract "dynamic" part from SAS data-set

I am unsure if this is possible (or a stupid question), as I just started looking at SAS last week. I've managed to import my .CSV file to a SAS data set using:
proc import
specifying guessingrows= to limit my out=.
My problem now is that the CSV files to import are not all of the same structure, which I noticed after writing some code using obsnum= to specify the start and the number of lines to read.
So my question is whether or not SAS is capable of looking for a specific string/empty variable and using it as the end observation.
My data looks like this (but the number of Var_x variables varies for each file):
First I tried looking at slice=, but that is only useful if I know the exact places of interest, as the empty space between the groups can vary.
Is it possible to use the SET statement to specify starting at line 1 and reading until encountering a blank field? Or can you redirect me to some function (that I couldn't find myself)?
I would like to look at each "block" separately and process it.
Thank you in advance
I think you can do this in a relatively straightforward way if you are comfortable doing some processing after all the data has been inputted.
So do proc import on the whole dataset with no restriction.
Then use a data step and a counter to process through the data and output as necessary. Something like:
data output1 output2 output3;
  set imported_data;
  retain counter; /* keep the counter value across rows */
  if _n_ = 1 then counter = 1;
  var1lag = lag(var1);
  if var1 = '' and var1lag ne '' then counter = counter + 1;
  if counter = 1 then output output1;
  else if counter = 2 then output output2;
  else output output3;
run;
data output1;
set output1;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output2;
set output2;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output3;
set output3;
if var1 = '' and var2 = . and var3 = . then delete;
run;
The above code outputs to three datasets based on the value of counter. The LAG function lets us look at the previous row so that the counter is incremented only on the first blank row of each gap (note the RETAIN, which keeps the counter value across rows).
Then we go back and remove the fully blank rows from each dataset.
If you have many outputs, you could easily make this more scalable by generating the if/else output logic instead of hard-coding it; see the sketch below.
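A hypothetical macro-based sketch of that idea, assuming the same imported_data input and the blank-in-var1 convention; the number of groups is passed as a parameter and the if/else chain is generated for you.
%macro split_groups(n=3, in=imported_data, prefix=output);
  data %do i = 1 %to &n; &prefix.&i %end; ;
    set &in;
    retain counter 1;                 /* group number, kept across rows */
    var1lag = lag(var1);
    if var1 = '' and var1lag ne '' then counter + 1;
    %do i = 1 %to &n;
      %if &i = 1 %then %do;
        if counter = &i then output &prefix.&i;
      %end;
      %else %do;
        else if counter = &i then output &prefix.&i;
      %end;
    %end;
    drop counter var1lag;
  run;
%mend split_groups;

%split_groups(n=3)
Unlike the hand-written version, any rows beyond group &n are simply dropped here rather than lumped into the last dataset, and the fully blank separator rows still need the clean-up steps shown above.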