data driven put with different formats challenge - sas

I wanted to come here with this challenge I'm facing in these days.
Basically for each record, a different format should be used in a put statement, and it is defined in the data in theirselves.
The challenge is not to split the datasteps and gaining the wanted result within the datastep, so avoid obvious %do loops and similar :)
proc format;
value $a 'FRS'='FIRST';
value $b 'SCN'='SECOND';
run;
data a;
length var $3 res $10 fmt $5;
var='FRS'; fmt='$a.'; res=''; output;
var='SCN'; fmt='$b.'; res=''; output;
run;
data b;
set a;
*your code goes here, the result should go into res variable;
*and should be the "putted" value of var using fmt as format.;
*an obviously non working version can be found here below;
res=put(var,fmt);
run;
Here below what it looks like, res is the expected result:
VAR FMT RES
---------------
"FRS" $a. =put("FRS",$a.)="FIRST"
"SCN" $b. =put("SCN",$b.)="SECOND"

I am not sure I understand the question, but it looks like you just want to use the PUTC() function. If your variable was numeric you would use the PUTN() function instead.
res=putc(var,fmt);

Related

Using SAS finance XIRR function on multiple rows with different numbers of variables

I am trying to use the SAS XIRR function on a dataset. The syntax is:
finance('XIRR',value1, value2, value3...valuen,date1,date2,date3...daten);
My problem is that the data has different numbers of values/dates on each row. There could be up to 122 values/dates per row.
Where there are missing values the XIRR function fails, so I set all missing values to 0. Now the function fails as the 'missing' dates are now Jan1960. Anyone got any ideas?
in the code below cf1-cf122 are the cash flow values and ed1-ed122 are the dates.
/* remove blanks */
data irrtable3;
set irrtable2;
array change _numeric_;
do over change;
if change=. then change=0;
end;
run;
/* create irr */
data irrtable4;
set irrtable3;
IRR=finance('XIRR',OF CF1-CF122,OF ED1-ED122);
run;```
You can use codegen to construct a dynamic FINANCE(..) call, with a variable number of arguments, that is resolved by the macro system at DATA step run-time.
Using RESOLVE to compute the result in macro environment for many, many rows will likely have a noticeable slowness compared to plain DATA step.
Example:
data have;
v1=−10000; d1=mdy(1, 1, 2008);
v2=2750; d2=mdy(3, 1, 2008);
v3=4250; d3=mdy(10, 30, 2008);
v4=3250; d4=mdy(2, 15, 2009);
v5=2750; d5=mdy(4, 1, 2009);
output;
call missing(v5,d5); output;
call missing(v4,d4); output;
call missing(v3,d3); output;
call missing(v2,d2); output;
run;
options missing=' ';
data want;
set have;
args = catx(',', of v1-v5, of d1-d5);
result = resolve( cats (
'%sysfunc(FINANCE(XIRR,', args, '))'
));
run;
options missing='.';
From what I can tell (And I don't work with Finance functions, so I'm not an expert), if you have all of the 'filled' arguments prior to the 'unfilled', you are okay to just set everything to zero that's missing (both on the 'value' and 'date' side). Using the example Richard provides (which is the one from the SAS documentation):
data want2;
set have;
array v v1-v5;
array d d1-d5;
do _i_ = 1 to dim(v);
if missing(v[_i_]) then do;
v[_i_]=0; d[_i_]=0;
end;
end;
args = catx(',', of v1-v5, of d1-d5);
result =FINANCE('XIRR',of v1-v5, of d1-d5);
run;
That works and gets the same result as Richard's, and is probably faster.
This does require the 0s to all be at the end - if they're interspersed, and you can't use CALL SORTN to get them put all on one end - and your data is too big to use with RESOLVE, then I would construct this entirely in the macro language. You could do a few things, all of which are too long for this answer, but the simplest is probably to create code for every line, and put them behind if _n_ = 5 then do; &row5code.; end; for each row. This would be very long, certainly, but should be faster than the resolve (just a lot less maintainable). You could also do a CALL EXECUTE for each line, also slow but a possibility, or even DOSUBL.

In SAS compute an md5 hash for whole file given you have an md5 hash for each record

This is a follow-up to my recent question on calculating md5 hash in SAS and python. So, I'm using SAS v9.2 and there is an md5 hash function which takes in a string and returns a hash. What I'd really like though is a way to compute the hash for the file as a whole. Given that I have a hash for each record , is there any way to do this and have the file hash match up with the value obtained by using , say, python code. Taking the sashelp.shoes dataset as an example I exported this to a CSV file and manually removed double quotes and dollars and commas of the currency fields. I then computed the hash for the file as a whole using this python code:
filename = "f:/test/shoes.csv"
md5_hash = hashlib.md5()
with open(filename,"rb") as f:
# Read and update hash string value in blocks of 4K
for byte_block in iter(lambda: f.read(1024*1024),b''):
md5_hash.update(byte_block.replace(b'\r', b'').replace(b'\n', b''))
print(md5_hash.hexdigest())
And got this hash back as output:
f7f205b5b844bf57f5f51685969e0df0
If anyone can replicate this final hash value in SAS for that dataset that would be great.
PS I'm on SAS V9.2
You have two options:
Implement the MD5 algorithm in SAS. I'm aware of existing implementations for SHA and CRC but I'm not sure about MD5.
Call an external utility from SAS to calculate the md5 hash for the file. There is an example here.
My earlier note on limitations applies only when working with DS1. There is no way around the length restriction in DS1. You could try this and you will get an error:
data test;
length x $30000;
x = repeat('-', 30000);
run;
data _null_;
set test;
format m $hex32.;
m = md5(catx(',', x, x));
put m=;
run;`
But Robert Pendridge is correct to point out that DS2 can solve this issue.
%let reclen = 201; /* Length of each record */
%let records = 2000; /* Number of records */
%let totlen = %eval(&reclen * &records);
proc ds2;
data _null_;
retain m;
dcl char(&totlen) m;
method run();
dcl char(200) c;
set shoes;
c = catx(',',&varstr2);
m = strip(m)|| strip(c);
end;
method term();
dcl char(32) hh;
hh = put(md5(m), $hex32.);
put hh=;
end;
enddata;
run;
quit;
This is essentially doing what the Python code is doing. The update merely concatenates the strings and applies the hash. You may have to tighten this up a little bit to remove any extraneous spaces etc., but should work.
Unfortunately you cannot in DS1. The reason is that the maximum variable size that SAS allows is only 32,767 bytes long. You could group the variables in multiple variables, but still when you try to concatenate them (even directly when invoking the md5 function), it will end up truncating it. Your best bet is writing the output to an external text file (as shown below based on your previous example) and generating md5sum on it. This is actually just one little extra step.. You could just use the X command to do that from within SAS itself (provided you are configured to do so).
filename ff "contents.txt" TERMSTR=CR;
data _null_;
set shoes end = lastrec;
newvar2 = catx(',',&varstr2);
file ff;
put newvar2;
run;

SAS call symput in data step

I am facing this issue with sas data step. My requirement is to get a list of variables such as
total_jun2018 = sum(jun2018, dep_jun2018);
total_jul2018 = sum(jul2018, dep_jul2018);
Data final4;
set final3;
by hh_no;
do i=0 to &tot_bal_mnth.;
bal_mnth = put(intnx('month',"&min_Completed_dt."d, i-1), monyy7.);
call symputx('bal_mnth', bal_mnth);
&bal_mnth._total=sum(&bal_mnth., Dep_&bal_mnth.);
output;
end;
But I am facing error that macro variable bal_mnth not resolved. Also once it did ran successfully but I want that output must be printed sequentially but it only prints output for last loop when i=6 then it prints only Total_DEC2018=sum(DEC2018, DEP_DEC2018);
Any help will be appreciated!
Thanks,
Ajay
This is a common issue when learning SAS Macro. The problem is that the macro processor needs to resolve &bal_mnth to a value when the data step is first submitted for execution, but the CALL SYMPUT doesn't execute until the data step is actually executed, so at the time you submit the code, there is no value available for &bal_mnth.
In this case you don't need bal_mnth to be created as a variable in the data set, so you could replace the line that starts bal_mnth = put(intck(...)) with a %let bal_mnth = ... statement. The %let executes while the data step is being submitted, so that way its value will be available when you need it.
My proposed %let statement will need to wrap the functions in at least one SYSFUNC call, which is left as an exercise for the reader :-)
It looks like you want to generate a series of assignment statements like:
total_jun2018 = sum(jun2018, dep_jun2018);
total_jul2018 = sum(jul2018, dep_jul2018);
...
total_jan2019 = sum(jan2019, dep_jan2019);
What is known as wallpaper code.
If your variables names were easier, such as dep1 to dep18 then it would be easy to use arrays to process the data. With your current naming convention the problem with generating the array statements is not much different than the problem of generating a series of assignment statements.
You can create a macro so that you could use a %DO loop to generate your wallpaper code.
%local i bal_mnth;
%do i=0 %to &tot_bal_mnth.;
%let bal_mnth = %sysfunc(intnx(month,"&min_Completed_dt."d, &i-1), monyy7.);
total_&bal_mnth = sum(&bal_mnth , Dep_&bal_mnth );
%end;
Or you could just generate the code to a file with a data step.
%let tot_bal_mnth = 7;
%let min_Completed_dt=01JUN2018;
filename code temp;
data _null_;
file code;
length bal_mnth $7 ;
do i=0 to &tot_bal_mnth.;
bal_mnth = put(intnx('month',"&min_Completed_dt."d, i-1), monyy7.);
put 'total_' bal_mnth $7. ' = sum(' bal_mnth $7. ', Dep_' bal_mnth $7. ');';
end;
run;
So the generated file of code looks like this:
total_MAY2018 = sum(MAY2018, Dep_MAY2018);
total_JUN2018 = sum(JUN2018, Dep_JUN2018);
total_JUL2018 = sum(JUL2018, Dep_JUL2018);
total_AUG2018 = sum(AUG2018, Dep_AUG2018);
total_SEP2018 = sum(SEP2018, Dep_SEP2018);
total_OCT2018 = sum(OCT2018, Dep_OCT2018);
total_NOV2018 = sum(NOV2018, Dep_NOV2018);
total_DEC2018 = sum(DEC2018, Dep_DEC2018);
You can then use %include to run it in your data step.
data final4;
set final3;
by hh_no;
%include code / source2 ;
run;
I would like to offer another point of view: the difficulty you are having here results from the use of a wide data shape, with lots of columns.
Rather than working with your data in this shape, you could first transpose from wide to long, so that instead of having lots of total_xxx columns you just have 3: total, total_dep and date, with one row per month. Once it's in this format, it will be much easier to work with, potentially allowing you to avoid resorting to macros and wallpaper code.
Suggested reading:
Transpose wide to long with dynamic variables

SAS format char

First i have created this table
data rmlib.tableXML;
input XMLCol1 $ 1-10 XMLCol2 $ 11-20 XMLCol3 $ 21-30 XMLCol4 $ 31-40 XMLCol5 $ 41-50 XMLCol6 $ 51-60;
datalines;
| AAAAA A||AABAAAAA|| BAAAAA|| AAAAAA||AAAAAAA ||AAAA |
;
run;
I want to clean, concatenate and export. I have written the following code
data rmlib.tableXML_LARGO;
file CleanXML lrecl=90000;
set rmlib.tableXML;
array XMLCol{6} ;
array bits{6};
array sqlvars{6};
do i = 1 to 6;
*bits{i}=%largo(XMLCol{i})-2;
%let bits =input(%largo(XMLCol{i})-2,comma16.5);
sqlvars{i} = substr(XMLCol{i},2,&bits.);
put sqlvars{i} &char10.. #;
end;
run;
the macro largo count how many characters i have
%macro largo(num);
length(put(&num.,32500.))
%mend;
What i need is instead of have char10, i would like that this number(10) would be the length, of each string, so to have something like
put sqlvars{i} &char&bits.. #;
I don't know if it possible but i can't do it.
I would like to see something like
AAAAA AAABAAAAA BAAAAA AAAAAAAAAAAAA AAAA
It is important to me to keep the spaces(this is only an example of an extract of a xml extract). In addition I will change (for example) "B" for "XPM", so the size will change after cleaning the text, that it what i need to be flexible in the char
Thank you for your time
Julen
I'm still not quite sure what you want to achieve, but if you want to combine the text from multiple varriables into one variable, then you could do something along the lines:
proc sql;
select name into :names separated by '||'
from dictionary.columns
where 1=1
and upcase(libname)='YOURLIBNAME'
and upcase(memname)='YOURTABLENAME';
quit;
data work.testing;
length resultvar $ 32000;
set YOURLIBNAME.YOURTABLENAME;
resultvar = &names;
resultvar2 = compress(resultvar,'|');
run;
Wasn't able to test this, but this should work if you replace YOURLIBNAME and YOURTABLENAME with your respective tables. I'm not 100% sure if the compress will preserve the spaces in the text.. But I think it should.
The format $VARYING. <length-variable> is a good candidate for solving this output problem.
On the presumption of having a number of variables whose values are vertical-bar bounded and wanting to output to a file the concatenation of the values without the bounding bars.
data have;
file "c:\temp\want.txt" lrecl=9000;
length xmlcol1-xmlcol6 $100;
array values xmlcol1-xmlcol6 ;
xmlcol1 = '| A |';
xmlcol2 = '|A BB|';
xmlcol3 = '|A BB|';
xmlcol4 = '|A BBXC|';
xmlcol5 = '|DD |';
xmlcol6 = '| ZZZ |';
do index = 1 to dim(values);
value = substr(values[index], 2); * ignore presumed opening vertical bar;
value_length = length(value)-1; * length with still presumed closing vertical bar excluded;
put value $varying. value_length #; * send to file the value excluding the presumed closing vertical bar;
end;
run;
You have some coding errors in that is making it difficult to understand what you want to do.
Your %largo() macro doesn't make any sense. There is no format 32500.. The only reason it would run in your code is because you are trying to apply the format to a character variable instead of a number. So SAS will automatically convert to use the $32500. instead.
The %LET statement that you have hidden in the middle of your data step will execute BEFORE the data step runs. So it would be less confusing to move it before the data step.
So replacing the call to %largo() your macro variable BITS will contain this text.
%let bits =input(length(put(XMLCol{i},32500.))-2,comma16.5);
Which you then use inside a line of code. So that line will end up being this SAS code.
sqlvars{i} = substr(XMLCol{i},2,input(length(put(XMLCol{i},$32500.))-2,comma16.5));
Which seems to me to be a really roundabout way to do this:
sqlvars{i} = substr(XMLCol{i},2,length(XMLCol{i})-2);
Since SAS stores character variables as fixed length, it will pad the value stored. So what you need to do is to remember the length so that you can use it later when you write out the value. So perhaps you should just create another array of numeric variables where you can store the lengths.
sqllen{i} = length(XMLCol{i})-2;
sqlvars{i} = substr(XMLCol{i},2,sqllen{i});

Extract "dynamic" part from SAS data-set

I am unsure if this is possible (or stupid question), as I just started looking at SAS last week. I've managed to import my .CSV file to a SAS data set using the:
proc import
Specifying the guessingrows= to limit my out=.
My problem is now that my CSV files to import are not of same structure, which I noticed after writing some code using the obsnum= to specify start and x-lines to read.
So my question is wether or not SAS is capable of either look for a specific string/empty variable, and use as end observation?
My Data looks like (but number of Var_x varies for each file):
First I tried looking at the slice= but is only useful if I know the exact Places of interest, as the empty Space between the Groups can vary.
Is it possible to use the set function to specify to start at line 1 and read till encounting a blank field? Or can you redirect me to some function (that I couldn't find myself)?
I would like to look at each "block" separately and process.
Thank you in advance
I think you can do this in a relatively straightforward way if you are comfortable doing some processing after all the data has been inputted.
So do proc import on the whole dataset with no restriction.
Then use a data step and a counter to process through the data and output as necessary. Something like:
data output1 output2 output3;
set imported_data;
if _n_ = 1 then counter = 1;
var1lag = lag(var1);
if var1 = '' and var1lag ne '' then counter=counter+1;
if counter = 1 then output output1;
else if counter = 2 then output output2;
else output output3;
run;
data output1;
set output1;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output2;
set output2;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output3;
set output3;
if var1 = '' and var2 = . and var3 = . then delete;
run;
The above code outputs to three datasets based on the value of counter. The lag function lets us look up a row to ensure its the first time we see no data and updates the counter as we see no data.
Then we go back and remove any fully blank data for our datasets.
You could easily use some arrays to make this work more scaleably if you have many outputs instead of the if/else statements to output the data.