How to use proc format with the regexp option?

I stumbled upon a regexp option in proc format, so I gave it a try and ended up puzzled.
proc format;
invalue test
'/n\(.*\)/i'(regexp) = 1
;
run;
data _null_;
x = 'n(ADT,TRTDT)';
y = input(x,test.);
z = prxmatch('/n\(.*\)/i',x)^=0;
put y = z = ;
run;
I had thought that the regexp option would behave like prxmatch() in a data step, but apparently I was wrong.
NOTE: Invalid argument to function INPUT at row 466 column 9.
y=. z=1
x=n(ADT,TRTDT) y=. z=1 _ERROR_=1 _N_=1
I have searched the help documentation and found nothing really helpful.
How does the regexp option in proc format work? Feel free to share your opinion, thanks.

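You defined an informat with a default width of 10 and tried to read a string of length 12. Only the first 10 characters, n(ADT,TRTD, were read, so the closing parenthesis never reached the pattern.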
data _null_;
x = 'n(ADT,TRTDT)';
y1 = input(x,??test.);
y2 = input(x,??test20.);
z = prxmatch('/n\(.*\)/i',x)^=0;
put (_all_) (=);
run;
Results:
x=n(ADT,TRTDT) y1=. y2=1 z=1
You can add the DEFAULT= option to the INVALUE statement to change the default width.
proc format;
invalue test (default=40)
'/n\(.*\)/i'(regexp) = 1
;
run;
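To see the width effect without the informat, here is a small sketch that applies the same pattern to the full string and to its first 10 characters, which is what a width-10 informat would be handed. The variable names len, first10, z_full and z_trunc are made up for this illustration.

```sas
data _null_;
  x = 'n(ADT,TRTDT)';
  len     = length(x);                              /* 12 characters */
  first10 = substr(x, 1, 10);                       /* what a width-10 informat reads */
  z_full  = prxmatch('/n\(.*\)/i', x) ^= 0;         /* 1: closing parenthesis present */
  z_trunc = prxmatch('/n\(.*\)/i', first10) ^= 0;   /* 0: closing parenthesis cut off */
  put len= first10= z_full= z_trunc=;
run;
```

If z_trunc comes back 0 while z_full is 1, the pattern itself is fine and only the width is the problem.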

Related

SAS function to every observation (finance xirr)

I have an sql table like this one
id   | payment   | date              |
_____|___________|___________________|
obs1 | -20,10,13 | 21184,22765,22704 |
And so on (1M+ observations). I prepared all the data for finance() in SQL, so in SAS I just need to read it and pass it to the function. I am confident that the data I prepared will return the right answer.
The problem is that I can't find a proper way to calculate the function over the entire data set. Right now I loop row by row, passing the data to macro variables through proc sql, BUT I can't get a string longer than 1000 characters, so my program isn't working.
I am running the following function:
finance('XIRR', payment, date, 0.15);
Can you help me please? Thanks
This is the code I had before the answer. It took unacceptably long!
%macro eir (input_data, cash_var, dt_var, output_data);
data rawdata;
set &input_data(dbmax_text=32000);
run;
proc sql noprint;
select count(*) into :n from rawdata ;
quit;
%let n = 100;
%do j=1 %to &n;
data x;
set rawdata(firstobs = &j obs= &j);
run;
proc sql noprint;
select &cash_var into: cf from x;
select &dt_var into: dt from x;
quit;
data x;
set x;
r= finance('xirr', &cf, &dt, 0.15);
drop &cash_var &dt_var;
run;
data out;
set %if &j>1 %then %do; out %end; x;
run;
%end;
proc append base = &output_data data=out;
run;
proc datasets nolist;
delete x out rawdata;
run;
%mend eir;
%eir(input_data = have, cash_var = pmt, dt_var = dt, output_data = ggg);
Took 20 minutes to calculate 50,000 rows
and now it's just
data want;
set have(dbmax_text=32000);
eir = input(resolve(catx(',','%sysfunc(finance(XIRR',pmt,dt,'0.15),hex16)')),hex16.);
run;
Took 6 minutes to calculate 1,400,000 rows
Tom just saved our project =)
The FINANCE() function wants a list of values, not a character string. You could parse the string and convert the text back into numbers and pass those to the function. But if the lengths of the lists vary from observation to observation that will cause issues.
You could use the macro processor to help you. You can generate a call to %sysfunc(finance()) and read the generated string back into a numeric variable.
It also might work to pad the short lists with zero payments on the last recorded date.
Let's make some test data.
data have ;
infile cards dsd dlm='|' ;
length id $20 payment date $100 ;
input id payment date;
cards;
obs1 | -20,10,13 | 21184,22765,22704
obs2 | -20,10 | 21184,22765
;
Now let's try converting it two ways: one by creating numeric variables to pass to the FINANCE() function call, and the other by generating a %sysfunc(finance()) call, so that we can check that the %sysfunc() call is working properly.
data want;
set have ;
array v (3) _temporary_;
array d (3) _temporary_;
do i=1 to dim(v);
v(i)=coalesce(input(scan(payment,i,','),32.),0);
d(i)=input(scan(date,i,','),32.);
if missing(d(i)) and i>1 then d(i)=d(i-1);
end;
drop i;
value1=finance('XIRR',of v(*),of d(*),0.15);
value2=input(resolve(catx(',','%sysfunc(finance(XIRR',payment,date,'0.15),hex16)')),hex16.);
run;
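To make the RESOLVE() trick less opaque, this sketch only builds and prints the text that RESOLVE() would be handed for the first test row; it does not call FINANCE() at all. The variable name call is made up for the illustration. The single quotes matter: they keep the macro processor from touching %sysfunc while the step compiles.

```sas
data _null_;
  length call $200;
  payment = '-20,10,13';
  date    = '21184,22765,22704';
  /* same CATX expression as in the step above, without RESOLVE() and INPUT() */
  call = catx(',', '%sysfunc(finance(XIRR', payment, date, '0.15),hex16)');
  put call=;
  /* call=%sysfunc(finance(XIRR,-20,10,13,21184,22765,22704,0.15),hex16) */
run;
```

RESOLVE() then executes that text at run time, and the HEX16 format/informat pair carries the result back into a numeric variable without losing precision.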
Here's my best guess based on the limited details you've provided. I think you need to split out each date and payment into separate variables before you can call the finance function, e.g.:
data have;
infile datalines dlm='|';
input id :$8. amount :$20. date :$20.;
datalines;
obs1 | -20,10,13 | 21184,22765,22704
;
run;
data want;
set have;
array dates[3] d1-d3;
array amounts[3] a1-a3;
do i = 1 to 3;
amounts[i] = input(scan(amount, i, ','), 8.);
dates[i] = input(scan(date, i, ','), 8.);
end;
XIRR = finance('XIRR', of a1-a3, of d1-d3, 0.15);
run;
I suspect this will only work if you have the same number of dates and payments in every row; otherwise you will run into array out-of-bounds issues or problems with the IRR calculation.

SAS proc format statement

I want to create a format on a numeric variable (say, age) to see the result as ">10". I tried as:
PROC FORMAT;
VALUE agefmt
>10 - high = '> 10' /*10 to be excluded.*/
other = '<= 10'
;
RUN;
But it does not work. Please help.
You made just a small mistake: the > must be a < placed between the two values, so the range reads 10 <- high (from 10, exclusive, up to high):
PROC FORMAT;
VALUE agefmt
10 <- high = '> 10' /*10 to be excluded.*/
other = '<= 10'
;
RUN;
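A quick way to check the boundary is to push a few values through the format with PUT(). This is only a sketch; the test values and the variable name label are chosen for illustration.

```sas
proc format;
  value agefmt
    10 <- high = '> 10'    /* 10 itself is excluded from this range */
    other      = '<= 10';
run;

data _null_;
  do age = 9, 10, 10.5, 11;
    label = put(age, agefmt.);
    put age= label=;       /* 9 and 10 give '<= 10'; 10.5 and 11 give '> 10' */
  end;
run;
```

Note that OTHER also catches missing values, so a missing age would be labelled '<= 10' as well unless it gets its own range.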

Estimating a response value based on known parameters

SAS newbie here.
My question is about PROC REG in SAS; let's assume I have already created a model and now I would like to use this model, and known predictor variables to estimate a response value.
Is there a clean and easy way of doing this in SAS? So far I've been manually grabbing the intercept and the coefficients from the output of my model to calculate the response variable but as you can imagine it can get pretty nasty when you have a lot of covariates. Their user's guide is pretty cryptic...
Thanks in advance.
@Reese is correct. Here is some sample code to get you up the learning curve faster:
/*Data to regress*/
data test;
do i=1 to 100;
x1 = rannor(123);
x2 = rannor(123)*2 + 1;
y = 1*x1 + 2*x2 + 4*rannor(123);
output;
end;
run;
/*Data to score*/
data to_score;
_model_ = "Y_on_X";
y = .;
x1 = 1.5;
x2 = -1;
run;
/*Method 1: just put missing values on the input data set and
PROC REG will do it for you*/
data test_2;
set test to_score;
run;
proc reg data=test_2 alpha=.01 outest=est;
Y_on_X: model y = x1 x2;
output out=test2_out(where=(y=.)) p=predicted ucl=UCL_Pred lcl=LCL_Pred;
run;
quit;
proc print data=test2_out;
run;
/*Method 2: Use the coefficients and the to_score data with
PROC SCORE*/
proc score data=to_score score=est out=scored type=parms;
var x1 x2;
run;
proc print data=scored;
var Y_on_X X1 X2;
run;
2 ways:
1) Append the data you want into the data set you're going to use to get estimates but leave the y value blank. Grab the estimates using the output statement from proc reg.
2) Use Proc Score:
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_score_sect018.htm

do loop on sas but not with a macro

It is a simple one but I'm struggling a bit.
What I have :
What I want :
I want to remove the v0 , v1 and etc.
I'm using this piece of code
data IndieDay20140704;
set IndieDay20140704;
do i=1 to 5;
VAR1=tranwrd(var1,"v&i","");
end;
run;
It is not working correctly as it is giving me this instead (see below) plus the error
WARNING: Apparent symbolic reference I not resolved.
Questions:
1) Do I need a macro?
2) Why the error?
Many thanks for your insights.
There's an error because "v&i" inside double quotes makes SAS look for a macro variable i, which you never defined. The data step variable i from the do loop is not visible to the macro processor.
I guess the idea of tranwrd is to remove words in VAR2, VAR3.. from VAR1.
The logical error is to do it also for VAR1 itself.
Check if this helps (using array):
data IndieDay20140704;
length VAR1 VAR2 VAR3 VAR4 VAR5 $10;
VAR1 = 'TEST IT';VAR5 = 'TEST';
output;
VAR1 = 'STEST IT';VAR5 = 'TEST';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
array vals VAR1 - VAR5;
do i=1 to dim(vals);
if i ne 1 then VAR1=tranwrd(var1,trim(vals(i)),"");
end;
drop i;
run;
Here I'm creating a SAS view on top of table (not a good idea to overwrite the source).
Also I think you should trim() the values from VAR2,VAR3... depending on what you want to achieve and what's in the data.
EDIT:
Here is the version with 'v0', 'v1'...'v5' strings:
data IndieDay20140704;
length VAR1$10;
VAR1 = 'TEST v0';
output;
VAR1 = 'TEST v11';
output;
VAR1 = 'TEST v1';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
org_var1 = var1;
do i=0 to 5;
var1 =tranwrd(var1, catt('v', put(i, 1. -L)),"");
end;
run;
catt('v', put(i, 1. -L)) concatenates the string 'v' and the result of put.
put(i, 1. -L) converts the numeric variable i to text using the plain numeric format w.d; 1. is used here, which is enough for single-digit numbers, and -L left-aligns the result.
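One thing to watch with TRANWRD: when the replacement has zero length it inserts a single blank rather than removing the text. If the goal is to delete 'v0'...'v5' outright, a variant with TRANSTRN and TRIMN('') may be closer to what is wanted. This is only a sketch; the sample value 'fic19v3.csv' is made up, and CATS('v', i) is used as a shorter alternative to catt('v', put(i, 1. -L)).

```sas
data _null_;
  length var1 $20;
  var1 = 'fic19v3.csv';
  do i = 0 to 5;
    /* cats() converts i to text and strips blanks, giving 'v0', 'v1', ... */
    /* trimn('') is a true zero-length replacement, so nothing is left behind */
    var1 = transtrn(var1, cats('v', i), trimn(''));
  end;
  put var1=;   /* var1=fic19.csv */
run;
```

Like the loop above, this would also strip 'v1' out of a value such as 'v11', so it only suits data where that cannot happen.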
Here's one way, there are many others and this may not work if your data has a lot of variability.
data have;
length VAR1 $20;
VAR1 = 'fic19v0.csv';
output;
VAR1 = 'fic19v1.csv';
output;
run;
data want ;
set have;
original_var=var1;
var1=substr(var1, 1, index(var1, ".")-3)||".csv";
run;

Quotation mark SAS (+) PROC FORMAT value|invalue

I'm still stuck on how SAS treats special characters.
%macro mFormat();
%do i=1 %to &numVar. ;
proc format library = work ;
invalue $ inf&&nomVar&i..s
%do j=1 %to &&numMod&i.;
"%superq(tb&i.mod&j.)" = &j.
%end;
;
run;
proc format library = work ;
value f&&nomVar&i..s
%do k=1 %to &&numMod&i.;
&k. = "%superq(tb&i.mod&k.)"
%end;
;
run;
%end;
%mend mFormat;
%mFormat();
As you can see, the program is supposed to create the formats and the informats for each variable. My only problem is when the variable name resolves to Brand, which contains
GOTAN-GOTAN
FRANCES-FRANCES
+&DECO-+DECO&
etc ...
These names lead me to this error
“ERROR: This range is repeated, or values overlap:”
I hope I can force SAS to read those names. Or perhaps, this is not the best approach to generate FORMATS and INFORMATS for variables that contain these characters( &, %, -, ', ").
Because your macro is using so many global macro variables, it's hard to see the problem. That error message indicates that your macro is generating duplicate ranges for PROC FORMAT. The complete error message should tell you which range is in error; if that is all you see, my guess is that more than one of your macro variables resolves to a blank.
There is no restriction on using hyphens when defining PROC FORMAT ranges. I made up this little example to illustrate:
proc format library = work ;
invalue infs
'GOTAN-GOTAN' = 1
'FRANCES-FRANCES' = 2
'+&DECO-+DECO&' = 3;
value fs
1 = 'GOTAN-GOTAN'
2 = 'FRANCES-FRANCES'
3 = '+&DECO-+DECO&';
run;
data a;
test = 'FRANCES-FRANCES';
in_test = input(test,infs.);
put test= in_test= in_test= fs.;
run;
Although you may find some trick to solve your macro problem, I'd suggest you toss that out and use the CNTLIN option of PROC FORMAT to use a data set to create your custom formats and informats. That would certainly make things easier to maintain and might also help create some useful metadata for your project. Here is a simple example to create the same format and informat as above:
data fmt_defs;
length fmtname start label $32 type $1;
fmtname = 'INFS';
type = 'I';
start = 'GOTAN-GOTAN'; label = '1'; output;
start = 'FRANCES-FRANCES'; label = '2'; output;
start = '+&DECO-+DECO&'; label = '3'; output;
fmtname = 'FS';
type = 'N';
start = '1'; label='GOTAN-GOTAN'; output;
start = '2'; label='FRANCES-FRANCES'; output;
start = '3'; label='+&DECO-+DECO&'; output;
run;
proc format library = work cntLin=fmt_defs;
run;
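If the values already sit in a data set, the control data set does not have to be typed by hand. The following is only a sketch under assumptions: the data set brands, its variable brand, and the names INFBRAND and FBRAND are all hypothetical, and the codes are simply the row numbers.

```sas
/* hypothetical input: one row per distinct value */
data brands;
  input brand :$32.;
  datalines;
GOTAN-GOTAN
FRANCES-FRANCES
+&DECO-+DECO&
;
run;

data fmt_defs;
  length fmtname $32 type $1 start label $32;
  set brands;
  /* numeric informat: text -> code */
  fmtname = 'INFBRAND'; type = 'I'; start = brand;     label = cats(_n_); output;
  /* numeric format: code -> text */
  fmtname = 'FBRAND';   type = 'N'; start = cats(_n_); label = brand;     output;
  keep fmtname type start label;
run;

/* CNTLIN expects the rows of each format to be grouped together */
proc sort data=fmt_defs;
  by fmtname type;
run;

proc format library=work cntlin=fmt_defs;
run;

data _null_;
  code = input('FRANCES-FRANCES', infbrand.);
  put code= code= fbrand.;
run;
```

Because the values travel as data rather than as macro text, characters such as &, % and quotes are never seen by the macro processor.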
You can find much more information about PROC FORMAT in the online documentation.
Good luck,
Bob
I think the hyphen is the problem for the samples you provided. Maybe you could use a character replacement function such as TRANSLATE to change the hyphen (or other problem characters) to something else, like a space or underscore.
%Let Test=One-Two;
%Put &test;
%Let Test=%sysfunc(translate(&test,%str(_),%str(-)));
%Put &test;