Populate SAS variable based on content of another variable - sas

I have a variable, textvar, that looks like this:
type=1&name=bob
type=2&name=sue
I want to create a new table that looks like this:
type name
1 bob
2 sue
My approach is to use scan to split the variables on & so for the first observation I have
var1 var2
type=1 name=bob
So now I can use scan again to split on =:
vname = scan(var1, 1, '=');
value = scan(var1, 2, '=');
But how can I now assign value to the variable named vname?

PROC TRANSPOSE is the quickest way. You need an ID variable (dummy or real).
data test;
informat testvar $50.;
input testvar $;
datalines;
type=1&name=bob
type=2&name=sue
;;;;
run;
data test_vert;
set test;
id+1;
length scanner $20 vname vvalue $20;
scanner=scan(testvar,1,"&");
do _t=2 by 1 until (scanner=' ');
vname=scan(scanner,1,"=");
vvalue=scan(scanner,2,"=");
output;
scanner=scan(testvar,_t,"&");
end;
run;
proc transpose data=test_vert out=test_T;
by id;
id vname;
var vvalue;
run;

Does this help? Dynamic variable names in SAS
I think I have some code to address this, but left it at my workplace.

Obviously you haven't included your real data, but can't you just hard code some of the values if the format of the raw data is the same in each row? My code converts the "=" and "&" to "," to make the scan function easier to use.
data want (keep=type name);
set test;
_newvar=translate(testvar,",,","&=");
type=input(scan(_newvar,2),best12.);
length name $20;
name=scan(_newvar,4);
run;

Related

SAS 9.4: How to use different weights on the same variable without a data step or PROC SQL

I can't find a way to summarize the same variable using different weights.
I'll try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" more than once (I can't rename it!).
I have to store a huge amount of data and don't have much physical space, and otherwise I would have to construct about 120 fields (a0-a6, b0-b6, etc.) that are the same variables with fixed weights (wgt0-wgt5) applied.
I want to store a dataset with 20 columns (a, b, c, ...) and 6 weights (wgt0-wgt5) and, on demand, run a "summary" without an intermediate data step that obliges me to create 120 fields.
Due to the huge amount of data (roughly 55 GB every month) I'd also like to avoid a PROC SQL statement like this:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt2) as a1 , etc.
quit;
Is there a "super PROC SUMMARY" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
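The data step view option might look like the sketch below. It is only an illustration against the three-record pippo example from the question: the view name pippo_v and the product variables a1-a3 are made-up names, and the products are computed as the view is read rather than stored on disk. For the SUM statistic, summing a*wgt gives the same total as a WEIGHT= sum when the weights are positive.

```sas
/* Hypothetical view: a1-a3 hold a*wgt1 .. a*wgt3, computed on the fly */
data pippo_v / view=pippo_v;
  set pippo;
  array w[3]  wgt1-wgt3;
  array aw[3] a1-a3;
  do _i = 1 to dim(w);
    aw[_i] = a * w[_i];
  end;
  drop _i;
run;

/* One pass over the view; no intermediate table is written */
proc summary data=pippo_v missing nway;
  var a1-a3;
  output out=pluto (drop=_freq_ _type_) sum=;
run;
```

With 20 variables and 6 weights the array definitions would need to be generated by code, as with the hash example below.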
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the data;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate
with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;

modifying character variable contents based on lookup table in SAS

HAVE is a wide dataset with names stored in the variables name1-name250. Here are the first two obs and several vars:
episode name1 name2 name3 name4 name5 ...
121 DETWEILER.TJ.M BLUMBERG.MIKEY GRISWOLD.GUS.N
451 BOB.KING KID.HUSTLER FINSTER.MS PRICKLEY.PETEY GRISWOLD.GUS
...
Some of the names need to be corrected. The corrections are stored in the dataset FIXES:
goodname badname
DETWEILER.TJ DETWEILER.TJ.M
GRISWOLD.GUS GRISWOLD.GUS.N
I simply need to find the badname values from FIXES that appear in HAVE and replace them with goodname. I currently loop through name1-name250 in a data step for each row in FIXES to accomplish this:
data WANT;
set HAVE;
array name {*} name1-name250;
do i=1 to dim(name);
if name{i} = "DETWEILER.TJ.M" then name{i} = "DETWEILER.TJ";
else if name{i} = "GRISWOLD.GUS.N" then name{i} = "GRISWOLD.GUS";
/*manually add other corrections from FIXES dataset*/
else name{i} = name{i};
end;
run;
This feels really inefficient. What is a better way?
When you have a simple exact-match translation like that, a FORMAT is a simple way to implement it. You can convert your "lookup" data into a format.
data fixes ;
input goodname :$30. badname :$30. ;
cards;
DETWEILER.TJ DETWEILER.TJ.M
GRISWOLD.GUS GRISWOLD.GUS.N
;
data format ;
retain fmtname '$FIXNAME' ;
set fixes end=eof;
rename badname=start goodname=label;
run;
proc format cntlin=format;
run;
Then just use the format to convert the names.
data want;
set have;
array name name1-name5;
do over name;
name=put(name,$fixname30.);
end;
run;
Result:
episode name1 name2 name3 name4 name5
121 DETWEILER.TJ BLUMBERG.MIKEY GRISWOLD.GUS
451 BOB.KING KID.HUSTLER FINSTER.MS PRICKLEY.PETEY GRISWOLD.GUS

SAS function to every observation (finance xirr)

I have an sql table like this one
id | payment | date |
______|_____________|________________________|
obs1 | -20,10,13 | 21184,22765,22704 |
And so on (1M+ observations). I prepared all the data for finance() in SQL, so in SAS I just need to take the values and pass them to the function. I am confident that the data I prepared will return the right answer.
The problem is that I can't find a proper way to calculate the function over the entire data set. Right now I go row by row in a loop, passing the data to macro variables through proc sql, BUT I can't get a string larger than 1000 characters, so my program isn't working.
I am running the following function:
finance('XIRR', payment, date, 0.15);
Can you help me please? Thanks
The code I had before the answer. It ran unacceptably long!
%macro eir (input_data, cash_var, dt_var, output_data);
data rawdata;
set &input_data(dbmax_text=32000);
run;
proc sql noprint;
select count(*) into :n from rawdata ;
quit;
%let n = 100;
%do j=1 %to &n;
data x;
set rawdata(firstobs = &j obs= &j);
run;
proc sql noprint;
select &cash_var into: cf from x;
select &dt_var into: dt from x;
quit;
data x;
set x;
r= finance('xirr', &cf, &dt, 0.15);
drop &cash_var &dt_var;
run;
data out;
set %if &j>1 %then %do; out %end; x;
run;
%end;
proc append base = &output_data data=out;
run;
proc datasets nolist;
delete x out rawdata;
run;
%mend eir;
%eir(input_data = have, cash_var = pmt, dt_var = dt, output_data = ggg);
Took 20 minutes to calculate 50,000 rows
and now it's just
data want;
set have(dbmax_text=32000);
eir = input(resolve(catx(',','%sysfunc(finance(XIRR',pmt,dt,'0.15),hex16)')),hex16.);
run;
Took 6 minutes to calculate 1,400,000 rows
Tom just saved our project =)
The FINANCE() function wants a list of values, not a character string. You could parse the string and convert the text back into numbers and pass those to the function. But if the lengths of the lists vary from observation to observation that will cause issues.
You could use the macro processor to help you. You can generate a call to %sysfunc(finance()) and read the generated string back into a numeric variable.
It also might work to pad the short lists with zero payments on the last recorded date.
Let's make some test data.
data have ;
infile cards dsd dlm='|' ;
length id $20 payment date $100 ;
input id payment date;
cards;
obs1 | -20,10,13 | 21184,22765,22704
obs2 | -20,10 | 21184,22765
;
Now let's try converting it two ways. One by creating numeric variables to pass to the FINANCE() function call and the other by generating %sysfunc(finance()) call so that we can make sure the %sysfunc() call is working properly.
data want;
set have ;
array v (3) _temporary_;
array d (3) _temporary_;
do i=1 to dim(v);
v(i)=coalesce(input(scan(payment,i,','),32.),0);
d(i)=input(scan(date,i,','),32.);
if missing(d(i)) and i>1 then d(i)=d(i-1);
end;
drop i;
value1=finance('XIRR',of v(*),of d(*),0.15);
value2=input(resolve(catx(',','%sysfunc(finance(XIRR',payment,date,'0.15),hex16)')),hex16.);
run;
Here's my best guess based on the limited details you've provided. I think you need to split out each date and payment into separate variables before you can call the finance function, e.g.:
data have;
infile datalines dlm='|';
input id :$8. amount :$20. date :$20.;
datalines;
obs1 | -20,10,13 | 21184,22765,22704
;
run;
data want;
set have;
array dates[3] d1-d3;
array amounts[3] a1-a3;
do i = 1 to 3;
amounts[i] = input(scan(amount, i, ','), 8.);
dates[i] = input(scan(date, i, ','), 8.);
end;
XIRR = finance('XIRR', of a1-a3, of d1-d3, 0.15);
run;
I suspect this will only work if you have the same number of dates and payments in every row; otherwise you will run into array out-of-bounds issues or problems with the IRR calculation.

do loop on sas but not with a macro

It is a simple one but I'm struggling a bit.
What I have :
What I want :
I want to remove the v0 , v1 and etc.
I'm using this piece of code
data IndieDay20140704;
set IndieDay20140704;
do i=1 to 5;
VAR1=tranwrd(var1,"v&i","");
end;
run;
It is not working correctly as it is giving me this instead (see below) plus the error
WARNING: Apparent symbolic reference I not resolved.
Questions:
1) Do I need a macro?
2) Why the error?
Many thanks for your insights.
There's a warning because you're (unintentionally) referencing a macro variable i that you did not initialize: inside double quotes, &i is resolved by the macro processor, not by the data step.
I guess the idea of tranwrd is to remove the words in VAR2, VAR3, ... from VAR1.
The logical error is to do it for VAR1 itself as well.
Check if this helps (using array):
data IndieDay20140704;
length VAR1 VAR2 VAR3 VAR4 VAR5 $10;
VAR1 = 'TEST IT';VAR5 = 'TEST';
output;
VAR1 = 'STEST IT';VAR5 = 'TEST';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
array vals VAR1 - VAR5;
do i=1 to dim(vals);
if i ne 1 then VAR1=tranwrd(var1,trim(vals(i)),"");
end;
drop i;
run;
Here I'm creating a SAS view on top of table (not a good idea to overwrite the source).
Also I think you should trim() the values from VAR2,VAR3... depending on what you want to achieve and what's in the data.
EDIT:
here the version with 'v0', 'v1'...'v5' strings:
data IndieDay20140704;
length VAR1$10;
VAR1 = 'TEST v0';
output;
VAR1 = 'TEST v11';
output;
VAR1 = 'TEST v1';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
org_var1 = var1;
do i=0 to 5;
var1 =tranwrd(var1, catt('v', put(i, 1. -L)),"");
end;
run;
catt('v', put(i, 1. -L)) concatenates the string 'v' and the result of put.
put(i, 1. -L) converts the numeric variable i to text using the plain numeric format w.d; 1. is used here because it is enough for single-digit numbers, and -L left-aligns the result.
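One caveat with the loop above: TRANWRD substitutes a single blank when the replacement has zero length, so 'v0' becomes a space rather than disappearing. Below is a minimal sketch that removes the text outright, assuming your SAS release has the TRANSTRN function. The data set names demo and demo_clean and the sample values are made up for illustration.

```sas
/* Hypothetical sample data */
data demo;
  length var1 $20;
  var1 = 'fic19v0.csv'; output;
  var1 = 'fic19v3.csv'; output;
run;

data demo_clean;
  set demo;
  do i = 0 to 5;
    /* cats() builds 'v0'..'v5' at run time from the data step variable i;  */
    /* "v&i" would instead ask the macro processor for a macro variable i.  */
    /* trimn('') is a true zero-length string, which TRANSTRN accepts as    */
    /* the replacement, so the matched text is deleted, not blanked.        */
    var1 = transtrn(var1, cats('v', i), trimn(''));
  end;
  drop i;
run;
```

No macro is needed here because the loop counter is an ordinary data step variable.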
Here's one way, there are many others and this may not work if your data has a lot of variability.
data have;
length VAR1 $20; /* $10 would truncate 'fic19v0.csv' (11 characters) */
VAR1 = 'fic19v0.csv';
output;
VAR1 = 'fic19v1.cs';
output;
run;
data want ;
set have;
original_var=var1;
var1=substr(var1, 1, index(var1, ".")-3)||".csv";
run;

How can I make a character variable equal to the formatted value of a numeric variable for arbitrary SAS formats?

If I have a numeric variable with a format, is there a way to get the formatted value as a character variable?
e.g. I would like to write something like the following to print 10/06/2009 to the screen but there is no putformatted() function.
data test;
format i ddmmyy10.;
i = "10JUN2009"d;
run;
data _null_;
set test;
i_formatted = putformatted(i); /* How should I write this? */
put i_formatted;
run;
(Obviously I can write put(i, ddmmyy10.), but my code needs to work for whatever format i happens to have.)
The VVALUE function formats the variable passed to it using the format associated with the variable. Here's the code using VVALUE:
data test;
format i ddmmyy10.;
i = "10JUN2009"d;
run;
data _null_;
set test;
i_formatted = vvalue(i);
put i_formatted;
run;
While cmjohns' solution is slightly faster than this code, this code is simpler because there are no macros involved.
Use vformat() function.
/* test data */
data test;
i = "10jun2009"d;
format i ddmmyy10.;
run;
/* print out the value using the associated format */
data _null_;
set test;
i_formatted = putn(i, vformat(i));
put i_formatted=;
run;
/* on log
i_formatted=10/06/2009
*/
This seemed to work for a couple that I tried. I used VARFMT and a macro function to retrieve the format of the given variable.
data test;
format i ddmmyy10. b comma12.;
i = "10JUN2009"d;
b = 123405321;
run;
%macro varlabel(variable) ;
%let dsid=%sysfunc(open(&SYSLAST.)) ;
%let varnum=%sysfunc(varnum(&dsid,&variable)) ;
%let fmt=%sysfunc(varfmt(&dsid,&varnum));
%let dsid=%sysfunc(close(&dsid)) ;
&fmt
%mend varlabel;
data test2;
set test;
i_formatted = put(i, %varlabel(i) );
b_formatted = put(b, %varlabel(b) );
put i_formatted=;
put b_formatted=;
run;
This gave me:
i_formatted=10/06/2009
b_formatted=123,405,321
I can do this with macro code and sashelp.vcolumn but it's a bit fiddly.
proc sql noprint;
select trim(left(format)) into :format
from sashelp.vcolumn
where libname eq 'WORK' and memname eq 'TEST' and upcase(name) eq 'I'; /* pick the row for variable i only */
quit;
data test2;
set test;
i_formatted = put(i, &format);
put i_formatted;
run;
Yes, there is a putformatted() function. In fact, there are two: putc() and putn(). Putc() handles character formats, putn() numeric. Your code will need to look at the format name (all and only character formats start with "$") to determine which to use. Here is the syntax of putc (from the interactive help):
PUTC(source, format.<,w>)
Arguments
source
is the SAS expression to which you want to apply the format.
format.
is an expression that contains the character format you want to apply to source.
w
specifies a width to apply to the format.
Interaction: If you specify a width here, it overrides any width specification
in the format.
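As a rough sketch of how the two functions pair with VFORMAT, which returns the format attached to a variable: the character variable c and its $upcase8. format are assumptions added here for illustration and are not part of the original test data.

```sas
/* Test data: one numeric and one character variable, each with a format */
data test;
  format i ddmmyy10. c $upcase8.;
  i = "10JUN2009"d;
  c = "abc";
run;

data _null_;
  set test;
  length i_formatted c_formatted $32;
  /* numeric variable: putn() with the format name returned by vformat() */
  i_formatted = putn(i, vformat(i));
  /* character variable: putc() with its "$"-prefixed format name */
  c_formatted = putc(c, vformat(c));
  put i_formatted= c_formatted=;
run;
```

Because the format name is passed as a character expression, neither call needs to know the format when the step is written.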