SAS Data Step / Pointer / Indirect Variable Assignment - sas

In a SAS Data Step i have a character variable called "varName". This variable stores the name of another variable. In below's example, it stores the name of the numeric variable "changeMe":
data TMP;
length
varName $32
changeMe 8
;
varName = ‘changeMe’;
/*??? How to change the content of variable that varName holds ???*/
run;
Now the question is: how do i change the content of the variable that varName holds?
The use case would be that varName acts as a dynamic pointer to different variables that i want to manipulate in a big SAS Data Set.

DATA Step does not directly provide for named indirect assignment.
In some cases, the indirect assignment requirement might indicate you want to perform a Proc TRANSPOSE data transformation. If the variable names and values are provided in a transaction data set, and the data has BY group variables, your better solution might be to TRANSPOSE the transaction data and merge that transform to the master data using an UPDATE or MODIFY statement.
Regardless, you can array variables of a given type and iterate the array looking for the target requiring assignment.
Example:
data want;
set sashelp.class;
varname = 'name';
varvalue = 'Scooter';
array chars _character_;
do _n_ = 1 to dim(chars);
if upcase (vname(chars(_n_))) = upcase(varname) then do;
chars(_n_) = varvalue;
end;
end;
run;
Output

call execute() is a highly feasible solution.
data TMP;
length
varName $32
changeMe 8
;
varName = 'changeMe';
run;
data _null_;
set TMP end=eof;
if _n_ = 1 then call execute('data %trim(&syslast.); modify %trim(&syslast.);');
call execute(cats(varName)||' = rand("uniform",1,0);');
if eof then call execute('run;');
run;
Log:
NOTE: CALL EXECUTE generated line.
1 + data WORK.TMP; modify WORK.TMP;
2 + changeMe = rand("uniform",1,0);
3 + run;
NOTE: There were 1 observations read from the data set WORK.TMP.
NOTE: The data set WORK.TMP has been updated. There were 1 observations rewritten, 0 observations
added and 0 observations deleted.

Related

Deleting missing values when dealing with panel data

I am working with a panel dataset, so many countries and many variables throughout a period. The problem is that some countries have no value for certain variables across the whole period and I would like to get rid of them. I found this code for deleting rows with missing values :
DATA data0;
SET data1;
IF cmiss(of _all_) then delete;
RUN;
But all this does is check every row, while I would like to delete a whole country if it has no observations in at least one variable.
Here's a part of the data :
If you want to delete the whole country if it has any information missing, you are on the right track, you just need to add a (group) by statement.
If your data is already sorted by country, as it appears to be in the picture, you can just run:
data want;
set have;
IF cmiss(of _all_) then delete;
by country;
If it is not sorted, you need to first run:
proc sort data=have;
by country;
However, if you have 60 years of data for every country, my guess is that you will not find a single one that have all the information for every year. It will be probably better to do some substantive choices of countries and periods you want to analyze, and then perform multiple imputatiom of missing data: https://support.sas.com/rnd/app/stat/papers/multipleimputation.pdf
You can use a DOW loop to compute which variable(s) contain only missing values within a group.
A second DOW loop outputs only those groups in which all variables contain at least on value.
Example:
data have;
call streaminit (2020);
do country = 1 to 6;
do year = 1960 to 1999;
array x gini kof tradegdp fdi gdp age_dep educ;
do over x;
x = rand('integer', 20, 100);
end;
if country = 1 then call missing (gini);
if country = 2 then call missing (educ);
if country = 4 then call missing (fdi);
output;
end;
end;
run;
data want;
* count number of non-missing values over group for each arrayed variable;
do _n_ = 1 by 1 until (last.country);
set have;
by country;
array x gini kof tradegdp fdi gdp age_dep educ;
array flag(100) _temporary_; * flag if variable has a non-missing value in group;
do _index = 1 to dim(x);
if not(flag(_index)) then flag(_index) = 1 - missing(x(_index));
end;
end;
* check if at least one variable has no values;
_remove_group_flag = sum(of flag(*)) ne dim(x);
do _n_ = 1 to _n_;
set have;
if not _remove_group_flag then output;
end;
call missing (of flag(*));
run;
Will LOG
NOTE: There were 240 observations read from the data set WORK.HAVE. First DOW loop
NOTE: There were 240 observations read from the data set WORK.HAVE. Second DOW loop
NOTE: The data set WORK.WANT has 120 observations and 11 variables. Conditional output

how to transpose data with multiple occurrences in sas

I have a 2 column dataset - accounts and attributes, where there are 6 types of attributes.
I am trying to use PROC TRANSPOSE in order to set the 6 different attributes as 6 new columns and set 1 where the column has that attribute and 0 where it doesn't
This answer shows two approaches:
Proc TRANSPOSE, and
array based transposition using index lookup via hash.
For the case that all of the accounts missing the same attribute, there would be no way for the data itself to exhibit all the attributes -- ideally the allowed or expected attributes should be listed in a separate table as part of your data reshaping.
Proc TRANSPOSE
When working with a table of only account and attribute you will need to construct a view adding a numeric variable that can be transposed. After TRANSPOSE the result data will have to be further massaged, replacing missing values (.) with 0.
Example:
data have;
call streaminit(123);
do account = 1 to 10;
do attribute = 'a','b','c','d','e','f';
if rand('uniform') < 0.75 then output;
end;
end;
run;
data stage / view=stage;
set have;
num = 1;
run;
proc transpose data=stage out=want;
by account;
id attribute;
var num;
run;
data want;
set want;
array attrs _numeric_;
do index = 1 to dim(attrs);
if missing(attrs(index)) then attrs(index) = 0;
end;
drop index;
run;
proc sql;
drop view stage;
From
To
Advanced technique - Array and Hash mapping
In some cases the Proc TRANSPOSE is deemed unusable by the coder or operator, perhaps very many by groups and very many attributes. An alternate way to transpose attribute values into like named flag variables is to code:
Two scans
Scan 1 determine attribute values that will be encountered and used as column names
Store list of values in a macro variable
Scan 2
Arrayify the attribute values as variable names
Map values to array index using hash (or custom informat per #Joe)
Process each group. Set arrayed variable corresponding to each encountered attribute value to 1.  Array index obtained via lookup through hash map.
Example:
* pass #1, determine attribute values present in data, the values will become column names;
proc sql noprint;
select distinct attribute into :attrs separated by ' ' from have;
* or make list of attributes from table of attributes (if such a table exists outside of 'have');
* select distinct attribute into :attrs separated by ' ' from attributes;
%put NOTE: &=attrs;
* pass #2, perform array based tranposformation;
data want2(drop=attribute);
* prep pdv, promulgate by group variable attributes;
if 0 then set have(keep=account);
array attrs &attrs.;
format &attrs. 4.;
if _n_=1 then do;
declare hash attrmap();
attrmap.defineKey('attribute');
attrmap.defineData('_n_');
attrmap.defineDone();
do _n_ = 1 to dim(attrs);
attrmap.add(key:vname(attrs(_n_)), data: _n_);
end;
end;
* preset all flags to zero;
do _n_ = 1 to dim(attrs);
attrs(_n_) = 0;
end;
* DOW loop over by group;
do until (last.account);
set have;
by account;
attrmap.find(); * lookup array index for attribute as column;
attrs(_n_) = 1; * set flag for attribute (as column);
end;
* implicit output one row per by group;
run;
One other option for doing this not using PROC TRANSPOSE is the data step array technique.
Here, I have a dataset that hopefully matches yours approximately. ID is probably your account, Product is your attribute.
data have;
call streaminit(2007);
do id = 1 to 4;
do prodnum = 1 to 6;
if rand('Uniform') > 0.5 then do;
product = byte(96+prodnum);
output;
end;
end;
end;
run;
Now, here we transpose it. We make an array with the six variables that could occur in HAVE. Then we iterate through the array to see if that variable is there. You can add a few additional lines to the if first.id block to set all of the variables to 0 instead of missing initially (I think missing is better, but YMMV).
data want;
set have;
by id;
array vars[6] a b c d e f;
retain a b c d e f;
if first.id then call missing(of vars[*]);
do _i = 1 to dim(vars);
if lowcase(vname(vars[_i])) = product then
vars[_i] = 1;
end;
if last.id then output;
run;
We could do it a lot faster if we knew how the dataset was constructed, of course.
data want;
set have;
by id;
array vars[6] a b c d e f;
if first.id then call missing(of vars[*]);
retain a b c d e f;
vars[rank(product)-96]=1;
if last.id then output;
run;
While your data doesn't really work that way, you could make an informat though that did this.
*First we build an informat relating the product to its number in the array order;
proc format;
invalue arrayi
'a'=1
'b'=2
'c'=3
'd'=4
'e'=5
'f'=6
;
quit;
*Now we can use that!;
data want;
set have;
by id;
array vars[6] a b c d e f;
if first.id then call missing(of vars[*]);
retain a b c d e f;
vars[input(product,arrayi.)]=1;
if last.id then output;
run;
This last one is probably the absolute fastest option - most likely much faster than PROC TRANSPOSE, which tends to be one of the slower procs in my book, but at the cost of having to know ahead of time what variables you're going to have in that array.

put values to a file using functions without creating new variables

I am processing a dataset, the contents of which I do not know in advance. My target SAS instance is 9.3, and I cannot use SQL as that has certain 'reserved' names (such as "user") that cannot be used as column names.
The puzzle looks like this:
data _null_;
set some.dataset; file somefile;
/* no problem can even apply formats */
put name age;
/* how to do this without making new vars? */
put somefunc(name) max(age);
run;
I can't put var1=somefunc(name); put var1; as that may clash with a source variable named var1.
I'm guessing the answer is to make some macro function that will read the dataset header and return me a "safe" (non-clashing) variable, or an fcmp function in a format, but I thought I'd check with the community to see - is there some "old school" way to outPUT directly from a function, in a data step?
Temporary array?
34 data _null_;
35 set sashelp.class;
36 array _n[*] _numeric_;
37 array _f[3] _temporary_;
38 put _n_ #;
39 do _n_ = 1 to dim(_f);
40 _f[_n_] = log(_n[_n_]);
41 put _f[_n_]= #;
42 end;
43 put ;
44 run;
1 _f[1]=2.6390573296 _f[2]=4.2341065046 _f[3]=4.7229532216
2 _f[1]=2.5649493575 _f[2]=4.0342406382 _f[3]=4.4308167988
3 _f[1]=2.5649493575 _f[2]=4.1789920363 _f[3]=4.5849674787
4 _f[1]=2.6390573296 _f[2]=4.1399550735 _f[3]=4.6298627986
5 _f[1]=2.6390573296 _f[2]=4.1510399059 _f[3]=4.6298627986
6 _f[1]=2.4849066498 _f[2]=4.0483006237 _f[3]=4.4188406078
7 _f[1]=2.4849066498 _f[2]=4.091005661 _f[3]=4.4367515344
8 _f[1]=2.7080502011 _f[2]=4.1351665567 _f[3]=4.7229532216
9 _f[1]=2.5649493575 _f[2]=4.1351665567 _f[3]=4.4308167988
The PUT statement does not accept a function invocation as a valid item for output.
A DATA step does not do columnar functions as you indicated with max(age) (so it would be even less likely to use such a function in PUT ;-)
Avoid name collisions
My recommendation is to use a variable name that is highly unlikely to collide.
_temp_001 = somefunc(<var>);
_temp_002 = somefunc2(<var2>);
put _temp_001 _temp_002;
drop _temp_:;
or
%let tempvar = _%sysfunc(rand(uniform, 1e15),z15.);
&tempvar = somefunc(<var>);
put &tempvar;
drop &tempvar;
%symdel tempvar;
Repurpose
You can re-purpose any automatic variable that is not important to the running step. Some omni-present candidates include:
numeric variables:
_n_
_iorc_
_threadid_
_nthreads_
first.<any-name> (only tweak after first. logic associated with BY statement)
last.<any-name>
character variables:
_infile_ (requires an empty datalines;)
_hostname_
avoid
_file_
_error_
I think you would be pretty safe choosing some unlikely to collide names. An easy way to generate these and still make the code somewhat readable would be to just hash a string to create a valid SAS varname and use a macro reference to make the code readable. Something like this:
%macro get_low_collision_varname(iSeed=);
%local try cnt result;
%let cnt = 0;
%let result = ;
%do %while ("&result" eq "");
%let try = %sysfunc(md5(&iSeed&cnt),hex32.);
%if %sysfunc(anyalpha(%substr(&try,1,1))) gt 0 %then %do;
%let result = &try;
%end;
%let cnt = %eval(&cnt + 1);
%end;
&result
%mend;
The above code takes a seed string and just adds a number to the end of it. It iterates the number until it gets a valid SAS varname as output from the md5() function. You could even then test the target dataset name to make sure the variable doesn't already exist. If it does build that logic into the above function.
Test it:
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put &my_var;
data _null_;
set sashelp.class;
&my_var = 1;
put _all_;
run;
Results:
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=1
Name=Alice Sex=F Age=13 Height=56.5 Weight=84 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=2
This doesn't specifically answer the question of how to achieve it without creating new varnames, but it does give a practical workaround.

Looping over array, retaining values

I have a lot of data sets that I need to have the same structure - same variables, same order. I have a data set serving as a template ("all" in the code below). Other data are given this form by listing both the template data set (with obs=0) and a particular data set ("some" in the code below) in the same set statement. This works just fine.
I then want to loop through the variables. If one of them is missing (as it will be, if it's not present in the particular data set), it should be set to the value of the previous variable. var2 should get the value from var1 etc.
This should be done within each row. This works fine if done in a separate data step, but doesn't work if done in the same data step described above.
If done in the same data step, the values inserted for missing values will always be from row 1. Why is this? Can I achieve the wanted result without using another data step?
/* All the variables a complete data set should contain.*/
data all;
format var1-var5 $20.;
run;
/* Actual data have some of these variables, but not all. var1 is never missing, all other variables might be*/
data some;
var1="Obs 1, Value 1";
var4="Obs 1, Value 4";
output;
var1="Obs 2, Value 1";
var4="Obs 2, Value 4";
output;
run;
/* Not working - The values inserted when the conditional is true are all from row 1*/
data dont_want;
set all(obs=0) some;
array chars{*} _character_;
do i=1 to dim(chars);
if missing(chars{i}) then chars{i}=chars{i-1};
end;
drop i;
run;
/* Working*/
data temp;
set all(obs=0) some;
run;
data want;
set temp;
array chars{*} _character_;
do i=1 to dim(chars);
if missing(chars{i}) then chars{i}=chars{i-1};
end;
drop i;
run;
The values for the "extra" variables are being RETAINed since they are sourced from the ALL dataset. Any variable that is sourced from an input dataset is NOT reset to missing at the start of the data step iteration. Since those variables are not on the SOME dataset they do not change when an observation is read from it.
Just add code to reset them to missing. If you want to do it without knowing the names of the variables you might consider re-ordering the code.
You could define and clear the array after the compiler has "seen" the ALL dataset but before the run-time has read the SOME dataset.
data dont_want;
if 0 then set all;
array chars{*} _character_;
call missing(of chars{*});
set some;
do i=2 to dim(chars);
if missing(chars{i}) then chars{i}=chars{i-1};
end;
drop i;
run;
Or add an explicit OUTPUT statement and reset them after that.
data dont_want;
set all(obs=0) some;
array chars{*} _character_;
do i=2 to dim(chars);
if missing(chars{i}) then chars{i}=chars{i-1};
end;
drop i;
output;
call missing(of _all_);
run;
Couple things to do if you want to use implicit OUTPUT
Prep the PDV prior to the data reading set using a non-reading set
Set up array based on prepped PDV
Clear the array
Read the data with set
Impute your data
output
Example:
data dont_want;
if 0 then set all some; * non reading set preps the PDV;
array chars{*} _character_;
call missing(of chars(*)); * clears all auto-retained data set variables;
set all(obs=0) some; * data reading set;
* shift right an array requires left to right processing;
do i=dim(chars) to 2 by -1;
if missing(chars{i}) then chars{i}=chars{i-1};
end;
*** OR COPY right into empty slots, repeating prior copy if needed;
do i=2 to dim(chars);
if missing(chars{i}) then chars{i}=chars{i-1};
end;
drop i;
* implicit output;
run;

SAS: dim and macro variables

data example1;
input var1 var2 var3;
datalines;
10 11 14
3 5 8
0 1 2
;
data example2;
input var;
datalines;
1
2
8
;
Let's say that the number of var variables depending on data input. I want to put that number to macro variable and use in another data step, for example:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
%let array_dim = dim(var_array);
do i = 1 to &array_dim;
var_array[i] = var_array[i] + 1;
end;
drop i;
run;
data example2;
set example2;
var2 = var * &array_dim; /* doesn't work */
run;
%mend;
%m(example1);
%let array_dim = dim(var_array); doesn't work in second data step, because dim(var_array) isn't evaluated, but %eval or %sysevalf in declaring the macro variable does't work here. How to do that correctly?
You are mixing up macro code and data step code in a way that is not supported in SAS. If you want to assign a macro variable a value that you're generating as part of a data step, you need to use call symput.
Also, if you create a macro variable during a data step, you cannot resolve it during the same data step in the way that you are attempting to do (unless you use the resolve function...). It's easier just to use a data set variable for this.
So here's a fixed version of your code that I think probably does what you want:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
array_dim = dim(var_array);
/*Only export the macro variable once, for the first row*/
if _n_ = 1 then call symput('array_dim_mvar', array_dim);
do i = 1 to array_dim;
var_array[i] = var_array[i] + 1;
end;
drop i;
run;
data example2;
set example2;
var2 = var * &array_dim_mvar;
run;
%mend;
%m(example1);