How do I sum similarly named variables in SAS? - sas

I want to remove every observation where the variable starting with R1 has a missing value. In order to do this, I first try to sum every variable with that prefix:
data test
input R1_1 R1_2 R1_3;
datalines;
. . .
;
run;
data test2;
set test;
diagnosis=sum (of R1:);
run;
This syntax should work according to this article. However something seems to be wrong. In the above example, I get an error complaining about the function call not having enough arguments. In other cases, the code seems to run smoothly but my diagnosis variable isn't created.
Can I fix this and in that case how?

Your code does not work because you did not have a semicolon ending the DATA statement so the TEST dataset you created does not have any variables. Instead you also created datasets named INPUT R1_1 R1_2 and R1_3 that also did not have any variables.
To your actual question you can use NMISS() to count the number of missing numeric values.
nmiss = nmiss(of R1_:) ;
So you can eliminate observations with ANY missing values by using something like:
data want;
set have;
where nmiss(of R1_1-R1_3);
run;
If the goal is to remove observations where ALL of the values are missing you need to know how many variables you are testing. If you don't know that number in advance then you could use an ARRAY to count them. But then you would need to use a subsetting IF instead of WHERE.
data want;
set have;
array x r1_: ;
if nmiss(of r1_:) < dim(x);
run;
If you have a mix of numeric and character variables you can use CMISS() instead.

Related

SAS: How to create a variable like 1,2,3,4,...,N

I need to introduce a time trend to a regression model for a course but have no idea how to create a variable that's just (1,2,3,4,...,108). In R or Python I would just create an empty vector of 0's and then loop through to fill them with the loop index but I have no clue how to do it in SAS.
Thank you in advance
data want;
set have;
time_trend+1;
run;
SAS is an inherently looping language. The code above does four things:
Read a row
Add 1 to a variable called time_trend
Output the row to a dataset named want
Read the next row and execute the statements again
SAS automatically initialized the variable time_trend for us at compilation, so we do not need to declare a length or type. SAS assumes it is a numeric variable by default.
The statement time_trend+1 is a special shortcut of the below logic:
data want;
set have;
retain time_trend 0;
time_trend = time_trend + 1;
run;

How to change the character length for all character variables in SAS?

I am using the following code to change the length of all character variables in my dataset. I am not sure why this loop is not working.
data test ;
set my.data;
array chars[*] _character_;
%do i = 1 %to dim(chars);
length chars[i] $ 30 ;
%end;
run;
You're mixing data step and macro commands, for one. %do is macro only, but the rest of that is data step only. You also need the length statement to be the first time the variable is encountered, not the set statement, as character lengths are not changeable after first encounter.
You either need to do this in the macro language, or do this with some other data-driven programming technique (as user667489 refers to some of). Here are two ways.
Macro based, using the open group of functions, which opens the dataset, counts how many variables there are, then iterates through those variables and calls the length statement for each (you could identically have one length, iterate through the variables, and one number). This is appropriate for a generic macro, but is probably more difficult to maintain.
%macro make_class_longer(varlength=);
data class;
%let did=%sysfunc(open(sashelp.class,i));
%let varcount=%sysfunc(attrn(&did,nvars));
%do _i = 1 %to &varcount;
%if %sysfunc(vartype(&did., &_i.))=C %then %do;
length %sysfunc(varname(&did.,&_i)) $&varlength.;
%end;
%end;
%let qid=%sysfunc(close(&did));
set sashelp.class;
run;
%mend make_class_longer;
%make_class_longer(varlength=30);
Similarly, here is a dictionary.columns solution. That queries the metadata directly and builds a list of character variables in a macro variable which is then used in a normal length statement. Easier to maintain, probably slower (but mostly meaninglessly so).
proc sql;
select name into :charlist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='char'
;
quit;
data class;
length &charlist. $30;
set sashelp.class;
run;
Length of variables is determined when a data step is compiled, so the first statement that mentions a variable usually determines its length. In your example, this is the set statement. Once fixed, a variable's length cannot be changed, unless you rebuild the whole dataset.
To get the result you want here, you would need to move your length statement above your set statement, and consequently you would also need to explicitly specify all the names of the variables whose lengths you want to set, as they would not otherwise exist yet at that point during compilation. You can do this either by hard-coding them or by generating code from sashelp.vcolumn / dictionary.columns.
There are a number of logical and syntactical errors in that code.
The main logical error is that you cannot change the length of a character variable after SAS has already determined what it should be. In your code it is determined when the SET statement is compiled.
Another logical error is using macro %DO loop inside a data step. Why?
Your example LENGTH statement is syntactically wrong. You cannot have an array reference in the LENGTH statement. Just the actual variable names. You could set the length in the ARRAY statement, if it was the first place the variables were defined. But you can't use the _character_ variable list then since for the variable list to find the variables the variables would have to already be defined. Which means it would be too late to change.
You will probably need to revert to a little code generation.
Let's make a sample dataset by using PROC IMPORT. We can use the SASHELP.CLASS example data for this.
filename csv temp;
proc export data=sashelp.class outfile=csv dbms=csv ;run;
proc import datafile=csv out=sample replace dbms=csv ;run;
Resulting variable list:
This is also a useful case as it will demonstrate one issue with changing the length of character variables. If you have assigned a FORMAT to the variable you could end up with the variable length not matching the format width.
Here is one way to dynamically generate code to change the length of the character variables without changing their relative position in the dataset. Basically this will read the metadata for the table and use it to generate a series of name/type+length pairs for each variable.
proc sql noprint ;
select varnum
, catx(' ',name,case when type='num' then put(length,1.) else '$30' end)
into :varlist
, :varlist separated by ' '
from dictionary.columns
where libname='WORK' and memname='SAMPLE'
order by 1
;
quit;
You can then use the generated list in a LENGTH statement to define the variables' type and length. You can also add in FORMAT and INFORMAT statements to remove the $xx. formats and informats that PROC IMPORT (mistakenly) adds to character variables.
data want ;
length &varlist ;
set sample;
format _character_ ;
informat _character_;
run;

SAS Array change to header ext

I have a large data set with 73 columns of characteristics. If a member has a 1 in the box then they have the characteristic. I would like to change all of the 1s to the characteristic name. When uploading the data set I changed all columns to text and can use the following code to replace the 1s with "Yes" but trying to figure out how to change them to the column header text. ie "Single", "Married" etc..
DATA DataSetb;
SET DataSetA ;
array change _CHARACTER_ ;
do over change;
if change=1. then change=????????
End;
run;
This will only work if the variables are character in the first place and if they are you also need to make sure the length will hold the name.
Assuming these are accounted for you can use the VNAME() function to retrieve the name.
DO OVER is deprecated (20 years ago) so I don't use it. This will work, if the assumptions above are met.
You may also want to declare another array with the names if the length isn't large enough.
do i=1 to dim(change);
if change(i)='1' then change=vname(change(i));
End;
You can use the VNAME() function. Note you need to have defined the variables long enough to hold the names. SAS variable names can be up to 32 characters long.
Here is example code you can try that will show what happens if the variables are too short. Notice how the value of SEX is truncated to just 'S' since that variable is defined with length=$1.
data test;
set sashelp.class ;
array _c _character_;
do over _c ; _c=vname(_c); end;
run;
proc freq ;
tables _character_ / list ;
run;

Naming variable using _n_, a column for each iteration of a datastep

I need to declare a variable for each iteration of a datastep (for each n), but when I run the code, SAS will output only the last one variable declared, the greatest n.
It seems stupid declaring a variable for each row, but I need to achieve this result, I'm working on a dataset created by a proc freq, and I need a column for each group (each row of the dataset).
The result will be in a macro, so it has to be completely flexible.
proc freq data=&data noprint ;
table &group / out=frgroup;
run;
data group1;
set group (keep=&group count ) end=eof;
call symput('gr', _n_);
*REQUESTED code will go here;
run;
I tried these:
var&gr.=.;
call missing(var&gr.);
and a lot of other statement, but none worked.
Always the same result, the ds includes only var&gr where &gr is the maximum n.
It seems that the PDV is overwriting the new variable each iteration, but the name is different.
Please, include the result in a single datastep, or, at least, let the code take less time as possible.
Any idea on how can I achieve the requested result?
Thanks.
Macro variables don't work like you think they do. Any macro variable reference is resolved at compile time, so your call symput is changing the value of the macro variable after all the references have been resolved. The reason you are getting results where the &gr is the maximum n is because that is what &gr was as a result of the last time you ran the code.
If you know you can determine the maximum _n_, you can put the max value into a macro variable and declare an array like so:
Find max _n_ and assign value to maxn:
data _null_;
set have end=eof;
if eof then call symput('maxn',_n_);
run;
Create variables:
data want;
set have;
array var (&maxn);
run;
If you don't like proc transpose (if you need 3 columns you can always use it once for every column and then put together the outputs) what you ask can be done with arrays.
First thing you need to determine the number of groups (i.e. rows) in the input dataset and then define an array with dimension equal to that number.
Then the i-th element of your array can be recalled using _n_ as index.
In the following code &gr. contains the number of groups:
data group1;
set group;
array arr_counts(&gr.) var1-var&gr.;
arr_counts(_n_)= count;
run;
In SAS there're several methods to determine the number of obs in a dataset, my favorite is the following: (doesn't work with views)
data _null_;
if 0 then set group nobs=n;
call symputx('gr',n);
run;

How to recode variables in SAS?

I tried to recode the missing values but instead lost all my other variables within a dataset
BEFORE:
AFTER:
data work.newdataset;
if (year =.) then year = 2000;
run;
You are missing the SET statement.
data want;
set have;
myvar=5;
run;
will create a new dataset, want, from have, with the new variable value applied (or the recode or whatever). You could also do
data have;
set have;
myvar=5;
run;
That would replace have with itself plus the recode/whatever. This is actually less common in SAS; it is often preferable to do all recodes in one step, but to create a new dataset (so that the code is reversible easily).