SAS Array change to header text - sas

I have a large data set with 73 columns of characteristics. If a member has a 1 in a column, they have that characteristic. I would like to change all of the 1s to the characteristic name. When uploading the data set I changed all columns to text, and I can use the following code to replace the 1s with "Yes", but I am trying to figure out how to change them to the column header text instead, i.e. "Single", "Married", etc.
DATA DataSetb;
SET DataSetA ;
array change _CHARACTER_ ;
do over change;
if change=1. then change=????????
End;
run;

This will only work if the variables are character in the first place and if they are you also need to make sure the length will hold the name.
Assuming these are accounted for you can use the VNAME() function to retrieve the name.
DO OVER is deprecated (20 years ago) so I don't use it. This will work, if the assumptions above are met.
You may also want to declare another array with the names if the length isn't large enough.
do i=1 to dim(change);
/* assign to the current element, not to the array name */
if change(i)='1' then change(i)=vname(change(i));
end;
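As a rough sketch of the "another array" suggestion above: if the flag columns are too short to hold their own names, a second array of wider variables can receive the text instead. The data set HAVE, the two flag columns, and the Single_txt/Married_txt names are made up for illustration; the real data has 73 columns.

```sas
/* Hypothetical sample data: two $1 flag columns */
data have;
  length Single Married $1;
  input Single $ Married $;
  datalines;
1 0
0 1
;
run;

data want;
  set have;
  array flags {*} Single Married;              /* original $1 flags */
  array txt {2} $32 Single_txt Married_txt;    /* assumed new, wider columns */
  do i = 1 to dim(flags);
    /* copy the variable name into the wider column when the flag is set */
    if flags{i} = '1' then txt{i} = vname(flags{i});
  end;
  drop i;
run;
```

The $32 length is chosen because a SAS variable name can be at most 32 characters long.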

You can use the VNAME() function. Note you need to have defined the variables long enough to hold the names. SAS variable names can be up to 32 characters long.
Here is example code you can try that will show what happens if the variables are too short. Notice how the value of SEX is truncated to just 'S' since that variable is defined with length=$1.
data test;
set sashelp.class ;
array _c _character_;
do over _c ; _c=vname(_c); end;
run;
proc freq ;
tables _character_ / list ;
run;

Related

How do I sum similarly named variables in SAS?

I want to remove every observation where the variable starting with R1 has a missing value. In order to do this, I first try to sum every variable with that prefix:
data test
input R1_1 R1_2 R1_3;
datalines;
. . .
;
run;
data test2;
set test;
diagnosis=sum (of R1:);
run;
This syntax should work according to this article. However something seems to be wrong. In the above example, I get an error complaining about the function call not having enough arguments. In other cases, the code seems to run smoothly but my diagnosis variable isn't created.
Can I fix this and in that case how?
Your code does not work because there is no semicolon ending the DATA statement, so the TEST dataset you created does not have any variables. You also created datasets named INPUT, R1_1, R1_2 and R1_3, which do not have any variables either.
To your actual question you can use NMISS() to count the number of missing numeric values.
nmiss = nmiss(of R1_:) ;
So you can eliminate observations with ANY missing values by using something like:
data want;
set have;
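/* keep rows with no missing values; variables are listed explicitly for the WHERE expression */
where nmiss(R1_1, R1_2, R1_3) = 0;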
run;
If the goal is to remove observations where ALL of the values are missing you need to know how many variables you are testing. If you don't know that number in advance then you could use an ARRAY to count them. But then you would need to use a subsetting IF instead of WHERE.
data want;
set have;
array x r1_: ;
if nmiss(of r1_:) < dim(x);
run;
If you have a mix of numeric and character variables you can use CMISS() instead.
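A small sketch of the CMISS() variant, assuming a made-up data set in which one of the R1_ variables is character (R1_name is hypothetical). It keeps only the rows where none of the R1_ variables is missing.

```sas
/* Hypothetical sample data with numeric and character R1_ variables */
data have;
  input id R1_1 R1_2 R1_name $;
  datalines;
1 1 2 a
2 . 2 b
3 . . .
;
run;

data want;
  set have;
  /* CMISS counts missing values across numeric and character arguments */
  if cmiss(of R1_:) = 0;
run;
```

With this sample input, only the row with id=1 has no missing values among the R1_ variables.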

How to change the character length for all character variables in SAS?

I am using the following code to change the length of all character variables in my dataset. I am not sure why this loop is not working.
data test ;
set my.data;
array chars[*] _character_;
%do i = 1 %to dim(chars);
length chars[i] $ 30 ;
%end;
run;
You're mixing data step and macro commands, for one. %do is macro only, but the rest of that is data step only. You also need the length statement to be the first time the variable is encountered, not the set statement, as character lengths are not changeable after first encounter.
You either need to do this in the macro language, or do this with some other data-driven programming technique (as user667489 refers to some of). Here are two ways.
Macro based, using the OPEN group of functions: this opens the dataset, counts how many variables there are, then iterates through those variables and emits a LENGTH statement for each character variable (equivalently, you could emit a single LENGTH keyword, iterate through the variable names, and finish with one length). This is appropriate for a generic macro, but is probably more difficult to maintain.
%macro make_class_longer(varlength=);
data class;
%let did=%sysfunc(open(sashelp.class,i));
%let varcount=%sysfunc(attrn(&did,nvars));
%do _i = 1 %to &varcount;
%if %sysfunc(vartype(&did., &_i.))=C %then %do;
length %sysfunc(varname(&did.,&_i)) $&varlength.;
%end;
%end;
%let qid=%sysfunc(close(&did));
set sashelp.class;
run;
%mend make_class_longer;
%make_class_longer(varlength=30);
Similarly, here is a dictionary.columns solution. That queries the metadata directly and builds a list of character variables in a macro variable which is then used in a normal length statement. Easier to maintain, probably slower (but mostly meaninglessly so).
proc sql;
select name into :charlist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='char'
;
quit;
data class;
length &charlist. $30;
set sashelp.class;
run;
Length of variables is determined when a data step is compiled, so the first statement that mentions a variable usually determines its length. In your example, this is the set statement. Once fixed, a variable's length cannot be changed, unless you rebuild the whole dataset.
To get the result you want here, you would need to move your length statement above your set statement, and consequently you would also need to explicitly specify all the names of the variables whose lengths you want to set, as they would not otherwise exist yet at that point during compilation. You can do this either by hard-coding them or by generating code from sashelp.vcolumn / dictionary.columns.
There are a number of logical and syntactical errors in that code.
The main logical error is that you cannot change the length of a character variable after SAS has already determined what it should be. In your code it is determined when the SET statement is compiled.
Another logical error is using macro %DO loop inside a data step. Why?
Your example LENGTH statement is syntactically wrong. You cannot have an array reference in the LENGTH statement. Just the actual variable names. You could set the length in the ARRAY statement, if it was the first place the variables were defined. But you can't use the _character_ variable list then since for the variable list to find the variables the variables would have to already be defined. Which means it would be too late to change.
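To illustrate the point about the ARRAY statement: when the array is the first place the variables are mentioned, the length given there defines them. This is only a sketch with made-up names (demo, txt1-txt3), not a fix for the original code, where the variables already exist in the input data set.

```sas
data demo;
  /* first mention of txt1-txt3, so they are created as $30 here */
  array txt {3} $30 txt1-txt3;
  txt1 = 'short value';
run;

/* check the resulting lengths */
proc contents data=demo;
run;
```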
You will probably need to revert to a little code generation.
Let's make a sample dataset by using PROC IMPORT. We can use the SASHELP.CLASS example data for this.
filename csv temp;
proc export data=sashelp.class outfile=csv dbms=csv ;run;
proc import datafile=csv out=sample replace dbms=csv ;run;
Resulting variable list:
This is also a useful case as it will demonstrate one issue with changing the length of character variables. If you have assigned a FORMAT to the variable you could end up with the variable length not matching the format width.
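A minimal sketch of that format/length mismatch, using made-up data sets SHORT and LONGER: the variable is lengthened to $30, but the $6. format attached in the source data set is inherited and still limits what is displayed.

```sas
data short;
  name = 'Alfred';        /* defined as $6 by the first assignment */
  format name $6.;
run;

data longer;
  length name $30;        /* lengthen before SET */
  set short;              /* the $6. format comes along with the data */
  name = 'Alfred the Great';
  put name=;              /* written through the inherited $6. format */
  put name= $30.;         /* written with an explicit wider format */
run;
```

Clearing or widening the format, as done with the FORMAT statement further down, avoids the mismatch.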
Here is one way to dynamically generate code to change the length of the character variables without changing their relative position in the dataset. Basically this will read the metadata for the table and use it to generate a series of name/type+length pairs for each variable.
proc sql noprint ;
select varnum
, catx(' ',name,case when type='num' then put(length,1.) else '$30' end)
into :varlist
, :varlist separated by ' '
from dictionary.columns
where libname='WORK' and memname='SAMPLE'
order by 1
;
quit;
You can then use the generated list in a LENGTH statement to define the variables' type and length. You can also add in FORMAT and INFORMAT statements to remove the $xx. formats and informats that PROC IMPORT (mistakenly) adds to character variables.
data want ;
length &varlist ;
set sample;
format _character_ ;
informat _character_;
run;

Is there a way to change the length of a variable using Proc datasets?

I have a lot of datasets and variables that I need to modify the attributes of. Everything is working fine EXCEPT for the below instance where I need to change the length of a variable:
data inputdset ;
format inputvar $20. ;
inputvar='ABCDEFGHIJKLMNOPQRST' ;
run ;
proc datasets lib=work nolist memtype=data ;
modify inputdset ;
attrib inputvar format=$50. length=50 ;
run ;
quit ;
Running this gives the following notes in the log:
NOTE: The LENGTH attribute cannot be changed and is therefore being ignored.
NOTE: MODIFY was successful for WORK.INPUTDSET.DATA.
...the final inputvar has a format of $50. as expected but still has a length of 20. Is there a way to have the length increased for these cases using proc datasets (or even better, if the length can be increased to match format)?
It's always risky to say no, but I'm going to try it. PROC DATASETS can modify the metadata about a dataset, not the data stored in each record. Changing the length of a variable requires changing the value stored in every record for that variable (truncating it or lengthening it and padding with blanks). Thus changing the length of a variable requires rewriting the entire dataset, which can be done by the DATA step or PROC SQL, but not PROC DATASETS.
Just a note: the documentation for the MODIFY statement does state, under its restrictions, that the length cannot be changed by the ATTRIB statement.
https://support.sas.com/documentation/cdl/en/proc/68954/HTML/default/viewer.htm#n0ahh0eqtadmp3n1uwv55i2gyxiz.htm
MODIFY Statement
Changes the attributes of a SAS file and, through the use of subordinate statements, the attributes of variables in the SAS file.
Restriction: You cannot change the length of a variable using the LENGTH= option in an ATTRIB statement
To change the length of a column in a dataset you will have to rebuild the dataset with the new length. Typically you would do this using a data step code pattern like the following (PROC SQL is another option).
data inputdset ;
format inputvar $20. ;
inputvar='ABCDEFGHIJKLMNOPQRST' ;
run ;
data inputdset;
length inputvar $ 50;
format inputvar $50.; * can change the format at the same time if you want;
set inputdset;
run;
The most common complaint with this pattern is that inputvar will now be the first column in the new dataset. You can correct this by properly listing all the variables in the length statement to preserve the original order.
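For the PROC SQL route mentioned above, a sketch using ALTER TABLE with a MODIFY clause, which accepts a new width for a character column. It reuses the INPUTDSET example from the question; whether this suits a large batch of datasets is a judgment call, since the data still has to be rewritten behind the scenes.

```sas
data inputdset;
  format inputvar $20.;
  inputvar = 'ABCDEFGHIJKLMNOPQRST';
run;

proc sql;
  /* widen the column and its format in one statement */
  alter table inputdset
    modify inputvar char(50) format=$50.;
quit;

/* confirm the new length and format */
proc contents data=inputdset;
run;
```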

Naming variable using _n_, a column for each iteration of a datastep

I need to declare a variable for each iteration of a datastep (for each _n_), but when I run the code, SAS outputs only the last variable declared, the one for the greatest _n_.
It may seem stupid to declare a variable for each row, but I need to achieve this result: I'm working on a dataset created by a proc freq, and I need a column for each group (each row of the dataset).
The result will be in a macro, so it has to be completely flexible.
proc freq data=&data noprint ;
table &group / out=frgroup;
run;
data group1;
set group (keep=&group count ) end=eof;
call symput('gr', _n_);
*REQUESTED code will go here;
run;
I tried these:
var&gr.=.;
call missing(var&gr.);
and a lot of other statements, but none worked.
Always the same result: the dataset includes only var&gr, where &gr is the maximum _n_.
It seems that the PDV is overwriting the new variable on each iteration, even though the name is different.
Please include the result in a single datastep or, at least, make the code take as little time as possible.
Any idea how I can achieve the requested result?
Thanks.
Macro variables don't work like you think they do. Any macro variable reference is resolved at compile time, so your call symput is changing the value of the macro variable after all the references have been resolved. The reason you are getting results where the &gr is the maximum n is because that is what &gr was as a result of the last time you ran the code.
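A small sketch of that timing issue, with an arbitrary starting value of 0 chosen for illustration: the reference to &gr. inside the step is replaced by text before the step runs, so the CALL SYMPUTX in the same step cannot affect it.

```sas
%let gr = 0;

data _null_;
  call symputx('gr', 5);   /* executes while the step is running */
  x = &gr.;                /* already resolved to 0 when the step was compiled */
  put x=;
run;

/* the new value is visible only after the step has finished */
%put gr is now &gr.;
```

Here the data step sees the value 0, while the %PUT after the step shows 5.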
If you know you can determine the maximum _n_, you can put the max value into a macro variable and declare an array like so:
Find max _n_ and assign value to maxn:
data _null_;
set have end=eof;
/* symputx strips blanks and avoids the numeric-to-character conversion note */
if eof then call symputx('maxn', _n_);
run;
Create variables:
data want;
set have;
array var (&maxn);
run;
If you don't like PROC TRANSPOSE (if you need 3 columns, you can always run it once per column and then combine the outputs), what you ask can be done with arrays.
First thing you need to determine the number of groups (i.e. rows) in the input dataset and then define an array with dimension equal to that number.
Then the i-th element of your array can be recalled using _n_ as index.
In the following code &gr. contains the number of groups:
data group1;
set group;
array arr_counts(&gr.) var1-var&gr.;
arr_counts(_n_)= count;
run;
In SAS there are several methods to determine the number of observations in a dataset; my favorite is the following (it doesn't work with views):
data _null_;
if 0 then set group nobs=n;
call symputx('gr',n);
run;
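Putting the pieces of this answer together, an end-to-end sketch might look like the following. It assumes SASHELP.CLASS grouped by SEX as a stand-in for &data and &group, and it adds an explicit STOP so the counting step ends cleanly without reading any rows.

```sas
/* one row per group, with its frequency in COUNT */
proc freq data=sashelp.class noprint;
  table sex / out=frgroup;
run;

/* store the number of groups in &gr without reading any observations */
data _null_;
  if 0 then set frgroup nobs=n;
  call symputx('gr', n);
  stop;
run;

/* one column per group; row _n_ fills element _n_ */
data group1;
  set frgroup;
  array arr_counts {&gr.} var1-var&gr.;
  arr_counts{_n_} = count;
run;
```

Because &gr is set in a step that finishes before GROUP1 is compiled, the array dimension is known in time.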

Sas macro with proc sql

I want to perform some regressions and would like to count the number of nonmissing observations for each variable. But I don't know yet which variables I will use. I've come up with the following solution, which does not work. Any help?
Here I basically put each of my explanatory variables in its own variable. For example,
var1 var2 -> w1 = var1, w2 = var2. Notice that I don't know how many variables I have in advance, so I leave room for ten variables.
Then I store the potential variables using symput.
data _null_;
cntw=countw(&parameters);
i = 1;
array w{10} $15.;
do while(i <= cntw);
w[i]= scan((&parameters"),i, ' ');
i = i +1;
end;
/* store a variable globally*/
do j=1 to 10;
call symput("explanVar"||left(put(j,3.)), w(j));
end;
run;
My next step is to perform a proc sql using the variables I've stored. It does not work if I have fewer than 10 variables.
proc sql;
select count(&explanVar1), count(&explanVar2),
count(&explanVar3), count(&explanVar4),
count(&explanVar5), count(&explanVar6),
count(&explanVar7), count(&explanVar8),
count(&explanVar9), count(&explanVar10)
from estimation
;quit;
Can this code work with less than 10 variables?
You haven't provided the full context for this project, so it's unclear if this will work for you - but I think this is what I'd do.
First off, you're in SAS, use SAS where it's best - counting things. Instead of the PROC SQL and the data step, use PROC MEANS:
proc means data=estimation n;
var &parameters.;
run;
That, without any extra work, gets you the number of nonmissing values for all of your variables in one nice table.
Secondly, if there is a reason to do the PROC SQL, it's probably a bit more logical to structure it this way.
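%macro count_params; /* %DO must run inside a macro; count_params is an illustrative name */
proc sql;
select
%do i = 1 %to %sysfunc(countw(&parameters.));
count(%scan(&parameters.,&i.) ) as Parameter_&i., /* or could reuse the %scan result to name this better*/
%end; count(1) as Total_Obs
from estimation;
quit;
%mend count_params;
%count_params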
The final Total Obs column is useful to simplify the code (dealing with the extra comma is mildly annoying). You could also put it at the start and prepend the commas.
You finally could also drive this from a dataset rather than a macro variable. I like that better, in general, as it's easier to deal with in a lot of ways. If your parameter list is in a data set somewhere (one parameter per row, in the dataset "Parameters", with "var" as the name of the column containing the parameter), you could do
proc sql;
select cats('%countme(var=',var,')') into :countlist separated by ','
from parameters;
quit;
%macro countme(var=);
count(&var.) as &var._count
%mend countme;
proc sql;
select &countlist from estimation;
quit;
This I like the best, as it is the simplest code and is very easy to modify. You could even drive it from a contents of estimation, if it's easy to determine what your potential parameters might be from that (or from dictionary.columns).
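As a sketch of the dictionary.columns variant mentioned above, the same %countme approach can be driven from the table's own metadata. SASHELP.CLASS and the restriction to numeric columns are assumptions made for illustration; substitute the estimation table and whatever filter identifies the candidate parameters.

```sas
%macro countme(var=);
  count(&var.) as &var._count
%mend countme;

proc sql noprint;
  /* build one %countme call per numeric column */
  select cats('%countme(var=', name, ')')
    into :countlist separated by ','
    from dictionary.columns
    where libname='SASHELP' and memname='CLASS' and type='num';
quit;

proc sql;
  /* the macro calls expand when &countlist is resolved */
  select &countlist from sashelp.class;
quit;
```

The single quotes around '%countme(var=' keep the macro call from being expanded while the list is being built.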
I'm not sure about your SAS macro, but the SQL query will work with these two notes:
1) If you don't follow your COUNT() functions with an identifier such as "COUNT() AS VAR1", your results will not have field headings. If that's OK with you, then you may not need to worry about it. But if you export the data, it will be helpful to name them by adding "... AS MY_NAME".
2) For observations with fewer than 10 variables, the query will return NULL values. So don't worry about not getting all of the results with what you have, because as long as the table you're querying has space for 10 variables (10 separate fields), you will get data back.