Let's suppose we have the following dataset:
ID  Stress_Level  Heart_Rate
1   5             10
2   7             12
3   9             16
And the code one would use to rename a variable would be:
data test1;
set test0;
rename Stress_Level=A Heart_Rate=B;
run;
However, what I would like to do is rename the two columns without using their names. Is there an "internal" SAS command that addresses a variable by its column position? For instance, Stress_Level, which is the 2nd column, could be addressed as "COL2" or something similar. The code would then be:
data test1;
set test0;
rename COL2=A COL3=B;
run;
Where "COL2" would always refer to the second column in the dataset regardless of its name. Is there a direct or maybe an indirect way to achieve that?
I think the easiest way is to build up a rename statement string from the metadata table DICTIONARY.COLUMNS (the view of this is SASHELP.VCOLUMN). This holds the column names and position for all tables in active libraries.
I've taken advantage of the ASCII sequence (via the byte function) to rename the columns A, B, etc. Obviously you'd run into problems if there are more than 26 columns to be renamed in the table!
You'll also need to tweak the varnum+63 calculation if you wanted to start from a different column than 2.
proc sql noprint;
select cats(name,"=",byte(varnum+63)) into :newvars separated by ' '
from dictionary.columns
where libname = 'WORK' and memname='HAVE' and varnum>=2;
quit;
data want;
set have;
rename &newvars.;
run;
/* or */
/*
proc datasets lib=work nolist nodetails;
modify have;
rename &newvars.;
quit;
*/
There are a couple of ways you can do this.
The shortest approach is probably to use an array. The only drawbacks are that you need to know the types of the variables in advance and the name of the first variable.
If they are all numeric as in your example the following could be used:
data test1;
set test0;
array vars[*] _numeric_;
A = vars[2];
B = vars[3];
keep ID A B;
run;
You can only have one type of variable in an array, so it's slightly more complicated if they are not all numeric or all character. Additionally you will need to know the name of the first variable and any other variables that you wish to keep if you don't want to have the duplicates of the second and third variables.
A more robust approach is to use information from a dictionary table and a macro variable to write your rename statement:
proc sql;
/* Write the individual rename assignments */
select strip(name) || " = " || substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", varnum - 1, 1)
/* Store them in a macro variable and separate them by spaces */
into :vars separated by " "
/* Use a sas dictionary table to find metadata about the dataset */
from sashelp.vcolumn
where
libname = "WORK" and
memname = "TEST0" and
2 <= varnum <= 3;
quit;
data test1;
set test0;
rename &vars.;
run;
SAS stores information about datasets in dictionary tables, which have views available in the sashelp library. Take a look at some of the sashelp.v* views to see what kind of information is available. In proc sql, the colon in the into clause stores the selected values in a macro variable, which can then be used in the rename statement.
I'd recommend the second approach as it is considerably more flexible and less dependent on the exact structure of your data. It also expands better when you have more than a couple of variables to rename.
Finally, if you want to make the changes to a dataset in place you may want to take a look at using proc datasets (in combination with the dictionary table approach) to do the renaming, as this can change the variable names without having to read and write every line of data.
Related
I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=&var;
title "&var";
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need for something like this. You can also approach it with BY-group processing or, for graphs, paneling in some cases.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql;
select cats('%graph(',var,')')
into: graphlist separated by ' '
from yourdata;
quit;
&graphlist
For example.
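If the list can be derived from the data itself, a sketch along these lines might work. It assumes the dataset is WORK.MONTHLY, that every numeric variable other than date should be plotted, and that the %graph macro from the question is already defined; adjust the where clause if those assumptions don't hold.

```sas
proc sql noprint;
    /* Build one %graph() call per numeric variable in WORK.MONTHLY */
    select cats('%graph(', name, ')')
        into :graphlist separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'MONTHLY'
      and type = 'num'
      and upcase(name) ne 'DATE';   /* skip the x-axis variable */
quit;

/* Resolving the macro variable runs every generated call */
&graphlist
```

Because the list is rebuilt from dictionary.columns on every run, renamed or newly added variables are picked up without editing the code.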
In your case, you could also generate a vertical dataset with one row per variable, which might make it easier to check which variables are correct:
data citiwk;
set sashelp.citiwk;
length var $4; /* without this, var gets length 3 from 'COM' and 'INDU' is truncated */
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
val=WSPGLT;
var='GOV';
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.
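A sketch of that array version, using the same four sashelp.citiwk series as above (the array name series, the index i, and the output name citiwk_long are just illustrative choices):

```sas
data citiwk_long;
    set sashelp.citiwk;
    /* List the variables to stack; a name-range or prefix list would also work */
    array series[*] WSPCA WSPUA WSPIA WSPGLT;
    length var $32;                  /* long enough for any SAS variable name */
    do i = 1 to dim(series);
        var = vname(series[i]);      /* name of the current array element */
        val = series[i];
        output;                      /* one row per variable per date */
    end;
    keep date var val;
run;
```

Swapping vname() for vlabel() would store the label instead; in that case var probably needs a longer length, since labels can run up to 256 characters.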
I have multiple datasets. Each of them has a different number of attributes. I want to merge them all by their common variables. This would be a 'union' if I used proc SQL, but there are hundreds of variables.
Example.
Dataset_Name  Number of columns
dataset1      110
dataset2      120
dataset3      130
...           ...
Say they have 100 columns in common. The final dataset, which contains all of dataset1, dataset2, dataset3, etc., only has the common columns (in this case, 100 columns).
How do I do this?
And how do I get the columns of each dataset that are not in common with the final dataset?
Example: dataset1 will have 10 columns that are not in the final dataset, and I want to list the names of those 10 columns.
Thanks!!!!
UNION in SQL is equivalent to sequential SET in SAS.
data want;
set dataset1 dataset2 dataset3;
run;
Now, SAS by default includes all columns present in any dataset. To limit to just what's in all datasets, you have to use a keep statement.
You can determine this using proc sql, among other ways.
proc sql;
select C.name into :commonlist separated by ' '
from dictionary.columns C, dictionary.columns D
where C.libname=D.libname
and C.memname='DATASET1'
and D.memname='DATASET2'
and C.name=D.name
;
quit;
For more than two datasets it's more complicated and partially depends on your, but if you're comfortable in SQL you can figure that out pretty easily. A similar construct can create a list of just dataset 1 variables. The important part is the into :commonlist separated by ' ', which says to pull the select results into a macro variable called commonlist, separating rows by space. (The colon says to create a macro variable, not a table.)
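One possible way to extend this to three datasets is to count how many of them contain each column name. The sketch below assumes the datasets live in WORK and that shared columns are spelled with the same case in every dataset; it also builds the dataset1-only list (dset1list) with a subquery.

```sas
proc sql noprint;
    /* Columns that appear in all three datasets */
    select name into :commonlist separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('DATASET1', 'DATASET2', 'DATASET3')
    group by name
    having count(*) = 3;

    /* Columns of DATASET1 that are not in that common set */
    select name into :dset1list separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'DATASET1'
      and name not in
          (select name
           from dictionary.columns
           where libname = 'WORK'
             and memname in ('DATASET1', 'DATASET2', 'DATASET3')
           group by name
           having count(*) = 3);
quit;

%put commonlist=&commonlist;
%put dset1list=&dset1list;
```

If a query returns no rows, its macro variable is not created, so it is worth checking the %put output before using the lists. Repeating the second query with DATASET2 or DATASET3 gives the other per-dataset lists.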
So you can then run:
data want (keep=&commonlist.) dset1(keep=&dset1list.) dset2(keep=&dset2list.) dset3(keep=&dset3list.);
set dataset1(in=ds1) dataset2(in=ds2) dataset3(in=ds3);
output want;
if ds1 then output dset1;
else if ds2 then output dset2;
else if ds3 then output dset3;
run;
The in=xyz indicates which dataset a row came from. Each output dataset can have a separate list of variables to keep. You might want to keep the ID variable in those other datasets as well.
I will say that usually in SAS you don't do what you're doing here: it's not easy to do because it doesn't tend to be the best way to handle things - specifically, the little split off datasets. In general you would just keep those extra variables on the master dataset, and they'd just be nulls for anyone not in a dataset with that variable - assuming it makes sense to make this 'master' dataset at all.
I want to perform some regressions and I would like to count the number of nonmissing observations for each variable. But I don't know yet which variables I will use. I've come up with the following solution, which does not work. Any help?
Here I basically put each of my explanatory variables into a variable. For example,
var1 var2 -> w1 = var1, w2 = var2. Notice that I don't know in advance how many variables I have, so I leave room for ten variables.
I then store the potential variables using symput.
data _null_;
cntw=countw("&parameters");
i = 1;
array w{10} $15.;
do while(i <= cntw);
w[i]= scan("&parameters",i, ' ');
i = i +1;
end;
/* store a variable globally*/
do j=1 to 10;
call symput("explanVar"||left(put(j,3.)), w(j));
end;
run;
My next step is to run a proc sql using the variables I've stored. It does not work if I have fewer than 10 variables.
proc sql;
select count(&explanVar1), count(&explanVar2),
count(&explanVar3), count(&explanVar4),
count(&explanVar5), count(&explanVar6),
count(&explanVar7), count(&explanVar8),
count(&explanVar9), count(&explanVar10)
from estimation
;quit;
Can this code work with less than 10 variables?
You haven't provided the full context for this project, so it's unclear if this will work for you - but I think this is what I'd do.
First off, you're in SAS, use SAS where it's best - counting things. Instead of the PROC SQL and the data step, use PROC MEANS:
proc means data=estimation n;
var &parameters.;
run;
That, without any extra work, gets you the number of nonmissing values for all of your variables in one nice table.
Secondly, if there is a reason to do the PROC SQL, it's probably a bit more logical to structure it this way.
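%macro count_params; /* %do loops must run inside a macro, not in open code */
proc sql;
select
%do i = 1 %to %sysfunc(countw(&parameters.));
count(%scan(&parameters.,&i.) ) as Parameter_&i., /* or could reuse the %scan result to name this better*/
%end; count(1) as Total_Obs
from estimation;
quit;
%mend count_params;
%count_params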
The final Total_Obs column is useful to simplify the code (dealing with the extra comma is mildly annoying). You could also put it at the start and prepend the commas.
Finally, you could also drive this from a dataset rather than a macro variable. I like that better, in general, as it's easier to deal with in a lot of ways. If your parameter list is in a dataset somewhere (one parameter per row, in the dataset "Parameters", with "var" as the name of the column containing the parameter), you could do
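The "put it at the start" variant might look roughly like this. The macro name count_nonmissing and its params argument are made up for illustration, and the loop sits inside a macro because %do is not allowed in open code.

```sas
%macro count_nonmissing(params);
    %local i;
    proc sql;
        select count(1) as Total_Obs
        /* Each iteration adds its own leading comma, so nothing trails */
        %do i = 1 %to %sysfunc(countw(&params., %str( )));
            , count(%scan(&params., &i., %str( ))) as %scan(&params., &i., %str( ))_n
        %end;
        from estimation;
    quit;
%mend count_nonmissing;

/* Example call with two hypothetical variable names */
%count_nonmissing(var1 var2)
```

Here each count column reuses the variable name with an _n suffix, so variable names longer than 30 characters would need a different naming scheme.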
proc sql;
select cats('%countme(var=',var,')') into :countlist separated by ','
from parameters;
quit;
%macro countme(var=);
count(&var.) as &var._count
%mend countme;
proc sql;
select &countlist from estimation;
quit;
This I like the best, as it is the simplest code and is very easy to modify. You could even drive it from a contents of estimation, if it's easy to determine what your potential parameters might be from that (or from dictionary.columns).
I'm not sure about your SAS macro, but the SQL query will work with these two notes:
1) If you don't follow your COUNT() functions with an identifier, such as "COUNT(...) AS VAR1", your results will not have field headings. If that's OK with you, then you may not need to worry about it. But if you export the data, it will help to name the columns by adding "... AS MY_NAME".
2) For observations with fewer than 10 variables, the query will return NULL values. So don't worry about not getting all of the results with what you have, because as long as the table you're querying has space for 10 variables (10 separate fields), you will get data back.
I'm just starting to learn SAS and wanted to see if anyone knew of a way to delete certain variables from a dataset if they contained a certain word. I'm working with a dataset that contains a huge amount of variables (100+) with the word 'Label' in them and am looking to drop these. Unfortunately the word label comes at the end of the variable name, so I can't do a simple drop label:; Obviously I could individually list all the variables to drop, but I just wanted to see if anyone out there knew of a simpler way to accomplish this task. Thanks for reading and for any help you have to offer up.
Using the vcolumn table and proc sql to create a macro variable:
proc sql noprint;
select trim(compress(name))
into :drop_vars separated by ' '
from sashelp.vcolumn
where libname = upcase('lib1')
and
memname = upcase('table1')
and
upcase(name) like '%LABEL%'
;
quit;
%put &drop_vars.;
data table2;
set table1;
drop &drop_vars.;
run;
The proc sql step creates a list of all the variables from table1 in library 'lib1' that contain "label" anywhere in the name and puts it into the macro variable called drop_vars. (upcase is used so that differences in case do not cause a mismatch.)
The data step then uses the drop statement and the drop_vars macro variable to drop all variables in the list.
Note: Make sure you check the output of the %put statement to ensure you do not drop variables you want to keep.
What you need to do is come up with a dataset that contains the variable names, then create a macro variable containing those you want to drop. There are three (or more) options for the first part:
dictionary.columns
sashelp.vcolumn
proc contents output to a dataset
All three give the same result - a dataset of variable names (and other things), which you can then query.
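For the third option, a minimal sketch could look like this (varlist is just an illustrative name for the output dataset):

```sas
/* Write the variable metadata of sashelp.class to a dataset instead of a report */
proc contents data=sashelp.class
              out=varlist(keep=name varnum type)
              noprint;
run;

/* One row per variable, ready to be filtered like any other table */
proc print data=varlist;
run;
```

The resulting table can be queried in the same way as dictionary.columns or sashelp.vcolumn.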
So for example, using PROC SQL's SELECT INTO functionality to create a macro variable:
proc sql;
select name into :droplist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
and name like '%eigh%';
quit;
(replace eigh with Label for your needs; % is wildcard here)
and then you have a macro variable &droplist, which you can then use in a drop statement.
data want;
set sashelp.class;
drop &droplist;
run;
I'm dealing with a data problem in SAS.
I have one dataset that includes 1000 variables and 1000 records for each variable.
And I have a separate variable list that includes 100 variable names.
I'd like to subset the first dataset to the variables whose names match the variable list.
I tried proc merge and proc sql, but cannot work it out.
Could anyone help me out?
Thanks a lot
SAS keeps or drops variables with the conveniently named keywords 'keep' and 'drop'. PROC SQL can help you generate a list if you don't already have it in text format.
data want;
set have;
keep var1 var2 var3 var4;
run;
If you have the list of variables in dataset "vnames" with the variable "tokeep", you can do this:
proc sql;
select tokeep into :keeplist separated by ' ' from vnames;
quit;
data want;
set have;
keep &keeplist.;
run;
PROC SQL is taking the contents of 'tokeep' and instead of selecting them to a table or the screen, putting them in a space-delimited list inside a macro variable 'keeplist', which then is used as the arguments for the 'keep' statement.
Here you can find how to output a list of all the variable names of a dataset as another dataset. This will make it much easier to decide which of the big datasets you will use and which you will not (e.g. do a left (or right) join on the variable names, then check that the number of matching rows is at least the number of variables you want to have).
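As a rough sketch of that join, assuming the big dataset is WORK.HAVE and the list is in the dataset vnames with the variable tokeep, as in the answer above (matched is an illustrative table name):

```sas
proc sql;
    /* Keep only the requested names that really exist in WORK.HAVE */
    create table matched as
    select v.tokeep
    from vnames as v
         inner join dictionary.columns as c
         on upcase(v.tokeep) = upcase(c.name)
    where c.libname = 'WORK'
      and c.memname = 'HAVE';

    /* Turn the matched names into a keep list */
    select tokeep into :keeplist separated by ' '
    from matched;
quit;

data want;
    set have;
    keep &keeplist.;
run;
```

Comparing the row count of matched with the row count of vnames shows how many requested names were not found in the dataset.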