SAS sum variables using name after a proc transpose - sas

I have a table with postings by category (a number) that I transposed. I got a table with each column name as _number for example _16, _881, _853 etc. (they aren't in order).
I need to do the sum of all of them in a proc sql, but I don't want to create the variable in a data step, and I don't want to write all of the columns names either . I tried this but doesn't work:
proc sql;
select sum(_815-_16) as nnl
from craw.xxxx;
quit;
I tried going to the first number to the last and also from the number corresponding to the first place to the one corresponding to the last place. Gives me a number that it's not correct.
Any ideas?
Thanks!

You can't use variable lists in SQL, so _: and var1-var6 and var1--var8 don't work.
The easiest way to do this is a data step view.
proc sort data=sashelp.class out=class;
by sex;
run;
*Make transposed dataset with similar looking names;
proc transpose data=class out=transposed;
by sex;
id height;
var height;
run;
*Make view;
data transpose_forsql/view=transpose_forsql;
set transposed;
sumvar = sum(of _:); *I confirmed this does not include _N_ for some reason - not sure why!;
run;
proc sql;
select sum(sumvar) from transpose_Forsql;
quit;

I have no documentation to support this but from my experience, I believe SAS will assume that any sum() statement in SQL is the sql-aggregate statement, unless it has reason to believe otherwise.
The only way I can see for SAS to differentiate between the two is by the way arguments are passed into it. In the below example you can see that the internal sum() function has 3 arguments being passed in so SAS will treat this as the SAS sum() function (as the sql-aggregate statement only allows for a single argument). The result of the SAS function is then passed in as the single parameter to the sql-aggregate sum function:
proc sql noprint;
create table test as
select sex,
sum(sum(height,weight,0)) as sum_height_and_weight
from sashelp.class
group by 1
;
quit;
Result:
proc print data=test;
run;
sum_height_
Obs Sex and_weight
1 F 1356.3
2 M 1728.6
Also note a trick I've used in the code by passing in 0 to the SAS function - this is an easy way to add an additional parameter without changing the intended result. Depending on your data, you may want to swap out the 0 for a null value (ie. .).
EDIT: To address the issue of unknown column names, you can create a macro variable that contains the list of column names you want to sum together:
proc sql noprint;
select name into :varlist separated by ','
from sashelp.vcolumn
where libname='SASHELP'
and memname='CLASS'
and upcase(name) like '%T' /* MATCHES HEIGHT AND WEIGHT */
;
quit;
%put &varlist;
Result:
Height,Weight
Note that you would need to change the above wildcard to match your scenario - ie. matching fields that begin with an underscore, instead of fields that end with the letter T. So your final SQL statement will look something like this:
proc sql noprint;
create table test as
select sex,
sum(sum(&varlist,0)) as sum_of_fields_ending_with_t
from sashelp.class
group by 1
;
quit;
This provides an alternate approach to Joe's answer - though I believe using the view as he suggests is a cleaner way to go.

Related

SAS targed encored

Hi,
Can someone explain to me what a given code sequence does step by step?**
I must describe it in detail what is happening in turn
%macro frequency_encoding(dataset, var);
proc sql noprint;
create table freq as
select distinct(&var) as values, count(&var) as number
from &dataset
group by Values ;
create table new as select *, round(freq.number/count(&var),00.01) As freq_encode
from &dataset left join freq on &var=freq.values;
quit;
data new(drop=values number &var);
set new;
rename freq_encode=&var;
run;
data new;
set new;
keep &var;
run;
data dane(drop = &var);
set dane;
run;
data dane;
set dane;
set new;
run;
The SQL is first finding the frequency of each value of the variable. Then it divides those counts by the total number of non-missing values and rounds that percentage to two decimal places (or integers when you think of the ratio as a percentage).
This could be done in one step with:
proc sql noprint;
create table new as
select *,round(number/count(&var),0.01) as freq_encode
from (select *,&var as values,count(&var) as number
from &dataset
group by &var
)
;
quit;
It is not clear what the DANE dataset is supposed to be. If &DATESET does not equal DANE then those last four data steps make no sense. If it does then it is a convoluted way to replace the original variable with the percentage.
The first one is basically trying to rename the calculated percentage as the original variable and eliminate the original variable and the other two intermediate variables used in calculating the percentage.
The second one is dropping all of the variables except the new percentage.
The third one is dropping the original variable from "dane".
The last one is adding the new variable back to "dane".
Assuming DANE should be replaced with &DATASET then those four data steps could be reduced to one:
data &dataset;
set &dataset(drop=&var);
set new(keep=freq_encode rename=(freq_encode=&var));
run;
It is probably best not to overwrite your original dataset in that way. So perhaps you should add an OUT parameter to your macro to name to new dataset you want to create.
You could have avoided all of those data steps by just adding the DROP= and RENAME= dataset options to the dataset generated by the SQL query.
So perhaps you want something like this:
%macro frequency_encoding(dataset, var,out);
proc sql noprint;
create table &out(drop=&var number rename=(freq_encode=&var)) as
select *,round(number/count(&var),0.01) as freq_encode
from (select *,count(&var) as number
from &dataset
group by &var
)
;
quit;
%mend ;
%frequency_encoding(sashelp.class,sex,work.class);

SAS: How to calculate frequency for all character variables except some

I know I can have something like the following to calculate frequency for all Chars:
proc freq data=sashelp.class;
tables _char_;
run;
However, is there a way to exclude some variables? I want to do something like:
proc freq data=sashelp.class;
tables _char_ EXCEPT VAR1 VAR2;
run;
Thank you so much!
you can use drop = , as shown below.
proc freq data=sashelp.cars(drop=origin make);
tables _char_;
run;
The drop example is the simplest, certainly, and probably best if that's exactly the request.
However, if it's slightly different - such as, you want to include (or exclude) all character variables matching a particular pattern, you can use macro variables constructed from dictionary.columns (or proc contents output dataset).
proc sql;
select name
into :freqlist separated by ' '
from dictionary.columns
where memname='YOURTABLE' and libname='YOURLIB'
and type='char' and name like 'PATTERN%'
;
quit;
Obviously filling in the various uppercase things as appropriate. Usually MEMNAME, LIBNAME, and NAME are stored upper case, though not always, so consider adding UPCASE() to them.
Then you can put &freqlist.; on the tables statement to get the list of columns that match your query.

SAS equivalent to R’s is.element()

It’s the first time that I’ve opened sas today and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur but I want only those which have a unique number named VTNR.
First she looks for unique numbers:
data M.import;
set M.import;
by VTNR;
if first.VTNR=1 then unique=1;
run;
Then she creates a table with the duplicated numbers:
data M.import_dup1;
set M.import;
where unique^=1;
run;
And finally a table with all duplicates.
But here she is really hardcoding the numbers, so for example:
data M.import_dup2;
set M.import;
where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041);
run;
I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
I guess there must be something like the $ also for sas?
To me it looks like the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
Would be to use SQL code
proc sql;
create table import_dup2 as
select * from import
where VTNR in (select VTNR from import_dup1)
;
quit;
But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value there is no need to first create some other table.
data import_dup2 ;
set import;
by VTNR ;
if not (first.VTNR and last.VTNR);
run;
I would use the options in PROC SORT.
Make sure to specify an OUT= dataset otherwise you'll overwrite your original data.
/*Generate fake data with dups*/
data class;
set sashelp.class sashelp.class(obs=5);
run;
/*Create unique and dup dataset*/
proc sort data=class nouniquekey uniqueout=uniquerecs out=dups;
by name;
run;
/*Display results - for demo*/
proc print data=uniquerecs;
title 'Unique Records';
run;
proc print data=dups;
title 'Duplicate Records';
run;
Above solution can give you duplicates but not unique values. There are many possible ways to do both in SAS. Very easy to understand would be a SQL solution.
proc sql;
create table no_duplicates as
select *
from import
group by VTNR
having count(*) = 1
;
create table all_duplicates as
select *
from import
group by VTNR
having count(*) > 1
;
quit;
I would use Reeza's or Tom's solution, but for completeness, the solution most similar to R (and your preexisting code) would be three steps. Again, I wouldn't use this here, it's excess work for something you can do more easily, but the concept is helpful in other situations.
First, get the dataset of duplicates - either her method, or proc sort.
proc sort nodupkey data=have out=nodups dupout=dups;
by byvar;
run;
Then pull those into a macro list:
proc sql;
select byvar
into :duplist separated by ','
from dups;
quit;
Then you have them in &duplist. and can use them like so:
data want;
set have;
if not (byvar in &duplist.);
run;
data want;
set import;
where VTNR in import_dup1;
run;

SAS Proc SQL Trim not working?

I have a problem that seems pretty simple (probably is...) but I can't get it to work.
The variable 'name' in the dataset 'list' has a length of 20. I wish to conditionally select values into a macro variable, but often the desired value is less than the assigned length. This leaves trailing blanks at the end, which I cannot have as they disrupt future calls of the macro variable.
I've tried trim, compress, btrim, left(trim, and other solutions but nothing seems to give me what I want (which is 'Joe' with no blanks). This seems like it should be easier than it is..... Help.
data list;
length id 8 name $20;
input id name $;
cards;
1 reallylongname
2 Joe
;
run;
proc sql;
select trim(name) into :nameselected
from list
where id=2;
run;
%put ....&nameselected....;
Actually, there is an option, TRIMMED, to do what you want.
proc sql noprint;
select name into :nameselected TRIMMED
from list
where id=2;
quit;
Also, end PROC SQL with QUIT;, not RUN;.
It works if you specify a separator:
proc sql;
select trim(name) into :nameselected separated by ''
from list
where id=2;
run;

SAS - Creating variables from macro variables

I have a SAS dataset which has 20 character variables, all of which are names (e.g. Adam, Bob, Cathy etc..)
I would like a dynamic code to create variables called Adam_ref, Bob_ref etc.. which will work even if there a different dataset with different names (i.e. don't want to manually define each variable).
So far my approach has been to use proc contents to get all variable names and then use a macro to create macro variables Adam_ref, Bob_ref etc..
How do I create actual variables within the dataset from here? Do I need a different approach?
proc contents data=work.names
out=contents noprint;
run;
proc sort data = contents; by varnum; run;
data contents1;
set contents;
Name_Ref = compress(Name||"_Ref");
call symput (NAME, NAME_Ref);
%put _user_;
run;
If you want to create an empty dataset that has variables named like some values you have in a macro variables you could do something like this.
Save the values into macro variables that are named by some pattern, like v1, v2 ...
proc sql;
select compress(Name||"_Ref") into :v1-:v20 from contents;
quit;
If you don't know how many values there are, you have to count them first, I assumed there are only 20 of them.
Then, if all your variables are character variables of length 100, you create a dataset like this:
%macro create_dataset;
data want;
length %do i=1 %to 20; &&v&i $100 %end;
;
stop;
run;
%mend;
%create_dataset; run;
This is how you can do it if you have the values in macro variable, there is probably a better way to do it in general.
If you don't want to create an empty dataset but only change the variable names, you can do it like this:
proc sql;
select name into :v1-:v20 from contents;
quit;
%macro rename_dataset;
data new_names;
set have(rename=(%do i=1 %to 20; &&v&i = &&v&i.._ref %end;));
run;
%mend;
%rename_dataset; run;
You can use PROC TRANSPOSE with an ID statement.
This step creates an example dataset:
data names;
harry="sally";
dick="gordon";
joe="schmoe";
run;
This step is essentially a copy of your step above that produces a dataset of column names. I will reuse the dataset namerefs throughout.
proc contents data=names out=namerefs noprint;
run;
This step adds the "_Refs" to the names defined before and drops everything else. The variable "name" comes from the column attributes of the dataset output by PROC CONTENTS.
data namerefs;
set namerefs (keep=name);
name=compress(name||"_Ref");
run;
This step produces an empty dataset with the desired columns. The variable "name" is again obtained by looking at column attributes. You might get a harmless warning in the GUI if you try to view the dataset, but you can otherwise use it as you wish and you can confirm that it has the desired output.
proc transpose out=namerefs(drop=_name_) data=namerefs;
id name;
run;
Here is another approach which requires less coding. It does not require running proc contents, does not require knowing the number of variables, nor creating a macro function. It also can be extended to do some additional things.
Step 1 is to use built-in dictionary views to get the desired variable names. The appropriate view for this is dictionary.columns, which has alias of sashelp.vcolumn. The dictionary libref can be used only in proc sql, while th sashelp alias can be used anywhere. I tend to use sashelp alias since I work in windows with DMS and can always interactively view the sashelp library.
proc sql;
select compress(Name||"_Ref") into :name_list
separated by ' '
from sashelp.vcolumn
where libname = 'WORK'
and memname = 'NAMES';
quit;
This produces a space delimited macro vaiable with the desired names.
Step 2 To build the empty data set then this code will work:
Data New ;
length &name_list ;
run ;
You can avoid assuming lengths or create populated dataset with new variable names by using a slightly more complicated select statement.
For example
select compress(Name)||"_Ref $")||compress(put(length,best.))
into :name_list
separated by ' '
will generate a macro variable which retains the previous length for each variable. This will work with no changes to step 2 above.
To create populated data set for use with rename dataset option, replace the select statement as follows:
select compress(Name)||"= "||compress(_Ref")
into :name_list
separated by ' '
Then replace the Step 2 code with the following:
Data New ;
set names (rename = ( &name_list)) ;
run ;