PROC MEANS on all columns in SAS

I have a huge table with a lot of columns.
I have numeric columns and character columns.
I need the sum of each numeric column and the max of each character column.
I need this row in a new DB.
Is there a way to do it without writing out all the variable names?

By default, PROC MEANS will analyse all numeric variables if you leave out the VAR statement.
PROC MEANS data = work.example SUM;
RUN;
As far as I know, PROC MEANS will not execute if you try to include character variables in it.
If it's numeric values in character variables you're looking to retrieve the MAX of, perhaps consider using an INPUT function to convert them to numeric variables.
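If what is wanted really is the largest value (in sort order) of each character column, one possible approach is to let PROC SQL build the select list from the column metadata. The following is only a sketch: it assumes the table is work.example as in the snippet above, that it has at least one character column, and that all column names are ordinary SAS names. The output dataset names (num_sums, char_max, summary) are made up for illustration.

```sas
/* Step 1: sums of all numeric variables (no VAR statement needed).   */
/* SUM= with no names keeps the original variable names in the output. */
proc means data=work.example noprint;
  output out=work.num_sums(drop=_type_ _freq_) sum=;
run;

/* Step 2: build "max(name) as name" for every character column. */
proc sql noprint;
  select cat('max(', strip(name), ') as ', strip(name))
    into :char_max separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'EXAMPLE' and type = 'char';

  /* One row holding the maximum of each character column. */
  create table work.char_max as
    select &char_max
    from work.example;
quit;

/* Step 3: put the two one-row results side by side (merge without BY). */
data work.summary;
  merge work.num_sums work.char_max;
run;
```

Note that libname and memname are stored in upper case in dictionary.columns, which is why the WHERE clause uses 'WORK' and 'EXAMPLE'.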

Related

SAS Proc Freq - frequency of each category for multiple variables

How can I produce a table that has this kind of info for multiple variables:
VARIABLE COUNT PERCENT
U 51 94.4444
Y 3 5.5556
This is what SAS spits out into the listing output for all variables when I run this program:
ods output nlevels=nlevels1 OneWayFreqs=freq1 ;
proc freq data=sample nlevels ;
tables _character_ / out=outfreq1;
run;
In the outfreq1 table there is info for just the last variable in the data set (the table shown above), but not for all of the variables.
In the nlevels1 table there is info of how many categories each variable has but no frequency data.
What I want though is to output the frequency info for all the variables.
Does anybody know a way to do this without a macro/loop?
You basically have two options, which are sort-of-similar in the kinds of problems you'll have with them: use PROC TABULATE, which more naturally deals with multiple table output, or use the onewayfreqs output that you already call for.
The problem with doing that is that variables may be of different types, so it doesn't have one column with all of that information - it has a pair of columns for each variable, which obviously gets a bit ... messy. Even if your variables are all the same type, SAS can't assume that as a general rule, so it won't produce a nice neat thing for you.
What you can do, though, particularly if you are able to use the formatted values (either due to wanting to, or due to them being identical!), is coalesce them into one result.
For example, given your freq1 dataset from the above:
data freq1_out;
set freq1;
/* the F_ variables hold formatted values and are character, hence coalescec */
value = coalescec(of f_:);
keep table value frequency percent;
run;
That combines the F_ variables into one variable (only one of them is ever populated in a given row). If you can't use the F_ variables and need the original ones, you will have to build your own variable list, using a macro variable list or some other method (or just type all the names out), to pass to coalesce or coalescec.
Finally, you could probably use PROC SQL to produce a fairly similar table, although I probably wouldn't do it without using the macro language. UNION ALL is a handy tool here; basically you have separate subqueries for each variable with a group by that variable, so
proc sql;
create table my_freqs as
select 'HEIGHT' as var, height, count(1) as count
from sashelp.class
group by 1,height
union all
select 'WEIGHT' as var, weight, count(1) as count
from sashelp.class
group by 1,weight
union all
select 'AGE' as var, age, count(1) as count
from sashelp.class
group by 1,age
;
quit;
That of course can be trivially turned into macro calls, something like
proc sql;
create table my_freqs as
%freq(table=sashelp.class,var=height)
union all
%freq(table=sashelp.class,var=weight)
union all
%freq(table=sashelp.class,var=age)
;
quit;
or taken even further with either list processing or a macro loop.
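The answer does not show the %freq macro itself. A minimal sketch of what it might look like follows; the column names var and value and the length of 32 are assumptions chosen for illustration. As with the hand-written query, it relies on all the listed variables having the same type (numeric here), because UNION ALL lines columns up by position.

```sas
/* Hypothetical helper: emits one subquery, with no trailing semicolon, */
/* so that calls can be chained with UNION ALL inside PROC SQL.         */
%macro freq(table=, var=);
  select "%upcase(&var)" as var length=32,
         &var as value,
         count(1) as count
  from &table
  group by 1, &var
%mend freq;

proc sql;
  create table my_freqs as
  %freq(table=sashelp.class, var=height)
  union all
  %freq(table=sashelp.class, var=weight)
  union all
  %freq(table=sashelp.class, var=age)
  ;
quit;
```

Giving var an explicit length avoids depending on how the different literal lengths ('HEIGHT', 'AGE') are reconciled across the union.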

Probt in sas for column of values

I'm looking to apply probt to a column of values in SAS, not just one value, and to get two-tailed p-values.
I have the following code that I'd like to amend:
data all_ssr;
x=.551447;
df=25;
p=(1-probt(abs(x),df))*2;
put p=;
run;
However, I would like x to be a column of values from another file. I have tried work.ttest, which is just a file of t-test values.
Many thanks
You need to use a set statement to access data from another SAS dataset.
data all_ssr;
set work.ttest; /* dataset containing the column of values, assumed here to be named x */
df=25;
p=(1-probt(abs(x),df))*2;
run;
Removing the put statement avoids clogging up the log.

Naming variable using _n_, a column for each iteration of a datastep

I need to declare a variable for each iteration of a datastep (for each _n_), but when I run the code, SAS outputs only the last variable declared, the one with the greatest _n_.
It seems stupid declaring a variable for each row, but I need to achieve this result, I'm working on a dataset created by a proc freq, and I need a column for each group (each row of the dataset).
The result will be in a macro, so it has to be completely flexible.
proc freq data=&data noprint ;
table &group / out=frgroup;
run;
data group1;
set group (keep=&group count ) end=eof;
call symput('gr', _n_);
*REQUESTED code will go here;
run;
I tried these:
var&gr.=.;
call missing(var&gr.);
and a lot of other statements, but none worked.
The result is always the same: the dataset includes only var&gr, where &gr is the maximum _n_.
It seems that the PDV is overwriting the new variable on each iteration, even though the name is different.
Please keep the solution in a single datastep or, at least, make the code take as little time as possible.
Any idea how I can achieve the requested result?
Thanks.
Macro variables don't work like you think they do. Any macro variable reference is resolved at compile time, so your call symput is changing the value of the macro variable after all the references have been resolved. The reason you are getting results where the &gr is the maximum n is because that is what &gr was as a result of the last time you ran the code.
If you know you can determine the maximum _n_, you can put the max value into a macro variable and declare an array like so:
Find max _n_ and assign value to maxn:
data _null_;
set have end=eof;
if eof then call symputx('maxn',_n_); /* symputx avoids the numeric-to-character note and leading blanks */
run;
Create variables:
data want;
set have;
array var (&maxn);
run;
If you don't like proc transpose (if you need 3 columns you can always use it once for every column and then put together the outputs) what you ask can be done with arrays.
First thing you need to determine the number of groups (i.e. rows) in the input dataset and then define an array with dimension equal to that number.
Then the i-th element of your array can be recalled using _n_ as index.
In the following code &gr. contains the number of groups:
data group1;
set group;
array arr_counts(&gr.) var1-var&gr.;
arr_counts(_n_)= count;
run;
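SAS has several methods for determining the number of observations in a dataset; my favorite is the following (it doesn't work with views):
data _null_;
if 0 then set group nobs=n; /* nobs= is filled in at compile time, so no rows are read */
call symputx('gr',n);
stop; /* end the step explicitly, since the SET never executes */
run;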
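Put together, the steps from this answer might look like the sketch below. It uses sashelp.class and the variable sex purely as stand-ins for &data and &group, and the work.group and work.group1 names follow the snippets above. The count step has to run before the step that declares the array, because &gr. is resolved when that later step is compiled.

```sas
/* One row per group, with its frequency in COUNT. */
proc freq data=sashelp.class noprint;
  table sex / out=work.group;
run;

/* Store the number of groups (rows) in the macro variable gr. */
data _null_;
  if 0 then set work.group nobs=n;
  call symputx('gr', n);
  stop;
run;

/* Declare var1-var&gr. and fill the column that matches the row number. */
data work.group1;
  set work.group;
  array arr_counts(&gr.) var1-var&gr.;
  arr_counts(_n_) = count;
run;
```

In each output row only the column whose index equals the row number is filled; the other array elements stay missing, because variables created in the step are reset at the start of every iteration.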

SAS - Convert numeric values to character including dates

I need to combine two SAS datasets that have the same column names, but in one of the datasets a column will be numeric where the column with the same name in the other dataset is character. I was thinking of evaluating each field with the %isnum function and, based on the result, converting the number to character:
char_id = put(id, 7.) ;
drop id ;
rename char_id=id ;
What I need to know is how do I determine the length of the variable to use in the PUT and what would I do for date fields?
Sounds like you need to analyze your data and see how long things are. Use an obviously too long format (best32.) and then see how long the actual results are, or just use the maximum length.
For date fields, you need to decide how you want your date fields to look.
date_c = put(date_n,date9.);
That would be the default, but there are literally hundreds of date formats you can choose from.
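As a rough sketch of the "measure first" idea: the code below assumes a hypothetical dataset work.have with a numeric id and a numeric SAS date date_n, and that it contains at least one row. It finds the longest formatted id and uses that as the length of the character replacement.

```sas
/* Longest text representation of id when written with a wide format. */
proc sql noprint;
  select max(length(strip(put(id, best32.))))
    into :id_len trimmed
    from work.have;
quit;

data work.want;
  set work.have;
  length char_id $ &id_len;
  char_id = strip(put(id, best32.));   /* left-aligned text version of id  */
  date_c  = put(date_n, date9.);       /* e.g. 01JAN2020; pick any format  */
  drop id;
  rename char_id = id;
run;
```

Whether best32. is the right format depends on the data; for values with decimals a fixed w.d format may give more predictable text.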
You can also run proc contents data=myDataSets out=VarDatasets; run; to get the list of variables with their type, length, format, informat and so on.

subset a dataset when the variables in the dataset matches a variable list

I'm dealing with a data problem in SAS.
I have one dataset with 1000 variables and 1000 records for each variable.
I also have a separate variable list that contains 100 variable names.
I'd like to subset the first dataset, keeping only the variables whose names match the variable list.
I tried proc merge and proc sql, but cannot work it out.
Could anyone help me out?
Thanks a lot
SAS keeps or drops variables with the conveniently named keywords 'keep' and 'drop'. PROC SQL can help you generate a list if you don't already have it in text format.
data want;
set have;
keep var1 var2 var3 var4;
run;
If you have the list of variables in dataset "vnames" with the variable "tokeep", you can do this:
proc sql;
select tokeep into :keeplist separated by ' ' from vnames;
quit;
data want;
set have;
keep &keeplist.;
run;
PROC SQL is taking the contents of 'tokeep' and instead of selecting them to a table or the screen, putting them in a space-delimited list inside a macro variable 'keeplist', which then is used as the arguments for the 'keep' statement.
You can also output the list of all the variable names of a dataset as another dataset. This makes it much easier to decide which variables of the big dataset you will use and which you will not (e.g. do a left (or right) join on the variable names, then check that the number of matching rows is at least the number of variables you want to keep).
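One way this check could be sketched is shown below. It assumes the names from the earlier snippet (dataset have, list dataset vnames with variable tokeep) and uses PROC CONTENTS with OUT= to get the variable names; have_vars is a made-up dataset name. Only names that actually exist in have end up in the keep list, and the log shows how many matched.

```sas
/* Variable names of the big dataset, one row per variable. */
proc contents data=work.have out=work.have_vars(keep=name) noprint;
run;

/* Keep only the requested names that really exist in work.have. */
proc sql noprint;
  select v.tokeep
    into :keeplist separated by ' '
    from work.vnames as v
         inner join work.have_vars as h
         on upcase(v.tokeep) = upcase(h.name);
quit;

/* SQLOBS holds the number of rows the last SELECT returned. */
%put NOTE: &sqlobs requested variables found in work.have;

data work.want;
  set work.have(keep=&keeplist);
run;
```

If no names match, keeplist is never created and the final step will fail, so the count in the log is worth checking first.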