proc tabulate missing values SAS - sas

I have the following code:
ods tagsets.excelxp file = 'G:\CPS\myworkwithoutmissing.xml'
style = printer;
proc tabulate data = final;
Class Year Self_Emp_Inc Self_Emp_Uninc Self_Emp Multi_Job P_Occupation Full_Part_Time_Status;
table Year, P_Occupation*n;
table Year, (P_Occupation*Self_Emp_Inc)*n;
table Year, (Self_Emp_Inc*P_Occupation)*n;
run;
ods tagsets.excelxp close;
When I run this code, I get the following error message:
WARNING: A class, frequency, or weight variable is missing on every observation.
WARNING: A class, frequency, or weight variable is missing on every observation.
WARNING: A class, frequency, or weight variable is missing on every observation.
Now in order to circumvent this issue, I add the "missing" option at the end of the class statement such that:
class year self_emp_inc ....... Full_Part_Time_Status/ missing;
This fixes the problem in that it doesn't give me the error message and creates the table. However, my chart now also counts the number of missing values, something that I do not want. For example my variable self_emp_inc has values of 1 and .(for missing). Now when I run the code with the missing option,I get a count of P_Occupation for all the missing values as well, but I only want the count for when the value of self_emp_Inc is 1. How can I accomplish that task?

This is one of those frustrating things in SAS that for some reason SAS hasn't given us a "good" option to work around. Depending on what you're working with, there are a few solutions.
The real problem here is not that you have missings - in a 1x1 table (1 var by 1 var), excluding missings is what you want. It's because you're calling for multiple tables and each table is affected by missings in the class variables in the other table.
As such, oftentimes the easiest answer is simply to split the tables into multiple proc tabulate statements. This might occasionally be too complicated or too onerous in terms of runtime, but I suspect the majority of the time this is the best solution - it often is for me, anyway.
Since you're only working with n, you could instead construct the tabulation with the missings, output to a dataset, then filter them out and re-print or export that dataset. That's the easiest solution, typically.
How exactly you want to do this of course depends on what exactly you want. For example:
data test_cars;
set sashelp.cars;
if _n_=5 then call missing(make);
if _n_=7 then call missing(model);
if _n_=10 then call missing(type);
if _n_=13 then call missing(origin);
run;
proc tabulate data=test_cars out=test_tabulate(rename=n=count);
class make model type origin/missing;
tables (make model type),origin*n;
run;
data test_tabulate_want;
set test_tabulate;
if cmiss(of make model type origin)>2 then delete;
length colvar $200;
colvar = coalescec(of make model type);
run;
proc tabulate data=test_tabulate_want missing;
class colvar origin/order=data;
var count;
tables colvar,origin*count*sum;
run;
This isn't perfect, though it can be made a lot better with some more work on the formatting - this is just a quick example.
If you're using percents, of course, this doesn't exactly work. You either need to refactor the percents in that data step - which is a bit of work, but doable - or you need separate tabulates for each class variable.

Related

Missing values in a FREQ (SAS)

I'm going to ask this with an example...
Suppose i have a data set where each observation represents a person. Two of the variables are AGE and HASADOG (and say this has values 1 for yes and 2 for no.) Is there a way to run a PROC FREQ (by AGE*HASADOG) that forces SAS to include in the report a line for instances where the count is zero?
By this I mean: if there is a particular value for AGE such that no observation with this AGE value has a 1 in the HASADOG variable, the report will still include a row for this combination (with a row percent of 0.)
Is this possible?
The SPARSE option in PROC FREQ is likely all you need.
proc freq data=sashelp.class;
table sex*age / sparse list;
run;
If the value is nowhere in your data set at all, then there's no way for SAS to know it exists. In this case you'd need a more complex solution, basically a way to tell SAS all values you would be using ahead of time. This can be done via a PRELOADFMT or CLASSDATA option on several procs. There are asked an answered questions on this topic here on SO, so I won't provide a solution for this option, which seems beyond the scope of your question.

SAS Proc Tabulate output dataset variable names

I am using proc tabulate to create an output dataset with statistics (n mean std min max p25 p75 median) for a variable with a long name (close to the 32 character maximum). The output dataset will add _n, _std, etc to our variable name, but the median variable is just named "Median" because the variable name with "_median" added to the end, the resulting variable name would be >32 characters.
Is there a way to specify the name of the variables in the output dataset from within the proc tabulate step? I am looping through 1000s of variable for this procedure, so it's not feasible to rename each variable in a data step. Also, it must be proc tabulate and not proc freq because we need to output a row for every possible value of each variable, not just those values that exist in the data.
proc tabulate data=DATA out=OUT ;
var VERY_LONG_VARIABLE_NAME;
table VERY_LONG_VARIABLE_NAME *(n mean std min max p25 p75 median)/printmiss;
run;
Unfortunately I don't know of a way to override the tabulate names. Even transposing the tabulate doesn't fix that - you still get the same result, sadly.
My suggestion is to use a different proc. Almost all of the procs you might use have a way to get what you want - the PRINTMISS equivalent; for example, PROC FREQ has the SPARSE option which does basically the same thing (despite its odd name), and PROC SUMMARY or PROC MEANS might be even better (with COMPLETETYPES on the class statement), just depending on your data.
Alternately, you could reshape your data, or reshape your process. For example, if you're really looping through thousands of variables, that's horribly inefficient; better would be to reshape to variable|value structure (vertical) and then do one proc tabulate; that would fix your issue right there (as it would make 'varname' be a CLASS or BY variable itself not a contributor to the output variable name) and make your process faster.
You could also add a VIEW step before the tabulate that performs the rename for you; that would cost very little even in a macro loop.
Either way, supply some sample data and an example of the total process you're doing and likely you can get a better answer.

SAS Proc Freq - frequency of each category for multiple variables

How can I produce a table that has this kind of info for multiple variables:
VARIABLE COUNT PERCENT
U 51 94.4444
Y 3 5.5556
This is what SAS spits out into the listing output for all variables when I run this program:
ods output nlevels=nlevels1 OneWayFreqs=freq1 ;
proc freq data=sample nlevels ;
tables _character_ / out=outfreq1;
run;
In the outfreq1 table there is the info for just the last variable in the data set (table shown above) but not for all for the variables.
In the nlevels1 table there is info of how many categories each variable has but no frequency data.
What I want though is to output the frequency info for all the variables.
Does anybody know a way to do this without a macro/loop?
You basically have two options, which are sort-of-similar in the kinds of problems you'll have with them: use PROC TABULATE, which more naturally deals with multiple table output, or use the onewayfreqs output that you already call for.
The problem with doing that is that variables may be of different types, so it doesn't have one column with all of that information - it has a pair of columns for each variable, which obviously gets a bit ... messy. Even if your variables are all the same type, SAS can't assume that as a general rule, so it won't produce a nice neat thing for you.
What you can do, though, particularly if you are able to use the formatted values (either due to wanting to, or due to them being identical!), is coalesce them into one result.
For example, given your freq1 dataset from the above:
data freq1_out;
set freq1;
value = coalesce(of f_:);
keep table value frequency percent;
run;
That combines the F_ variables into one variable (as always only one is ever populated). If you can't use the F_ variables and need the original ones, you will have to make your own variable list using a macro variable list (or some other method, or just type the names all out) to use coalesce.
Finally, you could probably use PROC SQL to produce a fairly similar table, although I probably wouldn't do it without using the macro language. UNION ALL is a handy tool here; basically you have separate subqueries for each variable with a group by that variable, so
proc sql;
create table my_freqs as
select 'HEIGHT' as var, height, count(1) as count
from sashelp.class
group by 1,height
union all
select 'WEIGHT' as var, weight, count(1) as count
from sashelp.class
group by 1,weight
union all
select 'AGE' as var, age, count(1) as count
from sashelp.class
group by 1,age
;
quit;
That of course can be trivially macrotized to something like
proc sql;
create table my_freqs as
%freq(table=sashelp.class,var=height)
union all
%freq(table=sashelp.class,var=weight)
union all
%freq(table=sashelp.class,var=age)
;
quit;
or even further either with a list processing or a macro loop.

error message box cox in proc transreg procedure in sas

I try to use the proc transreg procedure in SAS, to transform one of my variables in a dataset (var1). The var1 variable has values >=0.
My code is:
proc transreg data=data1 details;
model boxcox(var1/lambda=-1 to 1 by 0.125 convenient parameter=1)=identity(var2);
output out=BoxCox_Out;
run;
However I get the following error message:
"observation of nonblank TYPE not equal 'Score ' are excluded from the analysis and the output data set.
Could anyone help me?
_TYPE_ can be used for TRANSREG to allow you to take datasets with multiple kinds of rows and only use the SCORE rows (or whichever ones you choose), often outputs from earlier TRANSREG procedures.
However, _TYPE_ is also a common variable added by procedures like PROC MEANS to indicate which class combinations apply to the row. In this case, TRANSREG is getting confused and thinking you want something different.
Drop the _TYPE_ variable in the TRANSREG data source statement, and it should use all rows.
proc transreg data=data1(drop=_type_) details;

Sas macro with proc sql

I want to perform some regression and i would like to count the number of nonmissing observation for each variable. But i don't know yet which variable i will use. I've come up with the following solution which does not work. Any help?
Here basically I put each one of my explanatory variable in variable. For example
var1 var 2 -> w1 = var1, w2= var2. Notice that i don't know how many variable i have in advance so i leave room for ten variables.
Then store the potential variable using symput.
data _null_;
cntw=countw(&parameters);
i = 1;
array w{10} $15.;
do while(i <= cntw);
w[i]= scan((&parameters"),i, ' ');
i = i +1;
end;
/* store a variable globally*/
do j=1 to 10;
call symput("explanVar"||left(put(j,3.)), w(j));
end;
run;
My next step is to perform a proc sql using the variable i've stored. It does not work as
if I have less than 10 variables.
proc sql;
select count(&explanVar1), count(&explanVar2),
count(&explanVar3), count(&explanVar4),
count(&explanVar5), count(&explanVar6),
count(&explanVar7), count(&explanVar8),
count(&explanVar9), count(&explanVar10)
from estimation
;quit;
Can this code work with less than 10 variables?
You haven't provided the full context for this project, so it's unclear if this will work for you - but I think this is what I'd do.
First off, you're in SAS, use SAS where it's best - counting things. Instead of the PROC SQL and the data step, use PROC MEANS:
proc means data=estimation n;
var &parameters.;
run;
That, without any extra work, gets you the number of nonmissing values for all of your variables in one nice table.
Secondly, if there is a reason to do the PROC SQL, it's probably a bit more logical to structure it this way.
proc sql;
select
%do i = 1 %to %sysfunc(countw(&parameters.));
count(%scan(&parameters.,&i.) ) as Parameter_&i., /* or could reuse the %scan result to name this better*/
%end; count(1) as Total_Obs
from estimation;
quit;
The final Total Obs column is useful to simplify the code (dealing with the extra comma is mildly annoying). You could also put it at the start and prepend the commas.
You finally could also drive this from a dataset rather than a macro variable. I like that better, in general, as it's easier to deal with in a lot of ways. If your parameter list is in a data set somewhere (one parameter per row, in the dataset "Parameters", with "var" as the name of the column containing the parameter), you could do
proc sql;
select cats('%countme(var=',var,')') into :countlist separated by ','
from parameters;
quit;
%macro countme(var=);
count(&var.) as &var._count
%mend countme;
proc sql;
select &countlist from estimation;
quit;
This I like the best, as it is the simplest code and is very easy to modify. You could even drive it from a contents of estimation, if it's easy to determine what your potential parameters might be from that (or from dictionary.columns).
I'm not sure about your SAS macro, but the SQL query will work with these two notes:
1) If you don't follow your COUNT() functions with an identifier such as "COUNT() AS VAR1", your results will not have field headings. If that's ok with you, then you may not need to worry about it. But if you export the data, it will be helpful for you if you name them by adding "...AS "MY_NAME".
2) For observations with fewer than 10 variables, the query will return NULL values. So don't worry about not getting all of the results with what you have, because as long as the table you're querying has space for 10 variables (10 separate fields), you will get data back.