Error message with Box-Cox in PROC TRANSREG in SAS

I am trying to use PROC TRANSREG in SAS to transform one of the variables (var1) in my dataset. The var1 variable has values >= 0.
My code is:
proc transreg data=data1 details;
model boxcox(var1/lambda=-1 to 1 by 0.125 convenient parameter=1)=identity(var2);
output out=BoxCox_Out;
run;
However, I get the following message:
"Observations with nonblank _TYPE_ not equal to 'SCORE' are excluded from the analysis and the output data set."
Could anyone help me?

TRANSREG uses the _TYPE_ variable so that it can read data sets containing several kinds of rows (often the output of an earlier TRANSREG step) and analyze only the SCORE rows, or whichever rows you choose.
However, _TYPE_ is also a common variable added by procedures such as PROC MEANS to indicate which combination of CLASS variables a row summarizes. In your case, TRANSREG sees that _TYPE_ column, finds values other than 'SCORE', and excludes those rows.
Drop the _TYPE_ variable with a data set option on the TRANSREG DATA= option, and it should use all rows:
proc transreg data=data1(drop=_type_) details;
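For completeness, here is the whole step with the fix, as a sketch. It assumes, hypothetically, that data1 came out of PROC MEANS (the usual source of a stray _TYPE_); SASHELP.CLASS and the mapping of HEIGHT and WEIGHT to var1 and var2 are illustrative only. _FREQ_ is dropped as well, since it is just more PROC MEANS bookkeeping.

```sas
/* Hypothetical setup: PROC MEANS output always carries _TYPE_ and _FREQ_ */
proc means data=sashelp.class noprint nway;
  class name;
  var height weight;
  output out=data1 mean=var1 var2;   /* height -> var1, weight -> var2 */
run;

/* Drop the PROC MEANS bookkeeping columns so TRANSREG reads every row as data */
proc transreg data=data1(drop=_type_ _freq_) details;
  model boxcox(var1 / lambda=-1 to 1 by 0.125 convenient parameter=1) = identity(var2);
  output out=BoxCox_Out;
run;
```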

Related

SAS - Totaling values in a column by a different column's value

The title may be a little ambiguous. Essentially, using the SASHELP.SHOES dataset, I'm trying to summarize the data in a new table by totaling the Sales, and Returns for each region. For instance, instead of having 56 rows for shoes sold in Africa and their individual sales/returns values, I have one row for Africa with columns TotalSales and TotalReturns. I need to do this for each region in the original dataset.
I'm not familiar at all with SAS, this is more or less the first thing I've really had to program in it. I've tried a few variations of data steps with IN or WHERE conditions, proc means steps with SUM() statements, and DO/DO WHILE loops, but I've missed something each time.
In PROC MEANS:
Use a CLASS statement to specify which variable(s) are to be used to group the data. In your case REGION.
Use the VAR statement to specify which variable(s) are to have statistics calculated for within each grouping.
Default output
This corresponds to the minimal syntax:
ods listing;
proc means noprint data=SASHELP.SHOES;
class region;
var sales returns;
output out=shoes_stats;
run;
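This creates data set WORK.SHOES_STATS with one row per default statistic (N, MIN, MAX, MEAN, STD, identified by _STAT_) for the overall data (_TYPE_=0) and for each region (_TYPE_=1).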
Other output structure
Use procedure option NWAY to only get summarizations for combinations of all the CLASS variables. (In your case this corresponds to rows with _TYPE_=1)
Use the OUTPUT statement option / AUTONAME to append the statistic name to each output variable name automatically (for example, Sales_Sum).
Use data set options (DROP=, KEEP=) to control which variables are kept or dropped.
proc means nway noprint data=SASHELP.SHOES;
class region;
var sales returns;
output out=shoes_sums(drop=_type_ _freq_) sum= / autoname;
run;
/* Opens the result in the VIEWTABLE window (SAS windowing environment only) */
dm 'vt shoes_sums; column names' viewtable;
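If you want the exact column names from the question (TotalSales and TotalReturns) rather than the AUTONAME names, you can name the outputs explicitly on the SUM= option. A minimal variant of the step above; the output data set name region_totals is just illustrative.

```sas
proc means data=sashelp.shoes nway noprint;
  class region;
  var sales returns;
  /* SUM= names are matched to the VAR list in order: Sales -> TotalSales, Returns -> TotalReturns */
  output out=region_totals(drop=_type_ _freq_) sum=TotalSales TotalReturns;
run;
```

Explicit names are handy when a downstream step expects fixed column names; AUTONAME is easier when you request several statistics for many variables.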

SAS Proc Tabulate output dataset variable names

I am using PROC TABULATE to create an output dataset with statistics (n mean std min max p25 p75 median) for a variable with a long name (close to the 32-character maximum). The output dataset appends _n, _std, etc. to the variable name, but the median variable is just named "Median", because appending "_median" would make the name longer than 32 characters.
Is there a way to specify the names of the variables in the output dataset from within the PROC TABULATE step? I am looping through thousands of variables with this procedure, so it's not feasible to rename each variable in a data step. Also, it must be PROC TABULATE and not PROC FREQ, because we need to output a row for every possible value of each variable, not just those values that exist in the data.
proc tabulate data=DATA out=OUT ;
var VERY_LONG_VARIABLE_NAME;
table VERY_LONG_VARIABLE_NAME *(n mean std min max p25 p75 median)/printmiss;
run;
Unfortunately I don't know of a way to override the tabulate names. Even transposing the tabulate doesn't fix that - you still get the same result, sadly.
My suggestion is to use a different procedure. Almost all of the procedures you might use have a PRINTMISS equivalent: for example, PROC FREQ has the SPARSE option, which does basically the same thing (despite its odd name), and PROC SUMMARY or PROC MEANS might be even better (with the COMPLETETYPES option on the PROC statement), depending on your data.
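A sketch of the PROC MEANS route, run on SASHELP.CLASS so it is self-contained: AGE and SEX stand in for your class variables and HEIGHT for the long-named analysis variable. The relevant point is that the OUTPUT statement lets you choose each statistic's name yourself, so the length of the analysis variable's name stops mattering, while COMPLETETYPES keeps class combinations that never occur together in the data.

```sas
proc means data=sashelp.class completetypes nway noprint;
  class age sex;
  var height;
  /* You pick the output names, so nothing depends on the analysis variable's name */
  output out=stats n=n mean=mean std=std min=min max=max
                   p25=p25 p75=p75 median=median;
run;
```

With one analysis variable per step, a single name per statistic is enough; with several VAR variables you would list one name per variable after each keyword.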
Alternately, you could reshape your data, or reshape your process. For example, if you're really looping through thousands of variables, that's horribly inefficient; it would be better to reshape to a variable|value (vertical) structure and then run one PROC TABULATE. That would fix your issue right there (the variable name becomes a CLASS or BY variable itself rather than part of the output variable name) and make your process faster.
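A minimal sketch of the vertical reshape, assuming all the variables of interest are numeric; SASHELP.CLASS stands in for your wide data set, and the names long, varname, value, and long_stats are illustrative. An array over _NUMERIC_ plus VNAME() turns each variable's name into data, so a single TABULATE covers every variable.

```sas
data long;
  set sashelp.class;
  array nums{*} _numeric_;         /* numeric variables read from the input */
  length varname $32;
  do _i = 1 to dim(nums);
    varname = vname(nums{_i});     /* the variable's name becomes a value */
    value   = nums{_i};
    output;
  end;
  keep varname value;
run;

/* One TABULATE for all variables; VARNAME is a CLASS variable, not part of the output names */
proc tabulate data=long out=long_stats;
  class varname;
  var value;
  table varname, value*(n mean std min max p25 p75 median);
run;
```

Because the analysis variable is now always VALUE, the output statistic names stay short and identical for every original variable.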
You could also add a VIEW step before the tabulate that performs the rename for you; that would cost very little even in a macro loop.
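A sketch of the view approach. The data set HAVE, the macro variable LONGNAME (standing in for whatever your loop sets), and the short name V are all hypothetical. The view renames the variable on the fly as TABULATE reads it, so nothing is copied.

```sas
/* Hypothetical input whose analysis variable has a 32-character name */
data have;
  set sashelp.class(keep=height rename=(height=a_very_long_variable_name_of_32c));
run;

%let longname = a_very_long_variable_name_of_32c;   /* hypothetical: set by your loop */

/* DATA step view: KEEP= is applied before RENAME=, so keep the original name */
data _tab_in / view=_tab_in;
  set have(keep=&longname rename=(&longname=v));
run;

proc tabulate data=_tab_in out=out_stats;
  var v;
  table v*(n mean std min max p25 p75 median) / printmiss;
run;
```

The statistic columns in OUT_STATS are then built from the short name V, so every statistic, including the median, keeps a suffixed name.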
Either way, supply some sample data and an example of the total process you're doing and likely you can get a better answer.

proc tabulate missing values SAS

I have the following code:
ods tagsets.excelxp file = 'G:\CPS\myworkwithoutmissing.xml'
style = printer;
proc tabulate data = final;
Class Year Self_Emp_Inc Self_Emp_Uninc Self_Emp Multi_Job P_Occupation Full_Part_Time_Status;
table Year, P_Occupation*n;
table Year, (P_Occupation*Self_Emp_Inc)*n;
table Year, (Self_Emp_Inc*P_Occupation)*n;
run;
ods tagsets.excelxp close;
When I run this code, I get the following warning messages:
WARNING: A class, frequency, or weight variable is missing on every observation.
WARNING: A class, frequency, or weight variable is missing on every observation.
WARNING: A class, frequency, or weight variable is missing on every observation.
Now in order to circumvent this issue, I add the "missing" option at the end of the class statement such that:
class year self_emp_inc ....... Full_Part_Time_Status/ missing;
This fixes the problem: it no longer gives me the warning and it creates the table. However, my table now also counts the missing values, which I do not want. For example, my variable self_emp_inc has values of 1 and . (missing). When I run the code with the MISSING option, I get a count of P_Occupation for the missing values as well, but I only want the count for when the value of self_emp_inc is 1. How can I accomplish that?
This is one of those frustrating things in SAS for which, for some reason, SAS hasn't given us a "good" workaround option. Depending on what you're working with, there are a few solutions.
The real problem here is not that you have missing values; in a 1x1 table (one variable by one variable), excluding missings is what you want. The problem is that you're requesting multiple tables in one step, and without MISSING, PROC TABULATE drops any observation that has a missing value in any CLASS variable. So each table is affected by missing values in class variables that only appear in the other tables.
As such, the easiest answer is often simply to split the tables into separate PROC TABULATE steps, each with its own CLASS statement. This might occasionally be too complicated or too slow, but I suspect the majority of the time this is the best solution; it often is for me, anyway.
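A sketch of the split, run on SASHELP.CARS because Cylinders is missing for a couple of rows there, playing the role of Self_Emp_Inc; Origin, Type, and the OUT= names are illustrative stand-ins for Year, P_Occupation, and your data. Each step lists only the CLASS variables its own table uses.

```sas
/* Table 1: Cylinders is not on CLASS, so rows with a missing Cylinders still count */
proc tabulate data=sashelp.cars out=tab_type;
  class Origin Type;
  table Origin, Type*n;
run;

/* Table 2: Cylinders is on CLASS; its missing rows drop out of this table only */
proc tabulate data=sashelp.cars out=tab_cyl;
  class Origin Cylinders;
  table Origin, Cylinders*n;
run;
```

In your program, both steps can sit inside the same ods tagsets.excelxp file= ... / ods tagsets.excelxp close; pair, so the output still lands in one file.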
Since you're only working with n, you could instead construct the tabulation with the missings, output to a dataset, then filter them out and re-print or export that dataset. That's the easiest solution, typically.
How exactly you want to do this of course depends on what exactly you want. For example:
data test_cars;
set sashelp.cars;
if _n_=5 then call missing(make);
if _n_=7 then call missing(model);
if _n_=10 then call missing(type);
if _n_=13 then call missing(origin);
run;
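/* MISSING keeps the blanked rows; RENAME= needs parentheses around old=new */
proc tabulate data=test_cars out=test_tabulate(rename=(n=count));
class make model type origin/missing;
tables (make model type),origin*n;
run;
data test_tabulate_want;
set test_tabulate;
/* a normal cell has exactly two missing class values (the other two row variables); */
/* a third missing value means the cell counts a missing category, so drop it */
if cmiss(of make model type origin)>2 then delete;
length colvar $200;
colvar = coalescec(of make model type); /* collapse the three row variables into one */
run;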
proc tabulate data=test_tabulate_want missing;
class colvar origin/order=data;
var count;
tables colvar,origin*count*sum;
run;
This isn't perfect, though it can be made a lot better with some more work on the formatting - this is just a quick example.
If you're using percentages, of course, this doesn't quite work. You either need to recompute the percentages in that data step (a bit of work, but doable) or you need a separate PROC TABULATE for each class variable.

SAS Proc Freq - frequency of each category for multiple variables

How can I produce a table that has this kind of info for multiple variables:
VARIABLE COUNT PERCENT
U 51 94.4444
Y 3 5.5556
This is what SAS spits out into the listing output for all variables when I run this program:
ods output nlevels=nlevels1 OneWayFreqs=freq1 ;
proc freq data=sample nlevels ;
tables _character_ / out=outfreq1;
run;
In the outfreq1 table there is the info for just the last variable in the data set (table shown above) but not for all for the variables.
In the nlevels1 table there is info of how many categories each variable has but no frequency data.
What I want though is to output the frequency info for all the variables.
Does anybody know a way to do this without a macro/loop?
You basically have two options, which are sort-of-similar in the kinds of problems you'll have with them: use PROC TABULATE, which more naturally deals with multiple table output, or use the onewayfreqs output that you already call for.
The problem with doing that is that variables may be of different types, so it doesn't have one column with all of that information - it has a pair of columns for each variable, which obviously gets a bit ... messy. Even if your variables are all the same type, SAS can't assume that as a general rule, so it won't produce a nice neat thing for you.
What you can do, though, particularly if you are able to use the formatted values (either due to wanting to, or due to them being identical!), is coalesce them into one result.
For example, given your freq1 dataset from the above:
data freq1_out;
set freq1;
value = coalescec(of f_:); /* the F_ (formatted) variables are character, so COALESCEC */
keep table value frequency percent;
run;
That combines the F_ variables into one variable (only one of them is ever populated on a given row). If you can't use the formatted F_ variables and need the original ones, you will have to build your own variable list (in a macro variable, by some other method, or by just typing the names out) to use with COALESCE or COALESCEC; those variables must all be of the same type.
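A self-contained sketch of the whole pipeline on SASHELP.CLASS (its character variables are NAME and SEX). It also pulls the variable name out of the TABLE column; that step assumes TABLE holds text such as "Table Sex", so check your own FREQ1 before relying on it.

```sas
ods output OneWayFreqs=freq1;
proc freq data=sashelp.class;
  tables _character_;
run;

data freq1_out;
  set freq1;
  length variable $32 value $200;
  variable = scan(table, 2, ' ');   /* assumption: TABLE looks like "Table Sex" */
  value    = coalescec(of f_:);     /* exactly one F_ column is filled per row */
  keep variable value frequency percent;
run;
```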
Finally, you could probably use PROC SQL to produce a fairly similar table, although I probably wouldn't do it without using the macro language. UNION ALL is a handy tool here; basically you have separate subqueries for each variable with a group by that variable, so
proc sql;
create table my_freqs as
select 'HEIGHT' as var, height, count(1) as count
from sashelp.class
group by 1,height
union all
select 'WEIGHT' as var, weight, count(1) as count
from sashelp.class
group by 1,weight
union all
select 'AGE' as var, age, count(1) as count
from sashelp.class
group by 1,age
;
quit;
That of course can be trivially macrotized to something like
proc sql;
create table my_freqs as
%freq(table=sashelp.class,var=height)
union all
%freq(table=sashelp.class,var=weight)
union all
%freq(table=sashelp.class,var=age)
;
quit;
or taken even further with list processing or a macro loop.
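The %freq macro in the snippet above is not shown in the answer. What follows is one hypothetical way to write it, plus the list-processing step that builds the UNION ALL chain from DICTIONARY.COLUMNS instead of typing each call. All selected variables must share a type (hence type='num'), because UNION ALL stacks them into one column.

```sas
/* Hypothetical helper: one SELECT per variable, meant to be chained with UNION ALL */
%macro freq(table=, var=);
  select "%upcase(&var)" as var length=32, &var, count(1) as count
    from &table
    group by &var
%mend freq;

proc sql noprint;
  /* Single quotes stop %freq from running while the list is being built */
  select cats('%freq(table=sashelp.class,var=', name, ')')
    into :freqlist separated by ' union all '
    from dictionary.columns
    where libname='SASHELP' and memname='CLASS' and type='num';

  /* Resolving &freqlist invokes each %freq call in turn */
  create table my_freqs as
    &freqlist;
quit;
```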

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=&var;
title "&var";
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need for something like this. You can also approach it with BY-group processing or, for graphs, paneling (PROC SGPANEL) in some cases.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql noprint;
select cats('%graph(',var,')')
into :graphlist separated by ' '
from yourdata; /* yourdata: one row per variable name, stored in column VAR */
quit;
&graphlist
For example.
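If the variable names can be picked out by a rule, the list can come straight from DICTIONARY.COLUMNS, so it updates itself when the variables change. A sketch on SASHELP.CITIWK (used further down) in place of MONTHLY; the rule "numeric variables whose names start with WSP" is only an illustrative assumption.

```sas
%macro graph(var);
  proc sgplot data=sashelp.citiwk;
    series x=date y=&var;
    title "&var";
  run;
%mend graph;

proc sql noprint;
  select cats('%graph(', name, ')')
    into :graphlist separated by ' '
    from dictionary.columns
    where libname='SASHELP' and memname='CITIWK'
      and type='num' and upcase(name) like 'WSP%';
quit;

&graphlist
```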
In your case, you could also build a vertical dataset with one row per variable per date, which might make it easier to control which variables are plotted:
data citiwk;
set sashelp.citiwk;
length var $4; /* set before the first assignment, or 'INDU' is truncated to 'IND' */
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
val=WSPGLT;
var='GOV';
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.
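A sketch of the array version of the step above. The explicit four-variable list mirrors the hardcoded example; a name-range list or _NUMERIC_ (excluding DATE) could replace it. The LABEL column is an optional extra that shows VLABEL().

```sas
data citiwk_long;
  set sashelp.citiwk;
  array series{*} wspca wspua wspia wspglt;
  length var $32 label $256;
  do _i = 1 to dim(series);
    var   = vname(series{_i});    /* variable name, e.g. WSPCA */
    label = vlabel(series{_i});   /* variable label, or the name if there is none */
    val   = series{_i};
    output;
  end;
  keep date var label val;
run;

proc sort data=citiwk_long;
  by var date;
run;

proc sgplot data=citiwk_long;
  by var;
  series x=date y=val;
run;
```

Declaring VAR with an explicit length up front also avoids the truncation problem that comes from letting the first assigned value set a character variable's length.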