proc tabulate exclude missing for only some variables in class statement - sas

I am trying to create a monthly table where month, among others, is a class variable.
Besides a summary of all months into years, I want a summary up to and including the current month, i.e. in November I want a summary of January to November.
I created a variable (kumpama) to flag which observations should be included in this summary each month.
By using two class statements and setting the missing option for all class variables except the summation variable, I hoped to achieve the two summaries I wanted.
proc tabulate data=work.TabNRPab out=work.TabNRPab_out (rename= AntLgh_Sum=AntLgh) Format=numx13.;
var AntLgh;
class Huskat2 pabar upplatf2 pabman/preloadfmt missing;
class kumpama;
table (all='All Buildings' huskat2=' ')*(pabar=' ')*(upplatf2='') , (Pabman=' ' all='Year')*(antlgh=' ')*(sum=' ')
(kumpama='jan-sep')*(AntLgh=' ')*(sum=' ')
/printmiss misstext='.';
format upplatf2 Upplatelseform. Huskat2 $huskat2FT. pabman pabman.;
run;
The result is not what I expected. All values outside my target range (January to September) are now omitted. I know that by default an observation that contains a missing value for any class variable is excluded, but I thought that by using two class statements and applying the missing option to only one of them I could get around this. The result and what I intend to do can both be seen in the first picture, since I can only post two links.
Am I doing something wrong, or do I misunderstand the usage of the missing option?
Any suggestions or help would be appreciated.

The missing option is doing more-or-less the opposite of what you're trying to do. What it says is, "if there is a missing value in this variable, do NOT exclude the case". Missing only affects cases (rows): any case with a missing value in a class variable that is not using MISSING option will be excluded entirely.
What I would do here is to create a separate variable for the value you're summing only through September. Here's an example.
data have;
set sashelp.stocks;
if date < '01MAR2005'd then volume_pre01mar05 = volume;
run;
proc format;
picture million(round)
low-high = ' 000009.9' (mult=0.000001);
run;
proc tabulate data=have;
class stock date;
var volume volume_pre01mar05;
where year(date)=2005;
tables stock,volume*date*sum=' ' volume*sum='Total'*format=million. volume_pre01mar05*sum='Through Feb 05'*format=million.;
run;
I have two volume variables: one that stores volume for all months, and one that stores volume for Jan-Feb and is missing for other months. (Missing in a var variable does not affect a row being included.) Then when I want to display the Jan/Feb sum, I tell SAS to sum that variable rather than the main Volume.
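The row-exclusion rule for class variables can be seen in isolation with a tiny dataset. This is only a sketch: the dataset demo and the variables grp, flag and x are made up for illustration and are not from the question.

```sas
/* Hypothetical toy data: flag is missing on the second row */
data demo;
  input grp $ flag $ x;
  datalines;
A Y 1
A . 2
B Y 3
;
run;

/* flag is a class variable WITHOUT the MISSING option:
   the row with the missing flag is excluded from the whole table,
   even though flag does not appear in the TABLE statement */
proc tabulate data=demo;
  class grp flag;
  var x;
  table grp, x*sum;
run;

/* flag WITH the MISSING option: the row is retained */
proc tabulate data=demo;
  class grp;
  class flag / missing;
  var x;
  table grp, x*sum;
run;
```

The sum for group A should differ between the two tables, because the second row only contributes when flag is declared with MISSING.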

Related

Proc tabulate grouping Data - Three variables

I have three variables: CONFIG, YEAR and TOT_SAL. I need all configs in rows, years in columns, and in each cell the sum of the third variable, TOT_SAL.
So far I am trying this:
PROC TABULATE data=final OUT=work.final;
CLASS CONFIG YEAR;
TABLES CONFIG,YEAR;
Var TOT_SAL;
RUN;
This gives me a cross tab for config and year, but instead of the frequency of config I need SUM(TOT_SAL) in the cross tab.
Here's an example of how to do that. Since you didn't provide data, I used the SASHELP.SHOES data set so this example can be replicated. If you need further assistance, be sure to post actual sample data.
proc tabulate data=sashelp.shoes;
class region product;
var sales;
table region, product*(sales='')*(sum=''*f=dollar32.);
run;
The first and second examples in the SAS documentation show another method and explain each step in detail.
The simplest answer is adding the VAR statement. Note that you have tot_sal in the CLASS statement. That is incorrect, because the CLASS statement is intended for categorical/grouping variables, not for variables to be summarized. Those go in the VAR statement instead.
PROC TABULATE data=final OUT=work.final;
CLASS CONFIG YEAR;
VAR TOT_SAL;
TABLES CONFIG, YEAR*TOT_SAL*(sum=''*f=dollar32.) ;
RUN;

special characters in alias Proc sql- SAS 9.3

I need to have special characters (% and space) in the alias name in a proc sql statement.
proc sql DQUOTE=ANSI;
create table final_data as
select a.column1 as XYZ,
((a.colum2/b.colum2)-1) as "% VS LY"
from table1 a
join table2 b on a.colum3=b.colum3;
quit;
According to the documentation, setting the option DQUOTE=ANSI on proc sql should work:
http://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a001393333.htm
However, I'm getting this error in SAS 9.3
ERROR: The value % VS LY is not a valid SAS name.
What should I do to make this work?
Thank you so much in advance!
Perhaps a simpler solution would be to use standard naming and a SAS label. If the computed value is between 0 and 1 you can also add a SAS format.
((a.colum2/b.colum2)-1) as vs_ly_pct label='% VS LY' format=percent5.2
If you truly want non-standard column names, you will also need to set
options validvarname = any;
before the Proc SQL.
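A minimal sketch of the non-standard-name route, assuming VALIDVARNAME=ANY is acceptable for the session. The toy tables below are invented stand-ins for table1 and table2, and the sketch uses a SAS name literal ('...'n) rather than relying on DQUOTE=ANSI.

```sas
/* Hypothetical toy tables standing in for table1 and table2 */
data table1;
  input colum3 column1 $ colum2;
  datalines;
1 A 110
2 B 90
;
run;

data table2;
  input colum3 colum2;
  datalines;
1 100
2 100
;
run;

/* Allow variable names containing blanks and special characters */
options validvarname=any;

proc sql;
  create table final_data as
  select a.column1 as XYZ,
         ((a.colum2/b.colum2)-1) as '% VS LY'n format=percent8.1
  from table1 a
  inner join table2 b on a.colum3=b.colum3;
quit;

/* Every later reference needs the same name literal */
proc print data=final_data;
  var XYZ '% VS LY'n;
run;
```

The cost of this choice is that every downstream step has to use the name literal, which is why the label approach is usually easier to live with.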
In SQL an alias is what you use to prefix variable references to tell which input table (or subquery) the variable comes from. Like the a and b in your query. What you are talking about is the variable NAME.
SAS variable names normally are restricted to underscore and alphanumeric characters (and cannot start with a number), but variable LABELS can be any string. You can just specify the label after the name.
select a.column1 as XYZ
, ((a.colum2/b.colum2)-1) as var2 '% VS LY'
Or use the SAS specific LABEL= syntax
select a.column1 as XYZ
, ((a.colum2/b.colum2)-1) as var2 label='% VS LY'

'notin' used within a 'where' statement syntax in PROC PROBIT

I'm working with someone else's code, which contains the following instance of PROC PROBIT.
proc probit data = mortality order=data;
where group notin (9);
class survive;
model survive =log_dose / D = LOGISTIC INVERSECL;
ods output /*logprobitanalysis=logprobliv_dose*/ probitanalysis=probliv_dose;
RUN;
What function does the (9) serve in the where statement?
I'm scouring documentation, but not having much luck finding an explanation. Is it native to the where statement? Or, does the order= option alter the capabilities of where within proc probit? I assume that notin is a variable, but it's not entirely clear to me from the code. Is notin some obscure keyword for not in (list)?
(Un)Fortunately, the author is no longer with us.
NOTIN is the same as NOT IN. I assume SAS sees NOT and applies it as a modifier for what comes next, if what comes next is an operator.
So this works:
data test;
do group=1 to 9;
output;
end;
run;
data want;
set test;
where group notin (1,9);
run;
Leaving you with group in {2,3,...,8}
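Applied to the original question, the (9) is simply a one-element list. A quick way to check the claimed equivalence, reusing the test dataset above (the output dataset names are made up here), is to run both spellings and compare the results:

```sas
data test;
  do group=1 to 9;
    output;
  end;
run;

/* Spelling used in the inherited code */
data want_notin;
  set test;
  where group notin (9);
run;

/* Conventional spelling */
data want_not_in;
  set test;
  where group not in (9);
run;

/* If the two spellings are equivalent, PROC COMPARE reports no differences */
proc compare base=want_notin compare=want_not_in;
run;
```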

Proc GLM with a list of binary dummy variables

I am running a regression. My outcome (dependent) is a continuous variable. I have two types of independent variables. One represents day of week. The second type of independent variable is a binary variable (yes/no). I have about 40 of these binary variables. I am only interested in the interaction term between the day of week and all 40 binary variables in my model. I've searched online but could not find a great way to code it:
Sample Code:
proc glm;
class dayofweek binvar1-binvar40;
model outcome = dayofweek*binvar1 dayofweek*binvar2...dayofweek*binvar40/solution;
run;
Is there an easier way to write this?
Not sure whether this counts as an easier solution :), but you can construct a macro variable IALL
DATA I;
DO i = 1 TO 40; OUTPUT; END;
RUN;
PROC SQL NOPRINT;
SELECT VAR into: IALL SEPARATED BY " " FROM (SELECT CATS("dayofweek*binvar",PUT(I,2.0)) AS VAR FROM I);
QUIT;
and use it in PROC GLM
proc glm;
class dayofweek binvar1-binvar40;
model outcome = &IALL. /solution;
run;
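A different way to get the same list of terms, not the one used in the answer above, is a small macro loop that writes the terms directly into the MODEL statement. This is only a sketch: the macro name interactions and the simulated dataset have are assumptions made for illustration.

```sas
/* Hypothetical simulated data: 7 weekdays, 40 binary flags, random outcome */
data have;
  call streaminit(1);
  array binvar{40};                      /* creates binvar1-binvar40 */
  do obs = 1 to 500;
    dayofweek = ceil(7 * rand('uniform'));
    do j = 1 to 40;
      binvar{j} = rand('bernoulli', 0.5);
    end;
    outcome = rand('normal');
    output;
  end;
  drop j obs;
run;

/* Emits: dayofweek*binvar1 dayofweek*binvar2 ... dayofweek*binvar&n */
%macro interactions(n=40);
  %local i;
  %do i = 1 %to &n;
    dayofweek*binvar&i
  %end;
%mend interactions;

proc glm data=have;
  class dayofweek binvar1-binvar40;
  model outcome = %interactions(n=40) / solution;
run;
quit;
```

Compared with the macro variable built in PROC SQL, the loop avoids the helper dataset, at the price of defining a macro.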

Proc compare - comparing variables in two datasets that have different sizes and different variable placement

I have a significant problem with proc compare. I have two datasets with two columns each. One column lists table names and the other lists the names of the variables that belong to the tables named in the first column. I want to compare the values of the second column based on the values of the first. I more or less made it work, but the datasets have different sizes because one of them contains additional values: a new variable was added to a table, so a new row appears in the middle of that dataset. Unfortunately, proc compare matches the two datasets row by row and checks the values against each other, so in my case it looks like this:
ds 1 | ds 2
cost | box_nr
other | cost_total
As you can see, a new value box_nr was added to the second dataset, and it sits above the value (cost_total) that I want the variable cost to be compared with. So I would like to know whether it is possible to compare values (check for differences in character sequence) that have at least minimal similarity, for example 3 letters (cos), or whether it is possible to just put values like box_nr at the end, indicating that they do not appear in the other dataset.
My code:
PROC Compare base=USERSPDS.MIzew compare=USERSPDS.MIwew
out=USERSPDS.result outbase outcomp outdif noprint;
id 'TABLE HD'n;
where ;
run;
proc print data=USERSPDS.result noobs;
by 'TABLE HD'n;
id 'TABLE HD'n;
title 'COMPARISON:';
run;
Untested, but this should get you some of the way.
proc sql;
create table compare as
select
coalesce(a.cola, b.cola) as cola,
a.colb as acolb,
b.colb as bcolb
from dataa as a
full outer join datab as b
on
a.cola = b.cola and
compged(a.colb, b.colb) <= 100;
quit;
Have a look at the compged documentation for further information.
Sounds like you could make a new variable in both datasets, VAR3chars=substr(var,1,3) and then add that variable to your ID statement. I think that should work unless there are duplicate values.
So if one dataset had var="cost" and the other had var="cost_total", they would match on the id so they would be compared and found to be different.
If one dataset had var="box_nr" and the other did not have any values starting with "box", they would not match on the id so compare would find that a record exists for that id in one dataset but not the other.
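The three-character key idea could be sketched as follows. The datasets ds1 and ds2 and the variable names tabname, varname and key3 are invented for illustration, and the sketch assumes the key is unique within each table, as the caveat about duplicates above requires.

```sas
/* Hypothetical stand-ins for the two lists of table/variable names */
data ds1;
  length tabname $8 varname $32;
  input tabname $ varname $;
  datalines;
T1 cost
T1 other
;
run;

data ds2;
  length tabname $8 varname $32;
  input tabname $ varname $;
  datalines;
T1 box_nr
T1 cost_total
T1 other
;
run;

/* Derive the matching key from the first three characters */
data ds1k;
  set ds1;
  key3 = substr(varname, 1, 3);
run;

data ds2k;
  set ds2;
  key3 = substr(varname, 1, 3);
run;

/* PROC COMPARE expects both datasets sorted by the ID variables */
proc sort data=ds1k; by tabname key3; run;
proc sort data=ds2k; by tabname key3; run;

/* LISTALL also lists observations found in only one dataset (here: box) */
proc compare base=ds1k compare=ds2k listall;
  id tabname key3;
run;
```

With this data, cost and cost_total share the key cos and are compared as a pair, while box_nr has no partner in ds1 and is listed as present in only one dataset.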