Proc tabulate grouping Data - Three variables - sas

I have three variables: CONFIG, YEAR, and TOT_SAL. I need all CONFIG values in rows, years in columns,
and in each cell the sum of the third variable, TOT_SAL, for that row and column.
So far I am trying this:
PROC TABULATE data=final OUT=work.final;
CLASS CONFIG YEAR;
TABLES CONFIG,YEAR;
Var TOT_SAL;
RUN;
This gives me a cross tab of config and year, but instead of the frequency of config
I need SUM(TOT_SAL) in the cross tab.

Here's an example of how to do that. Since you didn't provide data, I used the SASHELP.SHOES data set so this example can be replicated. If you need further assistance, be sure to post actual sample data.
proc tabulate data=sashelp.shoes;
class region product;
var sales;
table region, product*(sales='')*(sum=''*f=dollar32.);
run;
The first and second examples in the SAS documentation show another method and explain each step in detail.
The simplest answer is to put TOT_SAL in the VAR statement and reference it in the TABLES statement. TOT_SAL must not appear in the CLASS statement, because the CLASS statement is intended for categorical/grouping variables, not variables to be summarized. Those go in the VAR statement instead.
PROC TABULATE data=final OUT=work.final;
CLASS CONFIG YEAR;
VAR TOT_SAL;
TABLES CONFIG, YEAR*TOT_SAL*(sum=''*f=dollar32.) ;
RUN;
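For a version that can be run without the original data, here is a minimal sketch. The rows in the FINAL data set below are invented purely for illustration; only the variable names CONFIG, YEAR, and TOT_SAL come from the question.

```sas
/* Hypothetical sample data; the values are made up for illustration */
data final;
  input config $ year tot_sal;
  datalines;
A 2019 100
A 2019 50
A 2020 200
B 2019 300
B 2020 400
;
run;

/* CONFIG in rows, YEAR in columns, SUM(TOT_SAL) in each cell */
proc tabulate data=final;
  class config year;   /* grouping variables */
  var tot_sal;         /* analysis variable to be summed */
  table config, year*tot_sal=''*sum=''*f=comma12.;
run;
```

Blanking the labels with ='' keeps the variable name and the statistic name out of the column headers, so only the year values are shown.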

Related

load and combine all SAS dataset

I have multiple SAS datasets in a single location (folder). Each has two columns and is named Diagnosis_<diagnosis_name>.
I want to load all of the datasets and combine them as shown below.
Sample data set
File Location: C:\Users\xyz\Desktop\diagnosis\Diagnosis_<diagnosis_name>.sas7bdat
1. Dataset Name : Diagnosis_Diabetes.sas7bdat
2. Dataset Name : Diagnosis_Obesity.sas7bdat
The output I expect looks like this
Could you please help me with this?
You can just combine the datasets using a SET statement. If you want all of the datasets whose names start with a constant prefix, you can use the : wildcard to make a name list.
First create a libref to reference the directory:
libname diag 'C:\Users\xyz\Desktop\diagnosis\';
Then combine the datasets. If the original datasets are each sorted by person_id, you can add a BY statement and the result will also be sorted.
data tall;
set diag.diagnosis_: ;
by person_id;
run;
If you want to generate that wide dataset you could use PROC TRANSPOSE, but in that case you will need some extra variable to actually transpose.
data tall;
set diag.diagnosis_: ;
by person_id;
present=1;
run;
proc transpose data=tall out=want(drop=_name_);
by person_id;
id diagnosis;
var present;
run;
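If the datasets do not already carry a variable holding the diagnosis name, one possible way to derive it is the INDSNAME= option of the SET statement, which exposes the name of the dataset each row came from. The sketch below uses two small WORK datasets with invented rows as stand-ins for the real files, and assumes the diagnosis name is the part of the dataset name after the last underscore.

```sas
/* Hypothetical stand-ins for the Diagnosis_* files; values are invented */
data work.diagnosis_diabetes;
  input person_id;
  datalines;
1
2
;
run;

data work.diagnosis_obesity;
  input person_id;
  datalines;
2
3
;
run;

data tall;
  length source $41 diagnosis $32;
  /* SOURCE receives LIBREF.MEMBER for each row and is not written to TALL */
  set work.diagnosis_: indsname=source;
  by person_id;
  /* e.g. WORK.DIAGNOSIS_DIABETES -> DIABETES (assumes no underscore in the diagnosis name) */
  diagnosis = scan(source, -1, '_');
  present = 1;
run;
```

The resulting TALL dataset has the person_id, diagnosis, and present variables that the PROC TRANSPOSE step above expects.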

proc tabulate exclude missing for only some variables in class statement

I am trying to create a monthly table where month, among others, is a class variable.
Besides a yearly summary of all months, I want a summary up to and including the current month, i.e. in November I want a summary of January to November:
I created a variable (kumpama) to tell which observations should be included in this summary variable each month:
By using two class statements and setting the missing option for all class variables except the summation variable, I hoped to achieve the two summaries I wanted.
proc tabulate data=work.TabNRPab out=work.TabNRPab_out (rename= AntLgh_Sum=AntLgh) Format=numx13.;
var AntLgh;
class Huskat2 pabar upplatf2 pabman/preloadfmt missing;
class kumpama;
table (all='All Buildings' huskat2=' ')*(pabar=' ')*(upplatf2='') , (Pabman=' ' all='Year')*(antlgh=' ')*(sum=' ')
(kumpama='jan-sep')*(AntLgh=' ')*(sum=' ')
/printmiss misstext='.';
format upplatf2 Upplatelseform. Huskat2 $huskat2FT. pabman pabman.;
run;
The result is not what I expected. All values outside my target range (January to September) are now omitted. I know that by default an observation that contains a missing value for any class variable is excluded, but I thought that by using two class statements and applying the missing option to one of them I could get around this. The result and what I intend to do can both be seen in the first picture, since I can only post two links.
Probably I do something wrong or do I misunderstand the usage of the missing option?
Any suggestions or help would be appreciated.
The missing option is doing more-or-less the opposite of what you're trying to do. What it says is, "if there is a missing value in this variable, do NOT exclude the case". Missing only affects cases (rows): any case with a missing value in a class variable that is not using MISSING option will be excluded entirely.
What I would do here is to create a separate variable for the value you're summing only through September. Here's an example.
data have;
set sashelp.stocks;
if date < '01MAR2005'd then volume_pre01mar05 = volume;
run;
proc format;
picture million(round)
/* one decimal digit in the picture, so scale by 1e-5 to display millions */
low-high = ' 000009.9' (mult=0.00001);
run;
proc tabulate data=have;
class stock date;
var volume volume_pre01mar05;
where year(date)=2005;
tables stock,volume*date*sum=' ' volume*sum='Total'*format=million. volume_pre01mar05*sum='Through Feb 05'*format=million.;
run;
I have two volume variables: one that stores volume for all months, and one that stores volume for Jan-Feb and is missing for other months. (Missing in a var variable does not affect a row being included.) Then when I want to display the Jan/Feb sum, I tell SAS to sum that variable rather than the main Volume.

Proc GLM with a list of binary dummy variables

I am running a regression. My outcome (dependent) is a continuous variable. I have two types of independent variables. One represents day of week. The second type of independent variable is a binary variable (yes/no). I have about 40 of these binary variables. I am only interested in the interaction term between the day of week and all 40 binary variables in my model. I've searched online but could not find a great way to code it:
Sample Code:
proc glm
class dayofweek binvar1-binvar40
model outcome = dayofweek*binvar1 dayofweek*binvar2...dayofweek*binvar40/solution
run;
Is there an easier way to write this?
Not sure whether this counts as an easier solution :), but you can construct a macro variable IALL:
DATA I;
DO i = 1 TO 40; OUTPUT; END;
RUN;
PROC SQL NOPRINT;
SELECT VAR into: IALL SEPARATED BY " " FROM (SELECT CATS("dayofweek*binvar",PUT(I,2.0)) AS VAR FROM I);
QUIT;
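and use it in PROC GLM:
/* add data=<your dataset> here; without it, the most recently created data set is used */
proc glm;
class dayofweek binvar1-binvar40;
model outcome = &IALL. / solution;
run;
quit;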
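Another way to build the same list of terms is a small macro with a %DO loop, which avoids the helper dataset and the PROC SQL step. This is only a sketch: the macro name interactions and the dataset name have are hypothetical, while dayofweek, binvar1-binvar40, and outcome come from the question.

```sas
/* Hypothetical helper: emits dayofweek*binvar1 ... dayofweek*binvar&n */
%macro interactions(n);
  %local i;
  %do i = 1 %to &n;
    dayofweek*binvar&i
  %end;
%mend interactions;

/* HAVE is a placeholder for the analysis dataset */
proc glm data=have;
  class dayofweek binvar1-binvar40;
  model outcome = %interactions(40) / solution;
run;
quit;
```

The macro call expands in place inside the MODEL statement, so the generated text is the same as typing the 40 interaction terms by hand.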

Using Tabulate for 3-way table

I am trying to output a three way frequency table. I am able to do this (roughly) with proc freq, but would like the control for variable to be joined. I thought proc tabulate would be a good way to customize the output. Basically I want to fill in the cells with frequency, and then customize the percents at a later time. So, have count and column percent in each cell. Is that doable with proc tabulate?
Right now I have:
proc freq data=have;
table group*age*level / norow nopercent;
run;
that gives me e.g.:
What I want:
Here is the code I am using:
proc tabulate data=ex1;
class age level group;
var age;
table age='Age Category',
mean=' '*group=''*level=''*F=10./ RTS=13.;
run;
Thanks!
You can certainly get close to that. You can't really get in 'one' cell, it needs to write each thing out to a different cell, but theoretically with some complex formatting (probably using CSS) you could remove the borders.
You can't list the same variable in both VAR and CLASS (age appears in both in your code), but since you're just doing counts and percents, you don't need MEAN; you should just use N and COLPCTN. If you're dealing with already summarized data, you may need to do this differently. If so, post an example of your dataset (but that wouldn't work in PROC FREQ either without a FREQ statement).
data have;
do _t = 1 to 100;
age = ceil(3*rand('Uniform'));
group = floor(2*rand('Uniform'));
level = floor(5*rand('Uniform'));
output;
end;
drop _t;
run;
proc tabulate data=have;
class age level group;
table age='Age Category',
group=''*level=''*(n='n' colpctn='p')*F=10./ RTS=13.;
run;
This puts N and P (n and column %) in separate adjacent cells inside a single level.

SAS enterprise guide summing by personal ID

I have a dataset which has multiple obs per person. I want to have each single record showing the sum of a variable per person ID. However I do not want to group the data into single personal IDs. I hope the example below explains my question
I want to create the column in bold. How to do this? In SAS EG (or SAS if necessary)?
ID   Var1   SUM
X    10     30
X    20     30
Y    20     80
Y    20     80
Y    40     80
Z    30     30
You can do this using either proc sql or proc means
proc sql:
proc sql;
/* no DISTINCT: every original row is kept, and SAS remerges the per-id sum onto each row */
create table new_table as
select id, var1, sum(var_to_sum) as summed_var_name
from old_table
group by id
;
quit;
After rereading your question: using proc means you will need to merge var1 back in, so you are better off using proc sql above.
proc means:
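/* old_table must be sorted by id for the BY statement */
proc means data = old_table noprint;
by id;
var var_to_sum;
/* one row per id, holding the sum */
output out = new_table sum = summed_var_name;
run;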
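To make the proc means route concrete, here is a minimal end-to-end sketch using the six rows from the question. It assumes Var1 is the variable to be summed; the intermediate dataset name sums is hypothetical. The per-ID totals are computed first and then merged back onto the detail rows.

```sas
/* Sample rows taken from the question */
data old_table;
  input id $ var1;
  datalines;
X 10
X 20
Y 20
Y 20
Y 40
Z 30
;
run;

/* One row per id with the total of var1 (input is already sorted by id) */
proc means data=old_table noprint;
  by id;
  var var1;
  output out=sums(drop=_type_ _freq_) sum=sum;
run;

/* Merge the totals back so every original row carries its id total */
data new_table;
  merge old_table sums;
  by id;
run;
```

Because sums has exactly one row per id, the merge repeats each total across all detail rows for that id, which gives the layout shown in the question.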