I have a question about SAS programming that is complicated (for me). I think I can explain it best with a simple example. I have the following dataset:
Group Category
A 1
A 1
A 2
A 1
A 2
A 3
B 1
B 2
B 2
B 1
B 3
B 2
I want to count each category within each group. I can do it with PROC FREQ, but I don't think that is the best way for my dataset: it is very large and has a huge number of groups, so it would be time consuming. As I understand it, with PROC FREQ I would first need to create a new dataset for each group and then run PROC FREQ on each one. In short, I need to create the following dataset:
Group  Category 1  Category 2  Category 3
A      3           2           1
B      2           3           1
So the count of category 1 in group A is 3, the count of category 1 in group B is 2, and so on. I hope that explains it. Thanks for your help.
There is more than one way to do this in SAS. My bias is proc sql, so:
proc sql;
select grp,
sum(case when category = 1 then 1 else 0 end) as cat_1,
sum(case when category = 2 then 1 else 0 end) as cat_2,
sum(case when category = 3 then 1 else 0 end) as cat_3
from t
group by grp;
quit;
Either proc freq or proc summary will do the job of producing frequency counts:
data example;
length group category $1;
input group category;
cards;
A 1
A 1
A 2
A 1
A 2
A 3
B 1
B 2
B 2
B 1
B 3
B 2
;
run;
proc freq data=example;
table group*category;
run;
proc summary data=example nway;
class group category;
output out=example_frequency (drop=_type_);
run;
proc summary will produce a dataset in a 'long' format. If you need to transpose it (I'd suggest not doing so: you'll probably find working with the long format easier in most circumstances) you can use proc transpose:
proc transpose data=example_frequency out=example_matrix (drop=_name_);
by group;
id category;
var _freq_;
run;
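The concern in the question that PROC FREQ needs one dataset per group does not apply: a single two-way TABLES request covers every group in one pass. Here is a minimal sketch, assuming the example dataset created above. The names freq_long, freq_wide and the cat_ prefix are only illustrative.

```sas
/* One pass over the data: a count for every group/category pair */
proc freq data=example noprint;
  tables group*category / out=freq_long (keep=group category count);
run;

/* Optional pivot: one row per group, one column per category */
proc transpose data=freq_long out=freq_wide (drop=_name_ _label_) prefix=cat_;
  by group;
  id category;
  var count;
run;
```

Group/category pairs that never occur are not written to freq_long by default, so they would show up as missing values rather than zeros after the transpose.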
I have a dataset with a varying number of observations per ID, and the participants are in different treatment statuses (Group). Can I use proc means to quickly calculate the number of participants and the number of clinic visits per group status? Ideally, I would use proc means to count the records with Group 0 and 1 and get the totals. However, I am stuck on how to proceed.
ID Visit Group
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
Specifically, I am interested in:
1) the total number of participants in each group status. In this case we have 3 participants (IDs 1, 3, and 5) in the control group (0) and another 3 participants (IDs 2, 4, and 6) in the treatment group (1).
2) the total number of visits per group status. In this case, the total visits in the control group (0) will be 5 (2+1+2=5) and the total visits in the treatment group (1) will be 9 (3+2+4=9).
Can proc means quickly calculate these values? Thanks.
Yes, you can use proc means to get counts.
data have;
input ID$ Visit Group;
cards;
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
;
run;
proc means data=have n;
class group id;
var visit;
types group id group*id;
run;
If you also want the sum of visit, add sum between n and the semicolon in the proc means statement, i.e. proc means data=have n sum;.
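As a minimal sketch of that variation, assuming the have dataset defined above and classifying by group only to keep the output short:

```sas
/* N counts the non-missing VISIT rows per group;      */
/* SUM adds up the VISIT values themselves (1+2+...).  */
proc means data=have n sum;
  class group;
  var visit;
run;
```

Because each row of have is one visit, the N column is the number of visits per group. The SUM column totals the visit sequence numbers, which is a different quantity.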
It looks like GROUP is assigned at the ID level and not the ID/VISIT level. In that case if you want to count the number of ID's in each group you need to first get down to one observation per ID.
proc sort data=have nodupkey out=unique_ids ;
by id;
run;
Now you can count how many ID's are in each group. The normal way is to use PROC FREQ.
proc freq data=unique_ids;
tables group;
run;
But you can count with PROC MEANS/SUMMARY also.
proc summary data=unique_ids nway;
class group;
output out=counts N=N_ids ;
run;
proc print data=counts;
var group n_ids;
run;
PROC MEANS doesn't do a distinct count easily, so SQL may be the simpler option to understand here.
proc sql;
create table want as
select group, count(*) as num_visits, count(distinct ID) as num_participants
from have
group by group
order by 1;
quit;
My dataset and attempt
data mydata;
input Category $ Item $;
datalines;
A 1
A 1
A 2
B 3
B 1
;
proc sql;
create table mytable as
select *, count(Category) as Total_No_in_Category, count(Category)-count(item, "3") as No_of_not_3_in_the_same_category from mydata
group by Category;
run;
Result
Category No_of_not_3_in_the_same_category Total_No_in_Category
A 3 3
A 3 3
A 3 3
B 2 2
B 2 1
My expected result
Category No_of_not_3_in_the_same_category Total_No_in_Category
A 2 3
B 1 2
I wonder how to achieve the expected result using only proc SQL. Thank you so much.
The two-argument COUNT(item, "3") function call is not a summary function. That causes all rows from the original table to be automatically remerged with the aggregate computation (the one-argument count() calls). The remerge is a proprietary feature of SAS Proc SQL and not part of the ANSI Standard for SQL.
You appear to want the number of unique non-3 item values, so you will need a
COUNT(DISTINCT ...expression...)
in the query. The ...expression... can be a case clause that transforms item="3" to a null value by not having an else part of the case clause.
Example:
proc sql;
create table want as
select
category
, count(*) as freq
, count(distinct case when item ne "3" then item end) as n_unq_item_not_3
from mydata
group by category
;
quit;
I want to sum a value within each group and create a new variable holding each group's sum. I tried proc sql, but it only added a new variable to the existing rows.
My dataset looks like:
data have;
input firm year product$ value;
datalines;
1 2012 a 5
1 2012 a 6
1 2012 b 3
1 2013 a 4
1 2013 a 3
1 2013 b 4
1 2013 b 3
2 2012 a 5
2 2012 a 6
2 2012 b 3
2 2012 b 4
2 2012 b 2
2 2013 a 4
2 2013 a 5
2 2013 b 3
2 2013 b 3
;
run;
What I want is a table with four columns: firm year productA_sum productB_sum.
I tried this way:
proc sql;
create table h.want as
select a.*, sum(a.value) as sumvalue
from h.have as a
group by firm, year, product;
quit;
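But it only creates a new column.
That is because you group by three variables but select all variables (a.*), including value, which is neither grouped nor aggregated. SAS then remerges the sums back onto every original row, so the GROUP BY does not collapse the rows.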
/*Try this one*/
proc sql;
create table h.want as
select a.firm, a.year, a.product, sum(a.value) as sumvalue
from h.have as a
group by firm, year, product;
quit;
To get separate SUM() results based on another variable's value you need to use a CASE expression, rather than including that variable in the grouping variables.
proc sql;
create table want as
select firm, year
, sum(case when (product='a') then value else . end) as sum_product_A
, sum(case when (product='b') then value else . end) as sum_product_B
from have
group by firm,year
;
quit;
If you want the sum to be zero instead of missing if the product never appears then replace the missing values in the else clauses with 0 instead.
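A minimal sketch of that zero-filled variant, assuming the have dataset from the question. The output name want_zero is only illustrative.

```sas
proc sql;
  create table want_zero as
  select firm, year
       /* ELSE 0 makes a product that never appears sum to 0 instead of missing */
       , sum(case when product='a' then value else 0 end) as sum_product_A
       , sum(case when product='b' then value else 0 end) as sum_product_B
  from have
  group by firm, year
  ;
quit;
```

With the sample data both products appear in every firm/year, so the two variants give the same sums. For example, firm 1 in 2012 gets sum_product_A=11 and sum_product_B=3.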
You are pivoting an aggregate sum. A two step approach could be more desirable if there are more than two product values to contend with.
proc summary data=have nway noprint;
class firm year product;
var value;
output out=class_sums sum=sum;
run;
proc transpose data=class_sums suffix=_sum out=want(drop=_name_);
by firm year;
id product;
var sum;
run;
I am trying to show every value in order of frequency, not only the mode.
For example, I import an Excel file like:
A
1
1
2
3
3
3
and my code is:
ods select Modes;
proc univariate data=Want modes;
var A;
run;
The result looks like:
Mode Count
3 3
I want it to look like:
Mode Count
3 3
1 2
2 1
How can I do that?
Your desired output is actually not modes. Modes returns most frequent value or values (if there is more than one with the same frequency) with the corresponding count. In your example, there is only one mode (3), as it is the value with the highest frequency. And that's what the result shows.
You may be interested in showing frequencies of every value present in variable A. In that case, you want to use this code:
ods select Frequencies;
proc univariate data=Want freq;
var A;
run;
That is a frequency table.
data have ;
input A @@;
cards;
1 1 2 3 3 3
;
proc freq data=have order=freq ;
tables a / out=counts;
run;
proc print data=counts;
run;
Result:
Obs A COUNT PERCENT
1 3 3 50.0000
2 1 2 33.3333
3 2 1 16.6667
I have the following dataset:
ID Status
1 cake
1 cake
1 flower
2 flower
2 flower
3 cake
3 flower
4 cake
4 cake
4 cake
Basically, I am only interested in the observations that, grouped by the ID, include at least one flower. Also I want an indication of whether the observation grouped by ID only has flower or if it was cake too. E.g. I would ideally like something like:
ID Status Indicator
1 cake 1
1 cake 1
1 flower 1
2 flower 2
2 flower 2
3 cake 1
3 flower 1
4 cake 0
4 cake 0
4 cake 0
I have tried to subset the dataset in multiple ways and merge together, conditional on the ID, but it does not seem to be working.
This SAS data step based on your input (which I called test here) will return that indicator value by ID group.
proc sort data=test;
by ID descending status;
run;
data result(drop=status);
set test;
by ID;
retain indicator;
if first.ID then indicator=0;
if status='flower' and indicator=0 then indicator=2;
if status='cake' and indicator=2 then indicator=1;
if last.ID then output;
run;
You could join that result with the source data to get the result as you provided it in your post.
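One way to sketch that join is a data step merge. This assumes the test and result datasets from the step above, both still in ID order. The output name want is only illustrative.

```sas
/* RESULT has one row per ID, so a one-to-many merge copies */
/* the indicator onto every detail row of TEST.             */
data want;
  merge test result;
  by ID;
run;
```

Within each ID the rows keep the order left by the earlier sort, so flower rows come before cake rows.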
NOTE: I don't have enough reputation to comment on the answer provided by Gordon Linoff but I just want to point out that there the indicator will not take three values (0='no flower',1='cake+flower',2='only flower') but will instead be a count of the number of 'flower' entries per ID, which I don't think is quite what the poster is asking for.
Rewritten as follows, the query gives indicator values 0='no flower', 1='only flower', 2='cake+flower' (note that 1 and 2 are swapped relative to the coding in the question):
proc sql;
select t.*,
(count(distinct status))*(sum(case when status = 'flower' then 1 else 0 end)>0) as indicator
from test t
group by id;
quit;
proc sql comes to mind:
proc sql;
select t.*, tt.indicator
from t join
(select id, sum(case when status = 'flower' then 1 else 0 end) as indicator
from t
group by id
) tt
on tt.id = t.id;
quit;
proc sql also has a "remerge" extension to SQL. That allows you to do:
proc sql;
select t.*,
sum(case when status = 'flower' then 1 else 0 end) as indicator
from t
group by id;
quit;
If your data is already sorted by ID then you could use a double DOW loop. The first loop will check for the presence of the values. Then you can use another loop to write back all of the detail rows for that group.
data want (drop=_flower _cake);
do until (last.id);
set have;
by id;
if status='flower' then _flower=1;
else if status='cake' then _cake=1;
end;
if _flower and _cake then indicator=1;
else if _flower then indicator=2;
else indicator=0;
do until (last.id);
set have;
by id;
output;
end;
run;
This should be fast assuming the data is already sorted.