Proc Freq and unique id values - sas

I have a table with two columns, ID and Gender as below
I am trying to count the number of males and females. I wrote a code like this
Proc Freq data=Work.Test1;
tables gender;
run;
The output i got was 5males and 2 females, I know this is wrong because Id repeats many times , there are only 2 males and 1 female. My question is how do i change Proc Freq so that I get the count for gender (males and Females) for unique Id values ?

You can use Nlevels in proc freq
Proc freq data= yourdata NLEVELS;
tables gender /noprint;
run;

I'm not sure if this is easy to do without using SQL or data step to work it out.
proc sql;
create table want as
select gender, count(distinct id) as count
from have
group by gender;
quit;
or (sorted by gender id)
data want;
set have;
by gender id;
if first.gender then count=0;
if first.id then count+1;
if last.gender then output;
run;
PROC TABULATE might be able to do what you want, but I couldn't figure a quick method out.

Try this:
proc sort data=have out=want nodupkey;
by id gender;
proc freq data=want;
tables gender;
run;
This will give you one record per ID/gender, then you can run your freq for gender.

Related

Overlay the average trend on group by trends using Proc sgplot

I want to create a line graph that includes the overall trend of a disease rate and the specific trends for males and females. I use the following code for to create the group by trends. How to add he average trend to this line graph. Thanks for your help.
proc sgplot data=have ;
vline year/response=disease_rate group=sex stat=mean datalabel=disease_rate ;
yaxis values=(0,1) label="Percentage";
run;
Here's an example of summarizing it and then displaying it on the graph. There are more than one way to do this though, this is just one.
data have;
set sashelp.heart(in=a);
year=round(2021-ageAtStart, 10);
disease_rate= status="Dead";
run;
proc means data=have mean noprint;
class sex year;
types sex sex*year;
var disease_rate;
output out=summary_stats mean=average_value;
run;
proc sort data=summary_stats;
by sex year;
run;
data graph_data;
merge summary_stats(where=(_type_=2) rename=average_value=mean_sex_year)
summary_stats(where=(_type_=3) rename=average_value = mean_sex);
by sex;
format mean_sex: percent12.1;
run;
proc sgplot data=graph_data ;
*where year > 1990;
vline year/response=mean_sex_year group=sex stat=mean datalabel=mean_sex_year ;
vline year/response=mean_sex group=sex stat=mean datalabel=mean_sex ;
run;
Use series instead of vline so that you can overlay a regression on top of it to get an average trend line. For example:
proc sql;
create table have as
select date
, region
, sum(sale) as sale
from sashelp.pricedata
group by region, date
order by region, date
;
quit;
proc sgplot data=have;
series x=date y=sale / group=region;
reg x=date y=sale / group=region;
xaxis fitpolicy=rotatethin;
run;

Average number of rows per variable in SAS

I have the following dataset :
data test;
input business_ID $;
datalines;
'busi1'
'busi1'
'busi1'
'busi2'
'busi3'
'busi3'
;
run;
proc freq data = test ;
table business_ID;
run;
I would like the average nummber of lines per business, that is count the total number of observations and divide it by the number of distinct businesses.
In my example : 6 observations, 3 businesses -> 6/2=3 lines per business.
I was thinking about using a proc freq or a proc mean step but so far I got only the number of lines (~freq) per business and do not know how to get to my goal.
Any idea?
You could use PROC FREQ to get the counts and then run PROC MEANS on the output.
proc freq data=test ;
tables business_id / noprint out=counts ;
run;
proc means data=counts;
var count;
run;
Or you could count them directly with PROC SQL code.
proc sql ;
select count(*)/count(distinct business_id) as mean_count
from test
;
quit;

How do I put conditions around Proc Freq statements in SAS?

I have the following statement
Proc Freq data =test;
tables gender;
run;
I want this to generate an output based on a condition applied to the gender variable. For example - if count of gender greater than 2 then output.
How can I do this in SAS?
Thanks
If you mean an output dataset, you can put a where clause directly in the output dataset options.
Proc Freq data =sashelp.class;
tables sex/out=sex_freq(where=(count>9));
run;
I'm not aware of how you can accomplish this only using proc freq but you can redirect the output to a data set and then print the results.
proc freq data=test;
tables gender / noprint out=tmp;
run;
proc print data=tmp;
where count > 2;
run;
Alternatively you could use proc summary, but this still requires two steps.
proc summary data=test nway;
class gender;
output out=tmp(where=(_freq_ > 2));
run;
proc print data=tmp;
run;

Adding a column calculated from subset of another column

I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have;
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
;
quit;

group by in sas

I've the below dataset as input
ID
--
1
2
2
3
4
4
4
5
And need a new dataset as below
ID count of ID
-- -----------
1 1
2 2
3 1
4 3
5 1
Could you please tell how to do this in SAS wihtout using PROC SQL?
or how about Proc Freq or Proc Summary? These avoid having to presort the data.
proc freq data=have noprint;
table id / out=want1 (drop=percent);
run;
proc summary data=have nway;
class id;
output out=want2 (drop=_type_);
run;
proc sql noprint;
create table test as select distinct id, count(id)
from your_table
group by ID
order by ID
;
quit;
Try this:
DATA Have;
input id ;
datalines;
1
2
2
3
4
4
4
5
;
Proc Sort data=Have;
by ID;
run;
Data Want;
Set Have;
By ID;
If first.ID then Count=0;
Count+1;
If Last.ID then Output;
Run;
PROC SORT DATA=YOURS NOPRINT;
BY ID; RUN;
PROC MEANS DATA=YOURS;
VAR ID;
BY ID;
OUTPUT OUT=NEWDATASET N=; RUN;
You can also choose to keep only the Id and N variables in your newdataset.
We can use simple PROC SQL count to do this:
proc sql;
create table want as
select id, count(id) as count_of_id
from have
group by id;
quit;
Here is yet another possibility, often known as a DoW construction:
Data want;
do count=1 by 1 until(last.ID);
set have;
by id;
end;
run;
If the aggregation you want to do is complex then go with PROC SQL only as we are more familiar with Group by in SQL
proc sql ;
create table solution_1 as select distinct ID, count(ID)
from table_1
group by ID
order by ID
;
quit;
OR
If you are using SAS- EG Query builders are very useful in small
analyses .
It's just drag & drop the columns u want to aggregate and in summary option Select whatever operation you want to perform like Avg,Count,miss,NMiss etc .