Adding a column calculated from subset of another column

I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.

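This is what proc means is for. The nway option keeps only the per-date rows; without it, the output data set also contains an overall total row.
proc means data=have nway noprint;
class date;
var count;
output out=want (drop=_type_ _freq_) sum=total;
run;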

The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date
;
quit;
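
Since the question mentions first., here is a minimal data step sketch of that approach. It assumes have is already sorted by date (as done by the proc sort in the question); the names want and total are only illustrative.

```sas
data want (keep=date total);
  set have;
  by date;
  /* reset the running sum at the start of each date */
  if first.date then total = 0;
  /* sum statement: total is retained across rows automatically */
  total + count;
  /* write one row per date, once its last row has been added */
  if last.date then output;
  format date date9.;
run;
```

With the sample data this should give one row per date, matching the layout asked for in the question.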

Related

Average number of rows per variable in SAS

I have the following dataset :
data test;
input business_ID $;
datalines;
'busi1'
'busi1'
'busi1'
'busi2'
'busi3'
'busi3'
;
run;
proc freq data = test ;
table business_ID;
run;
I would like the average number of lines per business, that is, the total number of observations divided by the number of distinct businesses.
In my example: 6 observations, 3 businesses -> 6/3 = 2 lines per business.
I was thinking about using a proc freq or a proc means step, but so far I only get the number of lines (the frequency) per business and do not know how to get to my goal.
Any idea?

You could use PROC FREQ to get the counts and then run PROC MEANS on the output.
proc freq data=test ;
tables business_id / noprint out=counts ;
run;
proc means data=counts;
var count;
run;
Or you could count them directly with PROC SQL code.
proc sql ;
select count(*)/count(distinct business_id) as mean_count
from test
;
quit;

How to rename total count across class variable in Proc Means

I'm doing a simple count of occurrences of a by-variable within a class variable, but cannot find a way to rename the total count across class variables. At the moment, the output dataset includes counts for all cluster2 within each group as well as the total count across all groups (i.e. the class variable used). However, the counts within classes are named, while the total is shown by an empty string.
Code:
proc means data=seeds noprint;
class group;
by cluster2;
id label2;
output out=seeds_counts (drop= _type_ _freq_) n(id)=count;
run;
Example of output file:
cluster2 group label2 count
7 area 1 20
7 sa area 1 15
7 sb area 1 5
15 area 15 42
15 sa area 15 18
....
Naturally, renaming the empty string to "Total" could be accomplished in a separate data step, but I would like to do it directly in the Proc Means step. It should be simple and trivial, but I haven't found a way so far. Afterwards, I want to transpose the dataset, which means that the empty string has to be changed, or it will be dropped in the proc transpose.

I don't know of a way to do it directly, but you can sort-of-cheat: you can tell SAS to show "Total" instead of missing.
proc format;
value $MissTotalF
' ' = 'Total'
other = [$CHAR12.];
run;
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _type_ _freq_) n(age)=count;
format sex $MissTotalF.;
run;
That is just one example. I'd also recommend PROC TABULATE instead of PROC MEANS if you only need counts, though in this case it doesn't make much difference.

The problem here is that if the variable in the class statement is numeric, the resulting column will also be numeric, so you can't store the word Total in it (unless you use a format, similar to the answer from @Joe). That is why the value is left missing: the class variable can be either numeric or character.
Here's an example of a numeric class variable.
proc sort data=sashelp.class out=class;
by sex;
run;
proc means data=class noprint;
class age;
by sex;
output out=class_counts (drop= _:) n=count;
run;
Using proc tabulate can display the result pretty much how you want it; however, the output dataset will have the same missing values, so it won't really help. Here are a couple of examples.
proc tabulate data=class out=class_tabulate1 (drop=_:);
class sex age;
table sex*(age all='Total'),n='';
run;
proc tabulate data=class out=class_tabulate2 (drop=_:);
class sex age;
table sex,age*n='' all='Total';
run;
I think the best option to achieve your final goal is to add the nway option to proc means, which will remove the subtotals, then transpose the data and finally write a data step that creates the Total column by summing each row. It's 3 steps, but doesn't involve much coding.
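
The three steps described above could be sketched as follows, using the sashelp.class example from this answer. The names class_counts, class_wide, the age_ prefix and total are illustrative assumptions, and the count is taken from the automatic _FREQ_ variable rather than an N statistic to keep the sketch simple.

```sas
proc sort data=sashelp.class out=class;
  by sex;
run;

/* Step 1: nway keeps only the sex*age rows, no subtotals */
proc means data=class noprint nway;
  class age;
  by sex;
  output out=class_counts (drop=_type_ rename=(_freq_=count));
run;

/* Step 2: one row per sex, one column per age (age_11, age_12, ...) */
proc transpose data=class_counts out=class_wide (drop=_name_) prefix=age_;
  by sex;
  id age;
  var count;
run;

/* Step 3: row total; sum() ignores ages that are missing for a sex */
data class_wide;
  set class_wide;
  total = sum(of age_:);
run;
```

Because the Total column is built after the transpose, no row with a missing class value has to survive proc transpose.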

Here is one method you could use by taking advantage of the _TYPE_ variable so that you can process the totals and details separately. You will still have trouble with PROC TRANSPOSE if there is a class with missing values (separate from the overall summary record).
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _freq_ ) n(age)=count;
run;
proc transpose data=sex_counts out=transpose prefix=count_ ;
where _type_=1 ;
id sex ;
var count;
run;
data transpose ;
merge transpose sex_counts(where=(_type_=0) keep=_type_ count);
rename count=count_Total;
drop _type_;
run;

SAS Keep maximum value by ID

Each ID has several instances, and each instance has a different value. I would like the final output to be the maximum value per ID. So the initial dataset is:
ID Value
1 100
1 7
1 65
2 12
2 97
3 82
3 54
And the output will be:
ID Value
1 100
2 97
3 82
I tried running proc sort twice thinking that the first sort would get things in the proper order so that nodupkey on the second sort would get rid of the right values. This did not work.
proc sort work.data; by id value descending; run;
proc sort work.data nodupkey; by id; run;
Thanks!

Your approach should have worked fine but it looks like you have a syntax error - did you forget to check your log? The descending keyword needs to go before the variable you want to sort in descending order.
proc sort data=sashelp.class out=tmp;
by sex descending height;
run;
proc sort data=tmp out=final nodupkey;
by sex;
run;
Also, in case you're not familiar with SQL, I strongly suggest learning it, as it simplifies many data manipulation tasks. This can also be solved in a single SQL step:
proc sql noprint;
create table want as
select sex,
max(height) as height
from sashelp.class
group by sex
;
quit;

My preferred solution:
proc means data=have nway noprint;
class id;
var value;
output out=want max(value)=;
run;
Should be a lot faster than two sorts.

Column total as an observation in a dataset in SAS

I need a column total as an observation.
Input dataset:
data input;
input name $ mark;
datalines;
a 10
b 20
c 30
;
run;
Output dataset:
Name Mark
a 10
b 20
c 30
Total 60
The code below, which I wrote, works fine.
data output;
set input end=eof;
tot + mark;
if eof then
do;
output;
name = 'Total';
mark = tot;
output;
end;
else output;
run;
Please suggest if there is any better way of doing this.

PROC REPORT is a good solution for doing this. This summarizes the entire report - other options give you the ability to summarize in groups.
proc report data=input out=outds nowd;
columns name mark;
define name / group;
define mark / analysis sum;
/* add a summary row at the end of the report */
rbreak after / summarize;
compute after;
/* label the summary row */
name = "Total";
endcomp;
run;

Your code is fine in general; the issue might be performance. If the input table is huge, you end up rewriting the full table.
I'd suggest something like this:
proc sql;
delete from input where name = 'Total';
create table total as
select 'Total' as name length=8, sum(mark) as mark
from input
;
quit;
proc append base=input data=total;
run;
Here you read the full table but write only a single row to the existing table.

group by in sas

I've the below dataset as input
ID
--
1
2
2
3
4
4
4
5
And need a new dataset as below
ID count of ID
-- -----------
1 1
2 2
3 1
4 3
5 1
Could you please tell me how to do this in SAS without using PROC SQL?

How about Proc Freq or Proc Summary? These avoid having to presort the data.
proc freq data=have noprint;
table id / out=want1 (drop=percent);
run;
proc summary data=have nway;
class id;
output out=want2 (drop=_type_);
run;

proc sql noprint;
create table test as select distinct id, count(id)
from your_table
group by ID
order by ID
;
quit;

Try this:
DATA Have;
input id ;
datalines;
1
2
2
3
4
4
4
5
;
Proc Sort data=Have;
by ID;
run;
Data Want;
Set Have;
By ID;
If first.ID then Count=0;
Count+1;
If Last.ID then Output;
Run;

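PROC SORT DATA=YOURS;
BY ID; RUN;
PROC MEANS DATA=YOURS NOPRINT;
VAR ID;
BY ID;
OUTPUT OUT=NEWDATASET N=COUNT; RUN;
NOPRINT belongs on PROC MEANS, not PROC SORT, and the N statistic needs its own name (COUNT here) because ID is already used as the BY variable. You can also choose to keep only the ID and COUNT variables in NEWDATASET.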

We can use simple PROC SQL count to do this:
proc sql;
create table want as
select id, count(id) as count_of_id
from have
group by id;
quit;

Here is yet another possibility, often known as a DoW construction:
Data want;
do count=1 by 1 until(last.ID);
set have;
by id;
end;
run;

If the aggregation you want to do is complex, then PROC SQL is a good choice, since many people are already familiar with GROUP BY in SQL.
proc sql ;
create table solution_1 as select distinct ID, count(ID)
from table_1
group by ID
order by ID
;
quit;
Or, if you are using SAS EG (Enterprise Guide), the Query Builder is very useful for small analyses.
You just drag and drop the columns you want to aggregate and, in the summary option, select whatever operation you want to perform, such as Avg, Count, Miss, NMiss, etc.
