Column total as an observation in a dataset in SAS - sas

I need a column a total as an observation.
Input Dataset Output Dataset
------------- --------------
data input; Name Mark
input name$ mark; a 10
datalines; b 20
a 10 c 30
b 20 Total 60
c 30
;
run;
The below code which I wrote is working fine.
data output;
set input end=eof;
tot + mark;
if eof then
do;
output;
name = 'Total';
mark = tot;
output;
end;
else output;
run;
Please suggest if there is any better way of doing this.

PROC REPORT is a good solution for doing this. This summarizes the entire report - other options give you the ability to summarize in groups.
proc report out=outds data=input nowd;
columns name mark;
define name/group;
define mark/analysis sum;
compute after;
name = "Total";
line "Total" mark.sum;
endcomp;
run;

Your code is fine in general, however the issue might be in terms of performance. If the input table is huge, you end up rewriting full table.
I'd suggest something like this:
proc sql;
delete from input where name = 'Total';
create table total as
select 'Total' as name length=8, sum(mark) as mark
from input
;
quit;
proc append base=input data=total;
run;
Here you are reading full table but writing only a single row to existing table.

Related

Calculate mean and std of a variable, in a datastep in SAS

I have a dataset where observations is student and then I have a variable for their test score. I need to standardize these scores like this :
newscore = (oldscore - mean of all scores) / std of all scores
So that I am thinking is using a Data Step where I create a new dataset with the 'newscore' added to each student. But I don't know how to calculate the mean and std of the entire dataset IN in the Data Step. I know I can just calculate it using proc means, and then manually type it it. But I need to do I a lot of times and maybe drop variables and other stuff. So I would like to be able to just calculate it in the same step.
Data example:
__VAR testscore newscore
Student1 5 x
Student2 8 x
Student3 5 x
Code I tried:
data new;
set old;
newscore=(oldscore-(mean of testscore))/(std of testscore)
run;
(Can't post any of the real data, can't remove it from the server)
How do I do this?
Method1: Efficient way of solving this problem is by using proc stdize . It will do the trick and you dont need to calculate mean and standard deviation for this.
data have;
input var $ testscore;
cards;
student1 5
student2 8
student3 5
;
run;
data have;
set have;
newscore = testscore;
run;
proc stdize data=have out=want;
var newscore;
run;
Method2: As you suggested taking out means and standard deviation from proc means, storing their value in a macro and using them in our calculation.
proc means data=have;
var testscore;
output out=have1 mean = m stddev=s;
run;
data _null_;
set have1;
call symputx("mean",m);
call symputx("std",s);
run;
data want;
set have;
newscore=(testscore-&mean.)/&std.;
run;
My output:
var testscore newscore
student1 5 -0.577350269
student2 8 1.1547005384
student3 5 -0.577350269
Let me know in case of any queries.
You should not try to do this in the data step. Do it with proc means. You don't need to type anything in, just grab the value in a dataset.
You don't provide enough to give complete code in the answer, but the basic idea.
proc means data=sashelp.class;
var height weight;
output out=class_stats mean= std= /autoname;
run;
data class;
if _n_=1 then set class_Stats; *copy in the values from class_Stats;
set sashelp.class;
height_norm = (height-height_mean)/(height_stddev);
weight_norm = (weight-weight_mean)/(weight_stddev);
run;
Alternately, just use PROC STDIZE which will do this for you.
proc stdize data=sashelp.class out=class_Std;
var height weight;
run;
If you want to achieve this via proc sql:
proc sql;
create table want as
select *, mean(oldscore) as mean ,std(oldscore) as sd
from have;
quit;
For other statistical functions in proc sql, see here: https://support.sas.com/kb/25/279.html

All values for only most recent occurrence

I am trying to extract all the Time occurrences for only the recent visit. Can someone help me with the code please.
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var value;
run;
data want;
set temp1;
by name date;
if last.name;
format value $2000.;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing value .'s.
Very similar to the question yesterday from another user, you can use quite a few solutions here.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though date and time are not on the group by it's legal in SAS to put them on the select, and then SAS automatically merges (inner joins) the result of select name, max(date) from have group by name having date=max(date) to the original have dataset, returning multiple rows as needed. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
The DoW loop also appeals here. This is basically the precise SAS data step implementation of the SQL above. First iterate over that name, figure out the max, then iterate again and output the ones with that max.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Of course here you more easily collapse the rows, too:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted then just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;

How to rename total count across class variable in Proc Means

I'm doing a simple count of occurrences of a by-variable within a class variable, but cannot find a way to rename the total count across class variables. At the moment, the output dataset includes counts for all cluster2 within each group as well as the total count across all groups (i.e. the class variable used). However, the counts within classes are named, while the total is shown by an empty string.
Code:
proc means data=seeds noprint;
class group;
by cluster2;
id label2;
output out=seeds_counts (drop= _type_ _freq_) n(id)=count;
run;
Example of output file:
cluster2 group label2 count
7 area 1 20
7 sa area 1 15
7 sb area 1 5
15 area 15 42
15 sa area 15 18
....
Naturally, renaming the emtpy string to "Total" could be accomplished in a separate datastep, but I would like to do it directly in the Proc Means-step. It should be simple and trivial, but I haven't found a way so far. Afterwards, I want to transpose the dataset, which means that the emtpy string has to be changed, or it will be dropped in the proc transpose.
I don't know of a way to do it directly, but you can sort-of-cheat: you can tell SAS to show "Total" instead of missing.
proc format;
value $MissTotalF
' ' = 'Total'
other = [$CHAR12.];
quit;
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _type_ _freq_) n(age)=count;
format sex $MissTotalF.;
run;
For example. I'd also recommend using PROC TABULATE instead of PROC MEANS if you're just going for counts, though in this case it doesn't really make much difference.
The problem here is that if the variable in the class statement is numeric, then the resultant column will be numeric, therefore you can't add the word Total (unless you use a format, similar to the answer from #Joe). This will be why the value is missing, as the class variable can be either numeric or character.
Here's an example of a numeric class variable.
proc sort data=sashelp.class out=class;
by sex;
run;
proc means data=class noprint;
class age;
by sex;
output out=class_counts (drop= _:) n=count;
run;
Using proc tabulate can display the result pretty much how you want it, however the output dataset will have the same missing values, so won't really help. Here's a couple of examples.
proc tabulate data=class out=class_tabulate1 (drop=_:);
class sex age;
table sex*(age all='Total'),n='';
run;
proc tabulate data=class out=class_tabulate2 (drop=_:);
class sex age;
table sex,age*n='' all='Total';
run;
I think the best option to achieve your final goal is to add the nway option to proc means, which will remove the subtotals, then transpose the data and finally write a data step that creates the Total column by summing each row. It's 3 steps, but doesn't involve much coding.
Here is one method you could use by taking advantage of the _TYPE_ variable so that you can process the totals and details separately. You will still have trouble with PROC TRANSPOSE if there is a class with missing values (separate from the overall summary record).
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _freq_ ) n(age)=count;
run;
proc transpose data=sex_counts out=transpose prefix=count_ ;
where _type_=1 ;
id sex ;
var count;
run;
data transpose ;
merge transpose sex_counts(where=(_type_=0) keep=_type_ count);
rename count=count_Total;
drop _type_;
run;

Summing values by character in SAS

I created this fakedata as an example:
data fakedata;
length name $5;
infile datalines;
input name count percent;
return;
datalines;
Ania 1 17
Basia 1 3
Ola 1 10
Basia 1 52
Basia 1 2
Basia 1 16
;
run;
The result I want to have is:
---> summed counts and percents for Basia
I would like to have summed count and percent for Basia as she was only once in the table with count 4 and percent 83. I tried exchanging name into a number to do GROUP BY in proc sql but it changes into order by (I had such an error). Suppose that it isn't so difficult, but I can't find the solution. I also tried some arrays without any success. Any help appreciated!
It sounds like proc sql does what you want:
proc sql;
select name, count(*) as cnt, sum(percent) as sum_percent
from fakedata
group by name;
You can add a where clause to get the results just for one name.
Hm, actually I got an answer.
proc summary data=fakedata;
by name;
var count percent;
output out=wynik (drop = _FREQ_ _TYPE_) sum(count)=count sum(percent)=percent;
run;
You can go back a step and use PROC FREQ most likely to generate this output in a single step. Based on counts the percents are not correct, but I'm not sure they're intended to be, right now they add up to over 100%. If you already have some summaries, then use the WEIGHT statement to account for the counts.
proc freq data=fakedata;
table name;
weight count;
run;

Adding a column calculated from subset of another column

I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have;
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
;
quit;