SAS procedure to create new columns based on variable values - sas

I need to make a TOP 10 table with month by month history like:
The problem is I have only "melted" data:
Is there a SAS procedure to create new columns in a dataset based on a variable values like in my example?

PROC TRANSPOSE
proc sort data=have;
by type descending date;
run;
proc transpose data=have
out=want(drop=_NAME_)
;
by type;
id date;
var total;
run;

Your "melted" data is in a categorical form that is lock and load ready for a Proc TABULATE report.
Example:
data have;
input date yymmdd10. treatment: $10. count:;
format date yymmdd10.;
datalines;
2020-01-31 lolipops 5
2020-01-31 chocolate 4
2020-01-31 cakes 3
2020-01-31 cookies 2
2019-01-31 lolipops 2
2019-01-31 chocolate 3
2019-01-31 cakes 4
2019-01-31 cookies 5
2018-01-31 lolipops 3
2018-01-31 chocolate 4
2018-01-31 cakes 5
2018-01-31 cookies 6
;
ods html file='top10.html' style=plateau;
proc tabulate data=have;
class date / descending;
class treatment / order=freq;
var count;
table treatment='', date='' * count='' * sum='' * f=6.
/ box='Treat'
;
run;
ods html close;
Report

Related

How to transpose my data on sas by observation on data step

I have a sas datebase with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks by ur help, i'm on my second week using sas <3
Edit: thanks by remain me that i was not finding a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means everything beginning with date, comparae with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed is by using an additional data view, or an output Proc that allows you to specify the specific order desired.

sas relative frequencies by group

I have a categorical variable, say SALARY_GROUP, and a group variable, say COUNTRY. I would like to get the relative frequency of SALARY_GROUP within COUNTRY in SAS. Is it possible to get it by proc SUMMARY or proc means?
Perhaps explore proc tabulate and a counter variable?
Yes, You can calculate the relative frequency of a categorical variable using both Proc Means and Proc Summary. For both procs you have to:
-Specify NWAY in the proc statement,
-Specify in the Class statement your categorical fields,
-Specify in the Var statement your response or numeric field.
Example below is for proc means:
Dummy Data:
/*Dummy Data*/
data work.have;
input Country $ Salary_Group $ Value;
datalines;
USA Group1 100
USA Group1 100
GBR Group1 100
GBR Group1 100
USA Group2 20
USA Group2 20
GBR Group2 20
GBR Group1 100
;
run;
Code:
*Calculating Frequncy and saving output to table sg_means*/
proc means data=have n nway ;
class Country Salary_Group;
var Value;
output out=sg_means n=frequency;
run;
Output Table:
Country=GBR Salary_Group=Group1 _TYPE_=3 _FREQ_=3 frequency=3
Country=GBR Salary_Group=Group2 _TYPE_=3 _FREQ_=1 frequency=1
Country=USA Salary_Group=Group1 _TYPE_=3 _FREQ_=2 frequency=2
Country=USA Salary_Group=Group2 _TYPE_=3 _FREQ_=2 frequency=2

Classify records based on a date

I have the following dataset:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEPT20016
2 12AUG2016
3 14JAN2016
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
What I would like to do now is classify the records based on their last visit. So what I want to do is:
Set a date (fe, 10SEPT 2016)
Classify all records that have a lastvisit > 30days as 1, Classify all records that have a lastvisit > 60days as 2 etc...
Any thoughts on how I need to program this?
You could build something like this (count the days between the dates, divide them by 30 and ceil them). Alternativly, if you want to use months and not 30 days, you can replace the first intck parameter with 'month' and remove the ceil and /30:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEP2016
2 12AUG2016
3 14JAN2016
4 09SEP2016
5 10AUG2016
;
RUN;
%let lastvisit=10SEP2016;
data result;
set survey;
days_30=ceil(intck('days', order_date,"&lastvisit"d)/30)-1;
run;
PROC PRINT data = result;
format order_date date9.;
RUN;

Rolling up data in SAS

Here is my data :
data example;
input id sports_name;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
This is just a sample. The variable sports_name is categorical with 56 types.
I am trying to transpose the data to wide form where each row would have a user_id and the names of sports as the variables with values being 1/0 indicating Presence or absence.
So far, I used proc freq procedure to get the cross tabulated frequency table and put that in a different data set and then transposed that data. Now i have missing values in some cases and count of the sports in rest of the cases.
Is there any better way to do this?
Thanks!!
You need a way to create something from nothing. You could have also used the SPARSE option in PROC FREQ. SAS names cannot have length greater than 32.
data example;
input id sports_name :$16.;
retain y 1;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;;;;
run;
proc print;
run;
proc summary data=example nway completetypes;
class id sports_name;
output out=freq(drop=_type_);
run;
proc print;
run;
proc transpose data=freq out=wide(drop=_name_);
by id;
var _freq_;
id sports_name;
run;
proc print;
run;
Same theory here, generate a list of all possible combinations using SQL instead of Proc Summary and then transposing the results.
data example;
informat sports_name $20.;
input id sports_name $;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
proc sql;
create table complete as
select a.id, a_x.sports_name, case when not missing(e.sports_name) then 1 else 0 end as Present
from (select distinct ID from example) a
cross join (select distinct sports_name from example) a_x
full join example as e
on e.id=a.id
and e.sports_name=a_x.sports_name;
quit;
proc transpose data=complete out=want;
by id;
id sports_name;
var Present;
run;

Adding a column calculated from subset of another column

I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have;
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
;
quit;