I have this original dataset #1 ("HAVE"), which contains 2 teams playing for each record:
DATA HAVE;
INPUT DATE :yymmdd10. Home_Team $ Away_Team $ Home_Rate Away_Rate ;
format DATE yymmddd10.;
datalines;
2020-07-26 Arsenal Watford 2.0 3.6
2020-07-26 Burnley Brighton 2.6 2.8
;
RUN;
What I want is to use PROC TRANSPOSE so that each record only contains 1 team, i.e. ("WANT"):
DATA WANT;
INPUT DATE :yymmdd10. TEAM $ TYPE $ Rate;
format DATE yymmddd10.;
datalines;
2020-07-26 Arsenal HOME 2.0
2020-07-26 Watford AWAY 3.6
2020-07-26 Burnley HOME 2.6
2020-07-26 Brighton AWAY 2.8
;
RUN;
Currently I use two data steps and then a UNION to get this result. I figure it could be cleaner with PROC TRANSPOSE, but I have tried different combinations of BY/ID/VAR and still cannot get the right result.
The general approach here is the double transpose. First transpose so that your variables become rows, then derive TYPE (and the desired variable name) from _NAME_, and finally transpose again. Here's an example:
DATA HAVE;
INPUT DATE :yymmdd10. Home_Team $ Away_Team $ Home_Rate Away_Rate ;
format DATE yymmddd10.;
datalines;
2020-07-26 Arsenal Watford 2.0 3.6
2020-07-26 Burnley Brighton 2.6 2.8
;
RUN;
proc transpose data=have out=have_t;
by date;
var home: away:;
run;
data pre_t;
set have_t;
by date;
type = scan(_NAME_,1,'_');
varn = scan(_NAME_,2,'_');
run;
proc transpose data=pre_t out=want;
by date type notsorted;
id varn;
var col1 col2;
run;
There is one downside to this: the values in Rate will now be character, because the first PROC TRANSPOSE mixes character and numeric variables in one VAR list and so converts everything to character. You can avoid that by doing the first transpose differently (either with a data step, or with two PROC TRANSPOSE steps, one for the numeric variables and one for the character variables), or you can convert Rate back to numeric in a later data step.
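For comparison, here is a minimal sketch of the data step route mentioned above. It is not the double transpose; it simply writes two output rows per input row, which keeps Rate numeric. The lengths given to Team and Type are assumptions chosen to fit the sample data.

```sas
/* Sketch: one HOME row and one AWAY row per match, Rate stays numeric */
data want;
  set have;
  length Team $8 Type $4;   /* assumed lengths, wide enough for the sample */
  Team = Home_Team; Type = 'HOME'; Rate = Home_Rate; output;
  Team = Away_Team; Type = 'AWAY'; Rate = Away_Rate; output;
  keep Date Team Type Rate;
run;
```

The DATE format carries over from HAVE, so no extra FORMAT statement is needed.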
I am working on a SAS problem where I need to append data. The program runs successfully, but it creates duplicates every time I run it.
Please check my code and the screenshot of the table:
Question: Create a new file "Total_Sales" by appending data file "Hyundai" with the file first created in problem 3.
/*Problem 3*/:
data avik1.var1;
length uniqueid $50 Manufacturer $ 50 Model $20 Sales_in_thousands 8 _4_year_resale_value 8 Price_in_thousands 8;
retain uniqueid Manufacturer Model Latest_Launch Sales_in_thousands _4_year_resale_value Price_in_thousands;
set avik1.conc(drop= Vehicle_type Engine_size Horsepower Wheelbase Width Length Curb_weight Fuel_capacity Fuel_efficiency );
informat Latest_Launch date9.;
format Latest_Launch ddmmyy10.;
run;
proc print data = avik1.var1;
run;
/* Data To be Appended */
data avik1.hyundai;
length uniqueid $ 50 Manufacturer $ 50 Model $20 Sales_in_thousands 8 _4_year_resale_value 8;
informat Latest_Launch date7. ;
format Latest_Launch ddmmyy10.;
input Manufacturer $ Model $ Sales_in_thousands _4_year_resale_value Latest_Launch;
uniqueid=(Model||Manufacturer);
cards;
Hyundai Tuscon 16.919 16.36 2Feb12
Hyundai i45 39.384 19.875 3Jun11
Hyundai Verna 14.114 18.225 4Jan12
Hyundai Terracan 8.558 29.775 10Mar11
;
run;
Proc Print data = avik1.hyundai;
run;
Now I used the following code to append:
data avik1.total_sales;
set avik1.var1 avik1.hyundai;
proc append base=avik1.var1 new=avik1.hyundai force;
run;
proc print data= avik1.total_sales;
run;
The program runs, but the result contains duplicates, as the screenshot shows.
[Screenshot: duplicate rows highlighted in yellow]
I am new to SAS and would really appreciate a solution to this problem. Please also tell me why this is happening.
Thanks!
Did you run it twice? I'm guessing but that could be the reason you see duplicates. I'll try to explain.
In your append code here, you are creating the new dataset total_sales by combining var1 and hyundai:
data avik1.total_sales;
set avik1.var1 avik1.hyundai;
In the below code, you are not creating a new dataset, you are expanding var1 by adding the records from hyundai.
proc append base=avik1.var1 new=avik1.hyundai force;
run;
If you ran this proc append and then ran the first data step again, you would get duplicates of all the hyundai records, because you are taking the EXPANDED var1 and re-adding the hyundai records to it.
So the point is, to answer the original question, the proc append procedure is totally unnecessary. You achieved it with just the data step.
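A minimal sketch of a version that can be rerun safely, assuming var1 has first been rebuilt by rerunning the Problem 3 step (so that it no longer contains appended Hyundai rows). The second variant is only an illustration of how PROC APPEND could be used without modifying var1.

```sas
/* Variant 1: the data step alone. Rerunning it recreates Total_Sales */
/* from scratch; VAR1 and HYUNDAI are left untouched.                 */
data avik1.total_sales;
  set avik1.var1 avik1.hyundai;
run;

/* Variant 2: PROC APPEND, but appending to a fresh copy of VAR1 */
/* instead of to VAR1 itself.                                    */
data avik1.total_sales;
  set avik1.var1;
run;

proc append base=avik1.total_sales data=avik1.hyundai force;
run;
```

Run one variant or the other, not both in sequence with the original PROC APPEND, otherwise the base table keeps growing.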
I have a SAS dataset that looks something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks for your help, I'm in my second week of using SAS <3
Edit: thanks for reminding me that what I was looking for is not a sorting method.
Good day. The following should be what you are after. I did not find an easy way to name the output columns during the transpose, since they do not exist in the starting data, so they are renamed in a separate step.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: (the colon means every variable whose name begins with "date"; compare with SQL 'date%')*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
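As a possible shortcut, the rename can be folded into the transpose step itself. This sketch uses the NAME= option of PROC TRANSPOSE and a RENAME= data set option on the output; the data set and variable names are the ones from the example above.

```sas
/* Sketch: name the _NAME_ column and rename COL1 in one step */
proc transpose data=begin
               out=trans(rename=(col1=Date))
               name=Datetype;
  by id;
  var birthday date: ;
run;
```

Datetype then holds the original variable names (birthday, Date1, Date2) rather than the values birthday, 1 and 2 shown in the question.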
See more from Kent University site
Two steps:
1. Pivot the data using PROC TRANSPOSE.
2. Change the names of the output columns and their labels with PROC DATASETS.
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
quit; /* PROC DATASETS keeps running until QUIT */
The column order in the TRANSPOSE output table is:
1. BY variables (here, id)
2. copy variables
3. _name_ and _label_
4. data-based column names
The sample 'want' shows the data-named column before the _label_ / _name_ column. The only way to change the underlying column order is to rewrite the data set. You can, however, change the order in which the columns are presented, either with an additional data view or with an output procedure that lets you specify the order.
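Both presentation options can be sketched as follows, assuming the renamed table want (columns id, Datetype, Date) from the sample code. The view name want_v is hypothetical.

```sas
/* Option 1: a view that presents the columns in the desired order */
proc sql;
  create view want_v as
  select id, Date, Datetype
  from want;
quit;

/* Option 2: let the output procedure choose the order */
proc print data=want noobs;
  var id Date Datetype;
run;
```

Neither step rewrites want; the stored column order stays as PROC TRANSPOSE produced it.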
Here is my data :
data example;
input id sports_name :$16.;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
This is just a sample. The variable sports_name is categorical with 56 types.
I am trying to transpose the data to wide form, where each row has a user_id and the sports names become variables with values 1/0 indicating presence or absence.
So far, I have used PROC FREQ to get the cross-tabulated frequency table, put that in a different data set, and then transposed that data. Now I have missing values in some cases and the count of the sport in the rest.
Is there any better way to do this?
Thanks!!
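You need a way to create something from nothing, that is, rows for the id/sport combinations that never occur in the data. The COMPLETETYPES option of PROC SUMMARY does that here; you could also have used the SPARSE option in PROC FREQ. Note that the sports names become variable names, and SAS names cannot be longer than 32 characters.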
data example;
input id sports_name :$16.;
retain y 1;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;;;;
run;
proc print;
run;
proc summary data=example nway completetypes;
class id sports_name;
output out=freq(drop=_type_);
run;
proc print;
run;
proc transpose data=freq out=wide(drop=_name_);
by id;
var _freq_;
id sports_name;
run;
proc print;
run;
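The SPARSE alternative mentioned above could look roughly like this. It is a sketch against the same example data set; the output names freq_sparse and wide_sparse are hypothetical.

```sas
/* SPARSE writes every id*sports_name combination to OUT=, */
/* including those with a count of zero.                   */
proc freq data=example noprint;
  tables id*sports_name / sparse out=freq_sparse(drop=percent);
run;

/* One row per id, one column per sport, holding the count */
proc transpose data=freq_sparse out=wide_sparse(drop=_name_ _label_);
  by id;
  id sports_name;
  var count;
run;
```

The values are counts, so they are 1/0 only if each id/sport pair appears at most once in the input.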
Same theory here, generate a list of all possible combinations using SQL instead of Proc Summary and then transposing the results.
data example;
informat sports_name $20.;
input id sports_name $;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
proc sql;
create table complete as
select a.id, a_x.sports_name, case when not missing(e.sports_name) then 1 else 0 end as Present
from (select distinct ID from example) a
cross join (select distinct sports_name from example) a_x
full join example as e
on e.id=a.id
and e.sports_name=a_x.sports_name
order by a.id, a_x.sports_name; /* PROC TRANSPOSE with BY id needs rows sorted by id */
quit;
proc transpose data=complete out=want;
by id;
id sports_name;
var Present;
run;
I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have nway; /* NWAY keeps only the per-date rows; without it WANT also gets an overall total row */
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format=date9.,
sum(count) as Total
from have
group by date;
quit;
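Since the question mentions trying first., here is a minimal sketch of that data step approach as well. It assumes have is sorted by date, as in the PROC SORT step of the question.

```sas
/* Running total per date using FIRST./LAST. processing */
data want;
  set have;
  by date;
  if first.date then total = 0;  /* reset at the start of each date */
  total + count;                 /* sum statement: total is retained across rows */
  if last.date then output;      /* write one row per date */
  keep date total;
  format date date9.;
run;
```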
I have the following sample data and 'proc means' command.
data have;
input measure country :$12.; /* default $8 would truncate "Netherlands" */
datalines;
250 UK
800 Ireland
500 Finland
250 Slovakia
3888 Slovenia
34 Portugal
44 Netherlands
4666 Austria
;
run;
PROC PRINT data=have; RUN;
The following PROC MEANS command prints out a listing for each country above. How can I group some of those countries (e.g. UK and Ireland together, Slovakia and Slovenia as Central Europe) in the PROC MEANS step itself, rather than adding another data step with a 'case when' or similar?
proc means data=have sum maxdec=2 order=freq STACKODS;
var measure;
class country;
run;
Thanks for any help at all on this. I understand there are various things you can do in the PROC MEANS command itself, like limiting the countries by doing this:
proc means data=have(WHERE=(country not in ('Finland', 'UK')));
I'd like to do the grouping in the PROC MEANS command as well, for brevity.
Thanks.
This is very easy with a format for any PROC that takes a CLASS statement.
Simply build a format, either with code or from data, then apply it with a FORMAT statement in the PROC MEANS step.
proc format lib=work;
value $countrygroup
"UK"="British Isles"
"Ireland"="British Isles"
"Slovakia","Slovenia"="Central Europe"
;
quit;
proc means data=have;
class country;
var measure;
format country $countrygroup.;
run;
It's usually better to have numeric codes for country and then format those to be whichever set of names is needed at any one time, particularly as capitalization/etc. is pretty irritating, but this works well enough even here.
The CNTLIN= option in PROC FORMAT lets you build a format from a dataset: FMTNAME holds the format name, START the value to be labelled, and LABEL the label (END holds the end of the range, if you are mapping ranges). There are other options as well; the documentation goes into more detail.
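A minimal sketch of the CNTLIN= route, building the same $countrygroup format from a data set. The data set name country_fmt and the variable lengths are assumptions made for illustration; TYPE='C' marks the format as a character format.

```sas
/* Control data set: one row per value to be labelled */
data country_fmt;
  length fmtname $32 type $1 start $12 label $20;
  fmtname = 'countrygroup';
  type    = 'C';   /* character format, used as $countrygroup. */
  start = 'UK';       label = 'British Isles';  output;
  start = 'Ireland';  label = 'British Isles';  output;
  start = 'Slovakia'; label = 'Central Europe'; output;
  start = 'Slovenia'; label = 'Central Europe'; output;
run;

/* Build the format from the control data set */
proc format lib=work cntlin=country_fmt;
run;

proc means data=have sum maxdec=2;
  class country;
  var measure;
  format country $countrygroup.;
run;
```

Countries without an entry in the control data set keep their own name as the class level.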