I'm trying to transpose this table:
YearMonth ID Purchase Purchase Value
201912 1 Laptop 1000
202012 1 Computer 2000
202112 1 Phone 1000
201912 2 Stereo 500
202012 2 Headset 200
To look like this using PROC Transpose:
ID Purchase_201912 Purchase_202012 Purchase_202112 PV_201912 PV_202012 PV_202112
1 Laptop Computer Phone 1000 2000 1000
2 Stereo Headset - 500 200 -
I think I'll have to transpose multiple times to achieve this. The first transpose I've tried doing is this:
proc transpose data=query_final out=transpose_1 let;
by yearmonth agent_number;
run;
but I keep getting the error
ERROR: Data set WORK.QUERY_FINAL is not sorted in ascending sequence. The current BY group has YearMonth = 202112
and the next BY group has YearMonth = 201912.
I've checked the the data from the table I'm pulling from is indeed sorted in ascending order by YearMonth then grouped by agent number, so I'm not sure what this error is referring to. Could it be that not all IDs have the same YearMonths associated with them (i.e. in example above, ID 2 did not purchase anything in 2021).
Assuming agent_number is equivalent to id in your example, I reproduced the data:
data have;
infile datalines4 delimiter="|";
input yearmonth id purchase :$8. PV;
datalines4;
201912|1|Laptop|1000
202012|1|Computer|2000
202112|1|Phone|1000
201912|2|Stereo|500
202012|2|Headset|200
;;;;
quit;
You can use two proc transpose and then merge the latter
proc transpose data=have out=stage1(drop=_name_) prefix=Purchase_;
by id;
var purchase;
id yearmonth;
run;
proc transpose data=have out=stage2(drop=_name_) prefix=PV_;
by id;
var PV;
id yearmonth;
run;
data want;
merge stage1 stage2;
run;
Resulting in the desired output
id purchase_201912 purchase_202012 purchase_202112 PV_201912 PV_202012 PV_202112
1 Laptop Computer Phone 1000 2000 1000
2 Stereo Headset 500 200 .
PS: In order to avoid getting the error you report, sort the data first, in the same manner as in the by statement in the proc transpose. However, it is not needed here as it is already sorted by id.
Related
I have a sas datebase with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks by ur help, i'm on my second week using sas <3
Edit: thanks by remain me that i was not finding a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means everything beginning with date, comparae with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed is by using an additional data view, or an output Proc that allows you to specify the specific order desired.
I am new to sas and are trying to handle some customer data, and I'm not really sure how to do this.
What I have:
data transactions;
input ID $ Week Segment $ Average Freq;
datalines;
1 1 Sports 500 2
1 1 PC 400 3
1 2 Sports 350 3
1 2 PC 550 3
2 1 Sports 650 2
2 1 PC 700 3
2 2 Sports 720 3
2 2 PC 250 3
;
run;
What I want:
data transactions2;
input ID Week1_Sports_Average Week1_PC_Average Week1_Sports_Freq
Week1_PC_Freq
Week2_Sports_Average Week2_PC_Average Week2_Sports_Freq Week2_PC_Freq;
datalines;
1 500 400 2 3 350 550 3 3
2 650 700 2 3 720 250 3 3
;
run;
The only thing I got so far is this:
Data transactions3;
SET transactions;
if week=1 and Segment="Sports" then DO;
Week1_Sports_Freq=Freq;
Week1_Sports_Average=Average;
END;
else DO;
Week1_Sports_Freq=0;
Week1_Sports_Average=0;
END;
run;
This will be way too much work as I have a lot of weeks and more variables than just freq/avg.
Really hoping for some tips are, as I'm stucked.
You can use PROC TRANSPOSE to create that structure. But you need to use it twice since your original dataset is not fully normalized.
The first PROC TRANSPOSE will get the AVERAGE and FREQ readings onto separate rows.
proc transpose data=transactions out=tall ;
by id week segment notsorted;
var average freq ;
run;
If you don't mind having the variables named slightly differently than in your proposed solution you can just use another proc transpose to create one observation per ID.
proc transpose data=tall out=want delim=_;
by id;
id segment _name_ week ;
var col1 ;
run;
If you want the exact names you had before you could add data step to first create a variable you could use in the ID statement of the PROC transpose.
data tall ;
set tall ;
length new_name $32 ;
new_name = catx('_',cats('WEEK',week),segment,_name_);
run;
proc transpose data=tall out=want ;
by id;
id new_name;
var col1 ;
run;
Note that it is easier in SAS when you have a numbered series of variable if the number appears at the end of the name. Then you can use a variable list. So instead of WEEK1_AVERAGE, WEEK2_AVERAGE, ... you would use WEEK_AVERAGE_1, WEEK_AVERAGE_2, ... So that you could use a variable list like WEEK_AVERAGE_1 - WEEK_AVERAGE_5 in your SAS code.
Here is my data :
data example;
input id sports_name;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
This is just a sample. The variable sports_name is categorical with 56 types.
I am trying to transpose the data to wide form where each row would have a user_id and the names of sports as the variables with values being 1/0 indicating Presence or absence.
So far, I used proc freq procedure to get the cross tabulated frequency table and put that in a different data set and then transposed that data. Now i have missing values in some cases and count of the sports in rest of the cases.
Is there any better way to do this?
Thanks!!
You need a way to create something from nothing. You could have also used the SPARSE option in PROC FREQ. SAS names cannot have length greater than 32.
data example;
input id sports_name :$16.;
retain y 1;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;;;;
run;
proc print;
run;
proc summary data=example nway completetypes;
class id sports_name;
output out=freq(drop=_type_);
run;
proc print;
run;
proc transpose data=freq out=wide(drop=_name_);
by id;
var _freq_;
id sports_name;
run;
proc print;
run;
Same theory here, generate a list of all possible combinations using SQL instead of Proc Summary and then transposing the results.
data example;
informat sports_name $20.;
input id sports_name $;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
proc sql;
create table complete as
select a.id, a_x.sports_name, case when not missing(e.sports_name) then 1 else 0 end as Present
from (select distinct ID from example) a
cross join (select distinct sports_name from example) a_x
full join example as e
on e.id=a.id
and e.sports_name=a_x.sports_name;
quit;
proc transpose data=complete out=want;
by id;
id sports_name;
var Present;
run;
I hava a table in sas with columns names 'Parameter' and 'Parameter_value'.
Parameter Parameter_value
1 Member-ID 10
2 User ID 90
3 Accounting Year 2000
4 Quarter 0
5 As of Quarter End Date 12-12-2000
6 Status (File Type) s
I want to convert the data of column 'Parameter' into columns Names and the data of 'Parameter_value' into the row. Like:
Member-ID User ID Accounting Year Quarter As of Quarter End Date Status (File Type)
10 90 2000 0 12-12-2000 s
how do I achieve this? I wrote a proc for this but it does't show the required output. proc is:
PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
SELECT T.Parameter, T.Parameter_Value
FROM CSV.S_SUMMARY as T
;
QUIT;
PROC TRANSPOSE DATA=WORK.SORTTempTableSorted
OUT=WORK.TRNSTransposedS_SUMMARY(LABEL="Transposed CSV.S_SUMMARY")
PREFIX=Column
NAME=Source
LABEL=Label
;
VAR Parameter Parameter_Value;
/* -------------------------------------------------------------------
End of task code.
------------------------------------------------------------------- */
RUN; QUIT;
Thanks in advance
You must use the ID option of the PROC TRANSPOSE:
PROC TRANSPOSE DATA=WORK.SORTTempTableSorted
OUT=WORK.TRNSTransposedS_SUMMARY(LABEL="Transposed CSV.S_SUMMARY");
ID Parameter
VAR Parameter_Value;
RUN;
Each ID has several instances, and each instance has a different value. I would like the final output to be the maximum value per ID. So the initial dataset is:
ID Value
1 100
1 7
1 65
2 12
2 97
3 82
3 54
And the output will be:
ID Value
1 100
2 97
3 82
I tried running proc sort twice thinking that the first sort would get things in the proper order so that nodupkey on the second sort would get rid of the right values. This did not work.
proc sort work.data; by id value descending; run;
proc sort work.data nodupkey; by id; run;
Thanks!
Your approach should have worked fine but it looks like you have a syntax error - did you forget to check your log? The descending keyword needs to go before the variable you want to sort in descending order.
proc sort data=sashelp.class out=tmp;
by sex descending height;
run;
proc sort data=tmp out=final nodupkey;
by sex;
run;
Also - in case you're not familiar with SQL, I strongly suggest that you should learn it as it will simplify many data manipulation tasks. This can also be solved in a single SQL step:
proc sql noprint;
create table want as
select sex,
max(height) as height
from sashelp.class
group by sex
;
quit;
My preferred solution:
proc means data=have noprint;
class id;
var value;
output out=want max(value)=;
run;
Should be a lot faster than two sorts.