I have this kind of data
Configuration Retail Price month
1 450 Jan
1 520 Feb
1 630 Mar
5 650 Jan
5 320 Feb
5 480 Mar
9 770 Jan
9 180 Feb
9 320 Mar
I want my data to look like this
Configuration Jan Feb Mar
1 450 520 630
5 650 320 480
9 770 180 320
Generating some data to tinker with:
data begin;
length Configuration 3 Retail_Price 3 month $3;
input Configuration Retail_Price month;
datalines;
1 450 Jan
1 520 Feb
1 630 Mar
5 650 Jan
5 320 Feb
5 480 Mar
9 770 Jan
9 180 Feb
9 320 Mar
;
run;
In SAS everything must be sorted right. (Or you can use index, but that different thing) We use PROC SORT for this.
proc sort data= begin; by Configuration month; run;
A way to calculate sum is to utilize PROC MEANS.
proc means data=begin noprint; /*Noprint is for convienience*/
by Configuration month; /*These are the subsets*/
output out = From_means(drop=_TYPE_ _FREQ_) /*Drops are for ease sake*/
sum(Retail_Price)=
;
run;
At this point we have the data in narrow format. A way to transform this to wide format is PROC TRANSPOSE.
proc transpose data=From_means out=Wide_format;
by Configuration;
id month;
run;
Bear in mind that there are multiple other ways to accomplish the same. A popular way is to utilize PROC SQL for almost everything, but in my experience large datasets are better to be handled by SAS proc commands..
Related
I have data structured like this:
Meter_ID Date HourEnd Value
100 12/01/2007 1 986
100 12/01/2007 2 992
100 12/01/2007 3 1002
200 12/01/2007 1 47
200 12/01/2007 2 45
200 12/01/2007 3 50
300 12/01/2007 1 32
300 12/01/2007 2 37
300 12/01/2007 3 40
And would like to transpose the information so that I end up with this:
Date HourEnd Meter100 Meter200 Meter300
12/01/2007 1 986 47 32
12/01/2007 2 992 45 37
12/01/2007 3 1002 50 40
I have tried numerous PROC TRANSPOSE options and variations and am confusing myself. Any help would be greatly appreciated!
You need to SORT.
data have;
infile cards firstobs=2;
input Meter_ID Date:mmddyy. HourEnd Value;
format date mmddyy10.;
cards;
Meter_ID Date HourEnd Value
100 12/01/2007 1 986
100 12/01/2007 2 992
100 12/01/2007 3 1002
200 12/01/2007 1 47
200 12/01/2007 2 45
200 12/01/2007 3 50
300 12/01/2007 1 32
300 12/01/2007 2 37
300 12/01/2007 3 40
;;;;
run;
proc print;
proc sort data=have;
by date hourend meter_id;
run;
proc print;
run;
proc transpose prefix="Meter"n;
by date hourend;
id meter_id;
var value;
run;
proc print;
run;
I have a SAS dataset that I need to transpose from wide format to long format
data that I have:
DATES Year1 Year2 Year3
Jan 100 200 300
Data I want:
DATES Year Income
Jan 1 100
Jan 2 200
Jan 3 300
In this scenario the syntax for proc transpose is fairly simple.
proc transpose data=have out=want(rename=(_name_=Year col1=Income));
by date;
var year:; * the ':' is a wildcard character;
run;
The resulting output:
Obs date Year Income
1 Jan year1 100
2 Jan year2 200
3 Jan year3 300
As the title suggests, I experience some unexpected performance behaviour while working with a datastep.
A. The following code executes in 0.01 sec. So far so good.
data policen_roh;
set dwhprod.tbwh_kdu_detail_hi(
keep=
kdu_dt_id police_nr record_typ kdnr bag betrag_akt ursp_beginn_dt beginn_dt ablauf_dt storno_dt
where=(
police_nr=406045267
and record_typ='P'
)
)
;
run;
B. Additionally I have to filter a date, which is stored in a date-id, starting at 1 for 01/01/1850. Since I created formats to convert the date-id to a year (integer), I added the line input(put(kdu_dt_id, tag_id2jahr.),best.) ge 2017.
Works as expected. No problem here. I get my 15 expected records, and execution time increases marginally to 0.02 sec:
data policen_roh;
set dwhprod.tbwh_kdu_detail_hi(
keep=
kdu_dt_id police_nr record_typ kdnr bag betrag_akt ursp_beginn_dt beginn_dt ablauf_dt storno_dt
where=(
police_nr=406045267
and input(put(kdu_dt_id, tag_id2jahr.),best.) ge 2017
and record_typ='P'
)
)
;
run;
C. Now here is the problem: In an effort to speed up my code for larger datasets, I replaced
input(put(kdu_dt_id, tag_id2jahr.),best.) ge 2017
with
kdu_dt_id gt 60997 - the equivalent of 01/01/2017.
To my understanding, this should be way faster, since there is no put/input calculation required. However, while this returns the same result as B., execution time increases to roughly 30.00 seconds.
What is did I miss?
Appendix: Log for further reference
1 The SAS System 13:56 Wednesday, February 7, 2018
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Programm';
4 %LET _CLIENTPROJECTPATH='R:\Projekte\20180125 Erneuerungsprovisionen\Erneuerungsprovisionen.egp';
5 %LET _CLIENTPROJECTNAME='Erneuerungsprovisionen.egp';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGHTML TEMP;
12 ODS HTML(ID=EGHTML) FILE=EGHTML
13 ENCODING='utf-8'
14 STYLE=HtmlBlue
15 STYLESHEET=(URL="file:///C:/Program%20Files%20(x86)/SASHOME/x86/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
16 ATTRIBUTES=("CODEBASE"="http://www2.sas.com/codebase/graph/v94/sasgraph.exe#version=9,4")
17 NOGTITLE
18 NOGFOOTNOTE
19 GPATH=&sasworklocation
20 ;
NOTE: Writing HTML(EGHTML) Body file: EGHTML
21
22 GOPTIONS ACCESSIBLE;
23 data policen_roh;
24 set dwhprod.tbwh_kdu_detail_hi(
25 keep=
26 kdu_dt_id police_nr record_typ kdnr bag betrag_akt ursp_beginn_dt beginn_dt ablauf_dt storno_dt
27 where=(
28 police_nr=406045267
29 and kdu_dt_id gt 60997
30 and record_typ='P'
31 )
32 )
33 ;
34 run;
NOTE: There were 14 observations read from the data set DWHPROD.TBWH_KDU_DETAIL_HI.
WHERE (police_nr=406045267) and (kdu_dt_id>60997) and (record_typ='P');
NOTE: The data set WORK.POLICEN_ROH has 14 observations and 10 variables.
NOTE: Compressing data set WORK.POLICEN_ROH increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 1:10.44
cpu time 0.03 seconds
35
36 GOPTIONS NOACCESSIBLE;
37 %LET _CLIENTTASKLABEL=;
38 %LET _CLIENTPROJECTPATH=;
39 %LET _CLIENTPROJECTNAME=;
40 %LET _SASPROGRAMFILE=;
41
42 ;*';*";*/;quit;run;
43 ODS _ALL_ CLOSE;
44
45
46 QUIT; RUN;
2 The SAS System 13:56 Wednesday, February 7, 2018
47
My guess is that your underlying table is on a database, and the removal of the input / put functions changed the query execution such that it is no longer making use of available indexes.
A bit counterintuitive, but try removing kdu_dt_id gt 60997 from your where clause and put it in a gating if statement instead.
data policen_roh;
set dwhprod.tbwh_kdu_detail_hi(
keep=
kdu_dt_id police_nr record_typ kdnr bag betrag_akt ursp_beginn_dt beginn_dt ablauf_dt storno_dt
where=(
police_nr=406045267
and record_typ='P'
)
)
;
if kdu_dt_id gt 60997;
run;
Alternatively speak to your dba about tuning your database if this is a query you will run often.
For more information, you could re-write your query using proc sql, with the _tree option (to view the execution plan). You could then use the _method option to play around / tune that plan.
Also, check out options sastrace=',,,d' sastraceloc=saslog; to show your dba more info on what is being sent to the database in terms of the underlying SQL query.
I have an example table as below
id term subj prof hour
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
I need to divide hours if the id- term and subj same. There are 2 different prof with same id:20 - term and subj, so i divided hour 2.
There are 3 different prof with same id : 30 - term and subj. So i divided hour 3.
So the output should be like this;
id term subj prof hour
20 2016 COM James 2
20 2016 COM Henrey 2
30 2016 HUM Nelly 1
30 2016 HUM John 1
30 2016 HUM Jimmy 1
45 2016 CGS Tim 3
In SAS you can use a double DOW loop to achieve this, once the data has been sorted in the correct order. The first loop counts how many profs there are with the same id, term and subj. The second loop divides hour by the number of profs. The loops are performed at each change of id, term or subj.
I've created a new_hour variable and kept in the temporary _counter variable just so you can see the code working, you can obviously overwrite the hour variable and drop the _counter variable if you wish
/* create initial dataset */
data have;
input id term subj $ prof $ hour;
datalines;
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
;
run;
/* sort data */
proc sort data=have;
by id term subj prof;
run;
/* create output dataset */
data want;
do until(last.subj); /* 1st loop*/
set have;
by id term subj prof;
if first.subj then _counter=0; /* reset counter when id, term or subj change */
_counter+first.prof; /* count number of times prof changes */
end;
do until(last.subj); /* 2nd loop */
set have;
by id term subj;
new_hour=hour / _counter; /* divide hour by number of profs from 1st loop */
output; /* output record */
end;
run;
Assuming your problem is as simple as the one you gave as an example, one proc sql should suffice. If it is more complicated, please explain how so we can be more helpful!
data have;
input id term subj $ prof $ hour;
datalines;
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
;
run;
proc sql;
create table want as select
*, hour / count(prof) as hour_adj
from have
group by id, subj;
quit;
I have a data set like this:
id type time
70657 23E Nov 4 2002 12:00AM
61651 12R
11603 DQ2
45819 Jul 23 2013 12:00AM
732 Mar 4 2011 12:00AM
22810 231
I want to do two things with missing values.
The first thing is how to remove the rows if the values of the variable time is " ".
desired output1
id type time
70657 23E Nov 4 2002 12:00AM
45819 Jul 23 2013 12:00AM
732 Mar 4 2011 12:00AM
The second thing is to remove the rows if there is any missing values.
desired output2
id type time
70657 23E Nov 4 2002 12:00AM
SAS code:
data character;
length id type time $ 24;
input id $ 1-5 type $ 8-10 time $ 13-31;
cards;
70657 23E Nov 4 2002 12:00AM
61651 12R
11603 DQ2
45819 Jul 23 2013 12:00AM
732 Mar 4 2011 12:00AM
22810 231
;
run;
I would be inclined to use proc sql. Something like:
proc sql;
create table newchar as
select *
from character
where id is not null and type is not null and time is not null;
quit;
The SAS alternative.
DATA WANT;
SET CHARACTER (WHERE = (TIME ~= "" AND TYPE ~= "" AND ID ~= ""));
RUN;