Good day, I have a technical issue with PROC TRANSPOSE. For instance, my data has a structure like this:
data begin;
input MAKName $ MLOName $ tcode $ Count Percent;
cards;
ABARTH 124 Miss 5 5.1
ABARTH 124 Hit 94 94.9
FIAT 124 Miss 30 12
FIAT 124 Hit 220 88
;run;
I'd like it to be transposed such that the lines adhere to the format:
MAKName MLOName Count_miss percent_miss Count_hit Percent_hit
ABARTH 124 5 5.1 94 94.9
FIAT 124 30 12 220 88
So I'd like to have the two lines for each group compressed into a single one. Any permutation of variables or variable names is acceptable.
I've managed to get the hits to transpose, but the second variable gives me issues. The naming is also a problem, but a simple rename could work in my case.
proc transpose data= Begin out= _test prefix=a_ ;
by makname mloname;
var count ;
idlabel tcode;
run; quit;
Any experienced data manipulator have time to help with this?
Edit below:
A colleague of mine came up with a way to do this with two transposes:
proc transpose data=begin out=out1;
by MakName MLOName tcode;
var Count Percent;
run;
proc transpose data=out1 out=out2(drop=_NAME_) delimiter=_;
by MakName MLOName;
var Col1;
id _NAME_ tcode;
run;
Neat imho.
There could indeed be a way to do it with one proc transpose but I'm not seeing it.
Alternatively, you can rather easily do it with two proc transpose steps and a data step merge:
data begin;
input MAKName $ MLOName $ tcode $ Count Percent;
cards;
ABARTH 124 Miss 5 5.1
ABARTH 124 Hit 94 94.9
FIAT 124 Miss 30 12
FIAT 124 Hit 220 88
;
run;
proc transpose data=Begin out= count_test(drop=_name_) prefix=Count_;
by makname mloname;
var count;
id tcode;
run;
proc transpose data=Begin out= percent_test(drop=_name_) prefix=Percent_;
by makname mloname;
var percent;
id tcode;
run;
data want;
merge count_test percent_test;
by makname mloname;
run;
Note that I replaced your idlabel statement with an id statement in order to create the names for the columns as you want them.
EDIT: the same idea reduced to one proc transpose but still requires merging:
data begin;
input MAKName $ MLOName $ tcode $ Count Percent;
cards;
ABARTH 124 Miss 5 5.1
ABARTH 124 Hit 94 94.9
FIAT 124 Miss 30 12
FIAT 124 Hit 220 88
;
run;
proc transpose data=Begin out=test;
by makname mloname;
var count percent;
id tcode;
run;
data want (drop=_name_);
merge test(where=(_name_='Count') rename=(Miss=Count_miss Hit=Count_hit))
test(where=(_name_='Percent') rename=(Miss=Percent_miss Hit=Percent_hit));
by makname mloname;
run;
I call this form of data shaping multi-pivot. As you learned, one traditional approach is transpose + transpose. Other techniques include:
transpose + merge (shown by user2877959)
arrays (for static data configurations)
hashes (for dynamic data configurations)
SQL codegen
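As a rough illustration of the data step family of techniques listed above, here is a minimal sketch that uses RETAIN and BY-group processing instead of a second transpose. It assumes the begin data set from the question is already sorted by MAKName and MLOName and that tcode only takes the values 'Miss' and 'Hit'. The output name want_ds is made up for this example.

```sas
data want_ds;
  set begin;
  by MAKName MLOName;
  /* Keep the pivoted values across the rows of one BY group */
  retain Count_Miss Percent_Miss Count_Hit Percent_Hit;
  /* Reset at the start of each group so values never leak between groups */
  if first.MLOName then call missing(Count_Miss, Percent_Miss, Count_Hit, Percent_Hit);
  /* Assumption: tcode is either 'Miss' or 'Hit' */
  if tcode = 'Miss' then do;
    Count_Miss = Count;
    Percent_Miss = Percent;
  end;
  else if tcode = 'Hit' then do;
    Count_Hit = Count;
    Percent_Hit = Percent;
  end;
  /* Write one row per MAKName/MLOName group */
  if last.MLOName then output;
  drop tcode Count Percent;
run;
```

This only suits a static configuration, since every target column is named by hand. If new tcode values can appear, the transpose-based approaches adapt without code changes.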
Regardless of the technique, a reshaping of data is often indicative of a reporting requirement. Consider using:
Proc TABULATE
Proc REPORT
Here is a tabulation example:
ods listing;
options formchar="|----|+|---+=|-/\<>*";
data have;
input MAKName $ MLOName $ tcode $ Count Percent;
cards;
ABARTH 124 Miss 5 5.1
ABARTH 124 Hit 94 94.9
FIAT 124 Miss 30 12
FIAT 124 Hit 220 88
;run;
proc tabulate data=have;
class MAKName MLOName tcode;
var Count Percent;
table
MAKName * MLOName
,
tcode='' * (Count*max=''*f=8. Percent*max='') / nocellmerge;
run;
ODS Listing output (HTML is way better, but can't be inserted in SO)
--------------------------------------------------------------------
| | Hit | Miss |
| |---------------------+---------------------|
| | Count | Percent | Count | Percent |
|----------------------+--------+------------+--------+------------|
|MAKName |MLOName | | | | |
|----------+-----------| | | | |
|ABARTH |124 | 94| 94.90| 5| 5.10|
|----------+-----------+--------+------------+--------+------------|
|FIAT |124 | 220| 88.00| 30| 12.00|
--------------------------------------------------------------------
I have a variable for counting days. I'm trying to use the day count to divide by total days.
How do I create a macro that stores the most recent day and allows me to quote it later?
This is what I have so far (I've cut out code that's not relevant)
DATA scotland;
input day deathsscotland casesscotland;
cards;
1 1 85
2 1 121
3 1 153
4 1 171
5 2 195
6 3 227
7 6 266
8 6 322
9 7 373
10 10 416
11 14 499
12 16 584
13 22 719
14 25 894
;
run;
proc sort data=scotland out=scotlandsort;
by day;
run;
Data _null_;
keep day;
set scotlandsort end=eof;
if eof then output;
run;
%let daycountscot = day
Data ratio;
set cdratio;
SCOTLANDAVERAGE = (SCOTLANDRATIO/&daycountscot)*1000;
run;
Using your own code, you can create the macro variable like this:
Data _null_;
keep day;
set scotlandsort end=eof;
if eof then call symputx('daycountscot', day);
run;
%put &daycountscot.;
The data _null_ is not doing anything. You can eliminate the sort and data steps by selecting the max day value directly into a macro variable.
proc sql noprint;
select max(day) into :daycountscot trimmed
from scotland
;
quit;
There is no need to use macro code for this; it is better to keep values in variables anyway. To convert the value into text for storage in a macro variable, SAS will have to round the number.
You could make a dataset with the maximum DAY value and then combine it with the dataset where you want to do the division.
data last_day;
set scotlandsort end=eof;
if eof then output;
keep day;
rename day=last_day;
run;
data ratio;
set cdratio;
if _n_=1 then set last_day;
SCOTLANDAVERAGE = (SCOTLANDRATIO/last_day)*1000;
run;
Probably easier in SQL code:
proc sql;
create table ratio as
select a.*, (SCOTLANDRATIO/last_day)*1000 as SCOTLANDAVERAGE
from cdratio a
, (select max(day) as last_day from scotland)
;
quit;
Hi, I have two tables whose columns are in a different order and whose column names are not capitalized the same way. How can I check whether the contents of these two tables are the same?
For example, I have two tables of students' grades
table A:
Math English History
-------+--------+---------
Tim 98 95 90
Helen 100 92 85
table B:
history MATH english
--------+--------+---------
Tim 90 98 95
Helen 85 100 92
You can use either of the following two approaches to compare the tables, regardless of column order or the capitalization of column names:
/*1. Proc compare*/
proc sort data=A; by name; run;
proc sort data=B; by name; run;
proc compare base=A compare=B;
id name;
run;
/*2. Proc SQL*/
proc sql;
select Math, English, History from A
except /* or union / intersect, depending on the comparison you need */
select MATH, english, history from B;
quit;
Use EXCEPT CORR (CORRESPONDING); it matches columns by name rather than by position. If everything matches, you will get zero records.
data have1;
input Math English History;
datalines;
1 2 3
;
run;
data have2;
input English math History;
datalines;
2 1 3
;
run;
proc sql ;
select * from have1
except corr
select * from have2;
quit;
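Note that a single EXCEPT only reports rows of the first table that are missing from the second. A minimal sketch of checking both directions, assuming the have1 and have2 data sets above (the titles are only labels added for this example):

```sas
proc sql;
  /* Rows present in have1 with no matching row in have2 */
  title 'Rows in have1 but not in have2';
  select * from have1
  except corr
  select * from have2;

  /* Rows present in have2 with no matching row in have1 */
  title 'Rows in have2 but not in have1';
  select * from have2
  except corr
  select * from have1;
quit;
title;
```

If both queries return zero rows, the two tables contain the same set of distinct rows. EXCEPT without ALL ignores duplicate rows, so differing row counts caused by duplicates would not show up here.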
Edit 1:
If you want to check which particular column differs, you may have to transpose and compare, as shown in the example below.
data have1;
input name $ Math English pyschology History;
datalines;
Tim 98 95 76 90
Helen 100 92 55 85
;
run;
data have2;
input name $ English Math pyschology History;
datalines;
Tim 95 98 76 90
Helen 92 100 99 85
;
run;
proc sort data = have1 out =hav1;
by name;
run;
proc sort data = have2 out =hav2;
by name;
run;
proc transpose data =hav1 out=newhave1 (rename = (_name_= subject
col1=marks));
by name;
run;
proc transpose data =hav2 out=newhave2 (rename = (_name_= subject
col1=marks));
by name;
run;
proc sql;
create table want(drop=mark_dif) as
select
a.name as name
,a.subject as subject
,a.marks as have1_marks
,b.marks as have2_marks
,a.marks -b.marks as mark_dif
from newhave1 a inner join newhave2 b
on upcase(a.name) = upcase(b.name)
and upcase(a.subject) =upcase(b.subject)
where calculated mark_dif ne 0;
quit;
I have a dataset like the following:
data have;
input match percent;
cards;
0 34
0 54
0 33
0 23
1 60
1 70
1 70
1 70
;
Essentially, I want to sum the observations that are associated with 0 and then divide the sum by the number of 0s to find the average,
e.g. (34+54+33+23)/4, and then do the same for the 1s.
I looked at PROC TABULATE. However, I don't understand how to carry out this procedure.
There are many ways to do this in SAS. I would use PROC SQL:
proc sql noprint;
create table want as
select match,
mean(percent) as percent
from have
group by match;
quit;
You can use proc means, and you will get the mean plus a bunch of other stats:
proc means data=have noprint;
by match;
output out=want ;
run;
This can be done very easily using proc summary or proc means.
proc summary data=have nway missing;
class match;
var percent;
output out=want mean=;
run;
You can also output a variety of other statistics using these procedures.
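For instance, a minimal sketch that requests several statistics at once, assuming the have data set from the question. The output data set name want_stats and the output variable names are made up for this example.

```sas
proc summary data=have nway missing;
  class match;
  var percent;
  /* One output variable per requested statistic; _type_ and _freq_ are dropped */
  output out=want_stats(drop=_type_ _freq_)
         n=n_obs mean=mean_percent min=min_percent max=max_percent;
run;
```

With the sample data, mean_percent works out to (34+54+33+23)/4 = 36 for match=0 and (60+70+70+70)/4 = 67.5 for match=1.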
I have a SAS Data set called coaches_assistants with the following structure. There are always only two records per TeamID.
TeamID Team_City CoachCode
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
... ... ....
What I'd like to do with this is to create a data set with an extra field called AssistantCode and make it look like:
TeamID Team_City HeadCode AssistantCode
123 Durham 242 876
124 London 876 922
125 Bath 667 786
126 Dover 544 978
... ... ... ...
If possible, I'd like to do this in a single DATA step (though I recognize that I might need a PROC SORT step first). I know how to do it in python or ruby or any traditional scripting languages, but I don't know how to do it in SAS.
What's the best way to do this?
While it's possible to do in one datastep, I generally find that this sort of problem is better served in PROC TRANSPOSE. Less manual coding this way and more flexibility for new things (say a new value "HeadAssistant" appeared, this would instantly work).
data have;
length coachcode $25;
input TeamID Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
;;;;
run;
data have_t;
set have;
id=scan(coachcode,1,'_');
val = scan(coachcode,2,'_');
keep teamId team_city id val;
run;
proc transpose data=have_t out=want(drop=_name_);
by teamID team_city;
id id;
var val;
run;
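The transposed columns above are named Head and Assistant, after the ID values. If the HeadCode and AssistantCode names from the question are needed, one option is a RENAME= data set option on the output. This is a sketch that assumes the have_t data set above and only those two ID values; want_named is a made-up name.

```sas
proc transpose data=have_t
               out=want_named(drop=_name_
                              rename=(Head=HeadCode Assistant=AssistantCode));
  by teamID team_city;
  id id;
  var val;
run;
```

The rename list is static, so a new ID value such as HeadAssistant would still be transposed but would keep its original name.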
Here are two possible solutions (one using a data step as requested and another using PROC SQL):
data have;
length TeamID $3 Team_City CoachCode $20;
input TeamID $ Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
;
run;
/* A data step solution */
proc sort data=have;
by TeamID;
run;
data want1(keep=TeamID Team_City HeadCode AssistantCode);
/* Define all variables, retain the new ones */
length TeamID $3 Team_City $20 HeadCode $3 AssistantCode $3;
retain HeadCode AssistantCode;
set have;
by TeamID;
if CoachCode =: 'Head'
then HeadCode = substr(CoachCode,6,3);
else AssistantCode = substr(CoachCode,11,3);
if last.TeamID;
run;
/* An SQL solution */
proc sql noprint;
create table want2 as
select TeamID
, max(Team_City) as Team_City
, max(CASE WHEN CoachCode LIKE 'Head%'
THEN substr(CoachCode,6,3) ELSE ' '
END) LENGTH=3 as HeadCode
, max(CASE WHEN CoachCode LIKE 'Assistant%'
THEN substr(CoachCode,11,3) ELSE ' '
END) LENGTH=3 as AssistantCode
from have
group by TeamID;
quit;
PROC SQL has the advantage of not requiring you to sort the data in advance.
This assumes you've sorted the data by teamID, and head coaches always come before assistants. Caveat: untested (I really need to get access to SAS again....)
data want (drop=nc coachcode);
set have;
length headcode assistantcode $3;
retain headcode;
by teamid;
nc = length(coachcode);
if substr(coachcode, 1, 4) = 'Head' then
headcode = substr(coachcode, nc-2, 3); /* third argument is a length, not an end position */
else
assistantcode = substr(coachcode, nc-2, 3);
if last.teamid;
run;
My boss would like me to create a chart and table in SAS similar to something you can produce in excel, where the data table sits below the chart. This would mean using the data on the x-axis and placing more data below it.
Desired output
(chart area) (Row 1) Building 1 Building 2 Building 3 Building 4
(Row 2) 333 267 234 235
(Row 3) 3232 213 3215 657
I'm not sure how to do this in proc report, where the data runs long, instead of wide. Also, the data set is long:
Building ID var1 var2
Building 1 333 3232
Building 2 267 213
CarolinaJay's suggestion of a PROC GCHART or SGPLOT or whatnot followed by another proc is the way to go, IMO; while you could do both at once, it's a lot more work to do so.
To accomplish your specific table, I recommend PROC TABULATE; it doesn't care what direction your data goes.
data have;
informat buildingID $12.;
input BuildingID $ var1 var2;
datalines;
Building1 333 3232
Building2 267 213
;;;;
run;
proc tabulate data=have;
class buildingID;
var var1 var2;
tables (var1 var2)*sum=' ', buildingID=' ';
run;
Plop that under a plot, and you have something like this (I have no idea how to plot this so I just picked something totally at random):
ods _all_ close;
ods html;
data have;
informat buildingID $12.;
input BuildingID $ var1 var2;
datalines;
Building1 333 3232
Building2 267 213
;;;;
run;
proc sgplot data=have;
vbar var1/response=var2 group=buildingID;
run;
title;
proc tabulate data=have;
class buildingID;
var var1 var2;
tables (var1 var2)*sum=' ', buildingID=' ';
run;
ods html close;