How to shift value of column as new variable name? - sas

I have a dataset that looks like this
ID Model_Value Count_Model
111 24 2
222 12 9
234 88 6
111 88 8
222 24 10
222 88 17
I want it to look like this:
ID Model_12 Model_24 Model_88
111 0 2 8
222 9 10 17
234 0 0 6
I don't think I am searching online for the correct terms, I thought initially a transform might work but I still want the row to represent the ID not the model.
How do I go about creating this output from what I have?

Ok I believe this is it! Thank you #mjsqu !!
I was able to do this with the help of this link: http://www.sascommunity.org/mwiki/images/d/dd/PROC_Transpose_slides.pdf
data test_transpose ;
input #1 ID_P #6 Model_Value #18 Count_Model ;
cards;
111 24 2
222 12 9
234 88 6
111 88 8
222 24 10
222 88 17
run;
proc print data=test_transpose;
run;
proc sort data=test_transpose out=test_transpose_S;
By ID_P;
run;
proc transpose
data = test_transpose_S
out = test_transpose_result (drop=_name_)
prefix=Model_Value;
var Count_Model;
BY ID_P;
id Model_Value;
run;
proc print data=test_transpose_result ;
run;
Output of the original sorted dataset and the transpose!

Related

How do I get the following combined table through SAS?

My goal is to combine these tables into one without having to manually run my macro each time for each column.
The code I currently have with me is the following:
%macro task_Oct(set,col_name);
data _type_;
set &set;
call symputx('col_type', vtype(&col_name));
run;
proc sql;
create table work.oct27 as
select "&col_name" as variable,
"&col_type" as type,
nmiss(&col_name) as missing_val,
count(distinct &col_name) as distinct_val
from &set;
quit;
%mend task_Oct;
%task_Oct(sashelp.cars,Origin)
The above code gives me the following output:
|Var |Type |missing_val|distinct_val|
|Origin|Character|0 | 3 |
But the sashelp.cars data sheet has 15 columns and so I would like to output a new data sheet which has 15 rows with 4 columns.
I would like to get the following combined table as the output of my code:
|Var |Type |missing_val|distinct_val|
|Make |Character|0 | 38 |
|Model |Character|0 | 425 |
|Type |Character|0 | 6 |
|Origin|Character|0 | 3 |
...
...
...
Since I'm using a macro, I could run my code 15 different times by manually entering the names of the columns and then merging the tables into 1; and it wouldn't be a problem. But what if I have a table with 100s of columns? I could use some loop statement but I'm not sure how to go about that in this case. Help would be appreciated. Thank you.
The main output you appear to want can be generated directly by PROC FREQ with the NLEVELS option. If you want to add in the variable type then just merge it with the output of PROC CONTENTS.
ods exclude all;
ods output nlevels=nlevels;
proc freq data=sashelp.cars nlevels;
tables _all_ / noprint ;
run;
ods select all;
proc contents data=sashelp.cars noprint out=contents;
run;
proc sql;
create table want(drop=table:) as
select c.varnum,c.name,c.type,n.*
from contents c inner join nlevels n
on c.name=n.TableVar
order by varnum
;
quit;
Result
NNon
NMiss Miss
Obs VARNUM NAME TYPE NLevels Levels Levels
1 1 Make 2 38 0 38
2 2 Model 2 425 0 425
3 3 Type 2 6 0 6
4 4 Origin 2 3 0 3
5 5 DriveTrain 2 3 0 3
6 6 MSRP 1 410 0 410
7 7 Invoice 1 425 0 425
8 8 EngineSize 1 43 0 43
9 9 Cylinders 1 8 1 7
10 10 Horsepower 1 110 0 110
11 11 MPG_City 1 28 0 28
12 12 MPG_Highway 1 33 0 33
13 13 Weight 1 348 0 348
14 14 Wheelbase 1 40 0 40
15 15 Length 1 67 0 67
The NMissLevels variable counts the number of distinct types of missing values.
If instead you want to count the number of observations with (any) missing value you will need to use code generation. So use the CONTENTS data to generate an SQL query to generate all of the counts you want into a single observation with only one pass through the data. You can then transpose that to make it usable for re-merging with the CONTENTS data.
filename code temp;
data _null_;
set contents end=eof;
length nliteral $65 dsname $80;
nliteral=nliteral(name);
dsname = catx('.',libname,nliteral(memname));
file code;
if _n_=1 then put 'create table counts as select ' / ' ' # ;
else put ',' #;
put 'nmiss(' nliteral ') as missing_' varnum
/',count(distinct ' nliteral ') as distinct_' varnum
;
if eof then put 'from ' dsname ';';
run;
proc sql;
%include code /source2;
quit;
proc transpose data=counts out=count2 name=name ;
run;
proc sql ;
create table want as
select c.varnum, c.name, c.type
, m.col1 as nmissing
, d.col1 as ndistinct
from contents c
left join count2 m on m.name like 'missing%' and c.varnum=input(scan(m.name,-1,'_'),32.)
left join count2 d on d.name like 'distinct%' and c.varnum=input(scan(d.name,-1,'_'),32.)
order by varnum
;
quit;
Result
Obs VARNUM NAME TYPE nmissing ndistinct
1 1 Make 2 0 38
2 2 Model 2 0 425
3 3 Type 2 0 6
4 4 Origin 2 0 3
5 5 DriveTrain 2 0 3
6 6 MSRP 1 0 410
7 7 Invoice 1 0 425
8 8 EngineSize 1 0 43
9 9 Cylinders 1 2 7
10 10 Horsepower 1 0 110
11 11 MPG_City 1 0 28
12 12 MPG_Highway 1 0 33
13 13 Weight 1 0 348
14 14 Wheelbase 1 0 40
15 15 Length 1 0 67

Creating unique order count for duplicate orders by account id and order id

I'm trying to create a data set that will show me the duplicate transactions. The trouble I'm running into is when there are multiple orders on one order_id. The records that get assigned the 2s I would be considering the duplicate order.
data have;
input acct_id order_id;
datalines;
1 121
1 122
2 123
2 124
3 125
3 125
3 125
3 126
3 126
3 126
data want;
set have;
by acct_id order_id;
if first.acct_id then order_count = 1;
else order_count =2;
run;
My desired output is below.
acct_id | order_id | order_count
1 121 1
1 122 2
2 123 1
2 124 2
3 125 1
3 125 1
3 125 1
3 126 2
3 126 2
3 126 2
What I have coded out already I feel like is close but I can't get it figured out.
data want;
set have;
by acct_id order_id notsorted;
if first.acct_id then order_count=0;
if first.order_id then order_count+1;
put acct_id order_id order_count;
run;
acct_id order_id order_count
1 121 1
1 122 2
2 123 1
2 124 2
3 125 1
3 125 1
3 125 1
3 126 2
3 126 2
3 126 2

How to create a running 3 observation average in SAS?

I have a dataset with some volumes in a column and I want to create a second column that contains the average of the previous three observations. Is this possible?
e.g.
data have;
input Vol Avg_pre_4;
datalines;
228 .
141 .
125 .
101 164.66
116 122.33
107 114
74 108
118 99
127 99.67
123 106.33
;
run;
The LAG function is an automatic built-in queue.
VOL_AVG_OF_PRIOR3 = MEAN ( lag(Vol), lag2(Vol), lag3(Vol) )
if _n_ < 4 then VOL_AVG_OF_PRIOR3 = .;

rolling up groups in a matrix

Here is the data I have, I use proc tabulate to present it how it is presented in excel, and to make the visualization easier. The goal is to make sure groups strictly below the diagonal (i know it's a rectangle, the (1,1) (2,2)...(7,7) "diagonal") to roll up the column until it hits the diagonal or makes a group size of at least 75.
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 80 90 87 45 12 41 88
4 24 90 100 110 22 141 88
5 0 1 0 0 0 0 2
6 0 1 0 0 0 0 6
7 0 1 0 0 0 0 2
8 0 1 0 0 0 0 11
Ive already used if/thens to regroup certain data values, but I need a general way to do it for other sets.
Thanks in advance
desired results
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 104 90 87 45 12 41 88
4 0 94 100 110 22 141 88
5 0 0 0 0 0 0 2
6 0 0 0 0 0 0 6
7 0 0 0 0 0 0 13
8 0 0 0 0 0 0 0
Mock up some categorical data for some patients who have to be counted
data mock;
do patient_id = 1 to 2500;
month = ceil(7*ranuni(123));
age = ceil(8*ranuni(123));
output;
end;
stop;
run;
Create a tabulation of counts (N) similar to the one shown in the question:
options missing='0';
proc tabulate data=mock;
class month age;
table age,month*n=''/nocellmerge;
run;
For each month get the sub-diagonal patient count
proc sql;
/* create table subdiagonal_column as */
select month, count(*) as subdiag_col_freq
from mock
where age > month
group by month;
For each row get the pre-diagonal patient count
/* create table prediagonal_row as */
select age, count(*) as prediag_row_freq
from mock
where age > month
group by age;
other sets can be tricky if the categorical values are not +1 monotonic. To do a similar process for non-montonic categorical values you will need to create surrogate variables that are +1 monotonic. For example:
data mock;
do item_id = 1 to 2500;
pet = scan ('cat dog snake rabbit hamster', ceil(5*ranuni(123)));
place = scan ('farm home condo apt tower wild', ceil(6*ranuni(123)));
output;
end;
run;
proc tabulate data=mock;
class pet place;
table pet,place*n=''/nocellmerge;
run;
proc sql;
create table unq_pets as select distinct pet from mock;
create table unq_places as select distinct place from mock;
data pets;
set unq_pets;
pet_num = _n_;
run;
data places;
set unq_places;
place_num = _n_;
run;
proc sql;
select distinct place_num, mock.place, count(*) as subdiag_col_freq
from mock
join pets on pets.pet = mock.pet
join places on places.place = mock.place
where pet_num > place_num
group by place_num
order by place_num
;

Create two datasets in SAS output

I have the following dataset:
DATA survey;
INPUT id sex $ age inc r1 r2 r3 ;
DATALINES;
1 F 35 17 7 2 2
17 M 50 14 5 5 3
33 F 45 6 7 2 7
49 M 24 14 7 5 7
65 F 52 9 4 7 7
81 M 44 11 7 7 7
2 F 34 17 6 5 3
18 M 40 14 7 5 2
34 F 47 6 6 5 6
50 M 35 17 5 7 5
;
Now I would like to create to files based on whether the records are Female (F)or NOT. Therefore I do this:
date female other;
set survey;
if sex = "F" then output USA;
else output other;
run;
PROC PRINT; RUN;
This however does not give me two sets with data depending on the F and M value. Any idea on what I am doing wrong here?
When you look in the log window, do you see any error messages?
If your code is
if sex = "F" then output USA;
you should see an error, because the DATA statement does not include a dataset named USA. If you change USA to FEMALE it should work.
Learning to read log messages is an essential skill in SAS.