I've got a wide dataset with each month listed as a column. I'd like to transpose the data to a long format, but the problem is that my column names will be changing in the future. What's the best way to transpose with a dynamic variable being passed to the transpose statement?
For example:
data have;
input subject $ "Jan-10"n $ "Feb-10"n $ "Mar-10"n $;
datalines;
1 12 18 22
2 13 19 23
;
run;
data want;
input subject month $ value;
datalines;
1 Jan-10 12
1 Feb-10 18
1 Mar-10 22
2 Jan-10 13
2 Feb-10 19
2 Mar-10 23
;
run;
Simply run the transpose procedure and provide only the by statement.
I've updated your sample data to convert the months to numeric values (rather than character which can't be transposed). I've also changed them to use valid base-sas names by removing the hyphen.
data have;
input subject $ "Jan10"n "Feb10"n "Mar10"n ;
datalines;
1 12 18 22
2 13 19 23
;
run;
Here's the transpose syntax you need, it will transpose all numeric variables by default:
proc transpose data=have out=want;
by subject;
run;
You could also do something more explicit, but still dynamic such as:
proc transpose data=have out=want;
by subject;
var jan: feb: mar: ; * ETC;
run;
This would transpose all vars that begin with jan/feb/mar etc... Useful in case your table contains other numeric variables that you don't want to include in the transpose.
Related
I have a dataset like as below, by using SAS, I need to assign the order variable based on descending count order to this dataset, when the category is missing, it should be always in the last whatever the count is. All other category above the missing one should be order by descending count.
Category Count
aa 10
bb 9
cc 8
6
ab 3
Desired output:
Category Count Order
aa 10 1
bb 9 2
cc 8 3
ab 3 4
6 5
You can use Proc DS2 to compute a sequence number for a result set.
Example:
data have;
input s $ f;
datalines;
aa 10
bb 9
cc 8
. 6
ab 3
;
proc ds2;
data want(overwrite=yes);
declare int sequence ;
method run();
set
{
select s,f from have
order by case when s is not null then f else -1e9-f end desc
};
sequence + 1;
end;
run;
quit;
Sort and split all the datasets into descending value order by missing and not-missing category, then stack them on top of each other.
/* Sort the non-missing values */
proc sort data=have out=have_notmissing;
by descending value;
where NOT missing(category);
run;
/* Sort the missing values */
proc sort data=have out=have_missing;
by descending value;
where missing(category);
run;
/* Stack them on top of each other */
data want;
set have_notmissing
have_missing
;
rank+1;
run;
Output:
category value rank
aa 10 1
bb 9 2
cc 8 3
ab 3 4
6 5
Perhaps what you really want is a NOTSORTED format and a procedure that supports the PRELOADFMT option and ORDER=DATA.
data test;
input category $ count;
cards;
aa 10
bb 9
cc 8
. 6
ab 3
;;;;
run;
proc print;
run;
proc format;
value $cat(notsorted)
'bb'='Bee Bee'
'aa'='Aha'
'cc'='CC'
'dd'='D D'
'ab'='AB'
;
quit;
proc summary data=test nway missing;
class category / order=data preloadfmt;
format category $cat.;
freq count;
output out=summary(drop=_type_) / levels;
run;
proc print;
run;
I have a SAS column as below
-10
20
-30
40
I want to make the column like
10
20
30
40
I need to remove the sign and keep the same number. I don't know how to do this.
You can use ABS function.
A small sample code:
data begin;
input var ##;
cards;
1 1 -1 -1 2 -2 -3 3
; run;
data wanted;
set begin;
var2= abs(var);
run;
For more on abs see documentation
EDIT: In case you are dealing with strings you can just remove the string:
data begin;
input var $ ##;
cards;
1 1 -1 -1 2 -2 -3 3
; run;
data wanted;
set begin;
var2= tranwrd(var, '-', '');
run;
Also documentation on TRANWRD
two ways without creating additional variables:
data begin;
input var ##;
cards;
1 1 -1 -1 2 -2 -3 3
; run;
data wanted;
set begin;
var= abs(var);
run;
proc sql noprint;
create table wanted2 as
select abs(var)as var from begin;quit;
Another way would be to be to create a new variable where var2=sqrt(var**2)
I have a summary table which I want to transpose, but I can't get my head around. The columns should be the rows, and the columns are the values.
Some explanation about the table. Each column represents a year. People can be in 3 groups: A, B or C. In 2016, everyone (100) is in group A. In 2017, 35 are in group A (5 + 20 + 10), 15 in B and 50 in C.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
I want to be able to make a nice graph of the evolution of the groups through the different periods. So I want to end up with a table where the columns are the rows (=period) and the columns are the values (= the 3 different groups). Please find an example of the table I want:
Image of table want
I have tried different approaches, but I can't get what I want.
Maybe more direct way but this is probably how I would do it.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
id + 1;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
proc print;
proc transpose data=have out=want1 name=period;
by id count notsorted;
var year:;
run;
proc print;
run;
proc summary data=want1 nway completetypes;
class period col1;
freq count;
output out=want2(drop=_type_);
run;
proc print;
run;
proc transpose data=want2 out=want(drop=_name_) prefix=Group_;
by period;
var _freq_;
id col1;
run;
proc print;
run;
I have an observation and I need to make a column with SAS
I tried split, I tried transpose, but nothing...
I have:
num first second third
1 13 17 16
2 23 11 64
I need:
num var_n
1 13
17
16
2 23
11
64
Can you give me some advice, please
Proc Transpose is already the right step to get your data in shape. Proc report is only used to display IDs only once.
data wide;
input num first second third;
datalines;
1 13 17 16
2 23 11 64
;
run;
proc transpose data = wide out= long (rename=(col1 = var_n)) ;
by num;
var first second third;
run;
proc report data = long;
column num var_n;
define num/ order;
run;
This is essentially the third time you've asked the same question. You can use proc transpose or proc sql to get it done.
See your other post: How to make a column of three. SAS
try the following
proc sort data=dataset;
by num;
run;
proc transpose data=dataset out=transpose;
by num;
var first second third;
run;
thanks
I have a table with some variables, say var1 and var2 and an identifier, and for some reasons, some identifiers have 2 observations.
I would like to know if there is a simple way to put back the second observation of the same identifier into the first one, that is
instead of having two observations, each with var1 var2 variables for the same identifier value
ID var1 var2
------------------
A1 12 13
A1 43 53
having just one, but with something like var1 var2 var1_2 var2_2.
ID var1 var2 var1_2 var2_2
--------------------------------------
A1 12 13 43 53
I can probably do that with renaming all my variables, then merging the table with the renamed one and dropping duplicates, but I assume there must be a simpler version.
Actually, your suggestion of merging the values back is probably the best.
This works if you have, at most, 1 duplicate for any given ID.
data first dups;
set have;
by id;
if first.id then output first;
else output dups;
run;
proc sql noprint;
create table want as
select a.id,
a.var1,
a.var2,
b.var1 as var1_2,
b.var2 as var2_2
from first as a
left join
dups as b
on a.id=b.id;
quit;
Another method makes use of PROC TRANSPOSE and a data-step merge:
/* You can experiment by adding more data to this datalines step */
data have;
infile datalines;
input ID : $2. var1 var2;
datalines;
A1 12 13
A1 43 53
;
run;
/* This step puts the var1 values onto one line */
proc transpose data=tab out=new1 (drop=_NAME_) prefix=var1_;
by id;
var var1;
run;
/* This does the same for the var2 values */
proc transpose data=tab out=new2 (drop=_NAME_) prefix=var2_;
by id;
var var2;
run;
/* The two transposed datasets are then merged together to give one line */
data want;
merge new1 new2;
by id;
run;
As an example:
data tab;
infile datalines;
input ID : $2. var1 var2;
datalines;
A1 12 13
A1 43 53
A2 199 342
A2 1132 111
A2 91913 199191
B1 1212 43214
;
run;
Gives:
ID var1_1 var1_2 var1_3 var2_1 var2_2 var2_3
---------------------------------------------------
A1 12 43 . 13 53 .
A2 199 1132 91913 342 111 199191
B1 1212 . . 43214 . .
There's a very simple way of doing this, using the IDGROUP function within PROC SUMMARY.
data have;
input ID $ var1 $ var2 $;
datalines;
A1 12 13
A1 43 53
;
run;
proc summary data=have nway;
class id;
output out=want (drop=_:)
idgroup(out[2] (var1 var2)=);
run;