I am trying to rearrange my data but am having a difficult time. The data I have looks something like this:
date a b c
====================
1996 5 7 8
1997 4 2 3
1998 1 9 6
What I want is to rearrange the data (presumably using arrays) to get this:
date val var
=============
1996 5 a
1997 4 a
1998 1 a
1996 7 b
1997 2 b
1998 9 b
1996 8 c
1997 3 c
1998 6 c
So that I've essentially stacked the variables (a,b,c) along with the corresponding date and name of the variable.
Thanks in advance!
Use PROC TRANSPOSE to pivot the data.
First sort by DATE
proc sort data=have;
by date;
run;
Then use transpose
proc transpose data=have out=want(rename=(COL1=VAL _NAME_=VAR));
by date;
var a b c;
run;
Finally, it looks like you want this sorted by VAR, and then DATE
proc sort data=want;
by VAR date;
run;
Since you mention arrays, here's how you would achieve the result using them.
I would, however, use the PROC TRANSPOSE method in the answer from @DomPazz, as procedures are generally easier to read and understand for others who may need to look at the code.
/* create initial dataset */
data have;
input date a b c;
datalines;
1996 5 7 8
1997 4 2 3
1998 1 9 6
;
run;
/* transpose data */
data want;
set have;
array vars{*} a b c; /* create array of required values */
length val 8 var $8; /* set lengths of new variables */
do i = 1 to dim(vars); /* loop through each element of the array */
val = vars{i}; /* set val to be current array value */
var = vname(vars{i}); /* set var to be name of current array variable name */
drop a b c i; /* drop variables not required */
output; /* output each value to a new row */
end;
run;
/* sort data in required order */
proc sort data=want;
by var date;
run;
I have a dataset like the one below. Using SAS, I need to assign an order variable based on descending count. When the category is missing, that row should always come last, whatever its count is. All other categories, above the missing one, should be ordered by descending count.
Category Count
aa 10
bb 9
cc 8
6
ab 3
Desired output:
Category Count Order
aa 10 1
bb 9 2
cc 8 3
ab 3 4
6 5
You can use Proc DS2 to compute a sequence number for a result set.
Example:
data have;
input s $ f;
datalines;
aa 10
bb 9
cc 8
. 6
ab 3
;
proc ds2;
data want(overwrite=yes);
declare int sequence ;
method run();
set
{
select s,f from have
order by case when s is not null then f else -1e9-f end desc
};
sequence + 1;
end;
run;
quit;
Split the data set into rows with a non-missing category and rows with a missing category, sort each part by descending value, then stack the two parts on top of each other.
/* Sort the non-missing values */
proc sort data=have out=have_notmissing;
by descending value;
where NOT missing(category);
run;
/* Sort the missing values */
proc sort data=have out=have_missing;
by descending value;
where missing(category);
run;
/* Stack them on top of each other */
data want;
set have_notmissing
have_missing
;
rank+1;
run;
Output:
category value rank
aa 10 1
bb 9 2
cc 8 3
ab 3 4
6 5
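The same result can be sketched with a single sort instead of two, by flagging the missing categories first. This is only a variant under stated assumptions: the variables are named category and count as in the question, and is_missing, flagged, and sorted are names made up for this illustration.

```sas
/* Sample data from the question; a period reads as a missing category */
data have;
  input category $ count;
  datalines;
aa 10
bb 9
cc 8
. 6
ab 3
;
run;

/* Flag missing categories: 0 = present, 1 = missing (hypothetical helper variable) */
data flagged;
  set have;
  is_missing = missing(category);
run;

/* Non-missing rows first, then descending count within each group */
proc sort data=flagged out=sorted;
  by is_missing descending count;
run;

/* Number the rows in their sorted order */
data want(drop=is_missing);
  set sorted;
  order + 1;
run;
```

Because the flag sorts ahead of the count, rows with a missing category land at the bottom regardless of how large their count is.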
Perhaps what you really want is a NOTSORTED format and a procedure that supports the PRELOADFMT option and ORDER=DATA.
data test;
input category $ count;
cards;
aa 10
bb 9
cc 8
. 6
ab 3
;;;;
run;
proc print;
run;
proc format;
value $cat(notsorted)
'bb'='Bee Bee'
'aa'='Aha'
'cc'='CC'
'dd'='D D'
'ab'='AB'
;
quit;
proc summary data=test nway missing;
class category / order=data preloadfmt;
format category $cat.;
freq count;
output out=summary(drop=_type_) / levels;
run;
proc print;
run;
I just started using SAS and I'm trying to combine columns.
I've got table mainData
A1 A2 A3 A4
1 4 7 10
2 5 8 11
3 6 9 12
I want to create a new table rearrangedData
Type Value
A1 1
A1 2
A1 3
A2 4
A2 5
A2 6
A3 7
A3 8
A3 9
A4 10
A4 11
A4 12
There must be a simple solution to this, but I just can't figure it out. I'm thinking of writing a DO loop, but what if I don't know the size of the table or the number of rows in a specific column? I can't figure out how I would get such information in SAS.
This somewhat unusual transformation can be done via a transpose and some array logic:
data have;
input A1 A2 A3 A4;
cards;
1 4 7 10
2 5 8 11
3 6 9 12
;
run;
proc transpose data = have out = tr name=type prefix = r;
run;
data want;
set tr;
array r{*} r:;
do i = 1 to dim(r);
value = r[i];
output;
end;
drop i r:;
run;
Also, this preserves the original order without requiring a sort.
Make a dummy variable, then transpose data.
data have;
set have;
id=_n_;
run;
proc transpose data=have out=temp;
by id;
var A1-A4;
run;
proc sort data=temp out=want(rename=(_name_=type col1=value) drop=id);
by _name_;
run;
If you want to preserve the original order then you could use the POINT= option on the SET statement to loop over the data set once per variable (column).
This data step reads the first observation just to get the variables defined. It then defines the array VALUES so that DIM(VALUES) gives the number of columns. The POINT= and NOBS= options on the second SET statement control the inner loop over the rows, and the VNAME() function finds the name of the current variable in the array.
data want ;
set have ;
array values _numeric_;
do col=1 to dim(values);
length type $32 value 8;
type=vname(values(col));
do row=1 to nobs ;
set have point=row nobs=nobs ;
value=values(col);
output;
keep type value;
end;
end;
stop;
run;
I have a dataset that has columns like:
a|b|c|d|e
and rows like:
1|3|5|7|9
2|4|6|8|10
How to change it to:
Char|Num|
a|1
a|2
b|3
b|4
c|5
c|6
d|7
d|8
e|9
e|10
Thank you in advance!
You can use PROC TRANSPOSE. The only gotcha is that, to get what you want, you need a BY variable. The easiest thing is to add a record number and use that as your BY variable.
data have;
input a b c d;
i = _n_;
datalines;
1 2 3 4
5 6 7 8
;
run;
proc transpose data=have out=want(drop=i);
by i;
var a b c d;
run;
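To end up with the exact layout asked for (one name column, one value column, grouped by variable), the transposed output still needs a rename and a sort. A minimal sketch, assuming the five columns a through e from the question; the data set names numbered and long, and the output column names name and num, are chosen here for illustration only.

```sas
/* Data from the question */
data have;
  input a b c d e;
  datalines;
1 3 5 7 9
2 4 6 8 10
;
run;

/* Add a record number to serve as the BY variable */
data numbered;
  set have;
  i = _n_;
run;

/* One output row per original row and variable */
proc transpose data=numbered out=long(drop=i rename=(_name_=name col1=num));
  by i;
  var a b c d e;
run;

/* Group the rows by variable name; PROC SORT keeps the original
   row order within each name by default */
proc sort data=long out=want;
  by name;
run;
```

With this input, want should list a with 1 and 2, then b with 3 and 4, and so on through e.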
Suppose a data set is as follows:
A B C
1 3 2
1 4 9
2 6 0
2 7 3
where A B and C are the variable names.
Is there a way to transform the table to
A 1
A 1
A 2
A 2
B 3
B 4
B 6
B 7
C 2
C 9
C 0
C 3
Expanding on the advice from @donPablo, here's how you would code it. Create an array to read across the data, then output each element of that array, so that the number of output rows equals rows * columns of the original dataset. The VNAME function enables you to store the variable name (A, B, C) as a value in a separate variable.
data have;
input A B C;
datalines;
1 3 2
1 4 9
2 6 0
2 7 3
;
run;
data want;
set have;
length var1 $10;
array vars{*} _numeric_;
do i=1 to dim(vars);
var1=vname(vars{i});
var2=vars{i};
keep var1 var2;
output;
end;
run;
proc sort data=want;
by var1;
run;
The least amount of (expensive) development time might be --
Read and store the first row
For each subsequent row
Read the row
Create three records
Until end
Sort
How many times will this be run? Per day/ per year?
What number of rows are there?
Might we save 1 hr / month? 1 min / year? Something will need to read the entire file. Optimize last. Make it work first.
tkx
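This should work:
/* one data set per column, each with the same two variables */
DATA A(keep=new_var value);
new_var = 'A';
SET your_data(rename=(A=value));
RUN;
DATA B(keep=new_var value);
new_var = 'B';
SET your_data(rename=(B=value));
RUN;
DATA C(keep=new_var value);
new_var = 'C';
SET your_data(rename=(C=value));
RUN;
/* stack B and C underneath A */
PROC APPEND base=A data=B FORCE;
RUN;
PROC APPEND base=A data=C FORCE;
RUN;
Data set A is the result.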
I have a question about transposing data without using PROC Transpose.
0 a b c
1 dog cat camel
2 9 7 2534
Without using PROC TRANSPOSE, how can I get a resulting dataset of:
Animals Weight
1 dog 9
2 cat 7
3 camel 2534
This is a bit of a curious request. This example code is hard coded for your 3 variables. You will have to generalize this if needed.
data temp;
input a $ b $ c $;
datalines;
dog cat camel
9 7 2534
;
run;
data animal_weight;
set temp end=last;
format animal animals1-animals3 $8.;
format weight weights1-weights3 best. ;
retain animals: weights:;
array animals[3];
array weights[3];
if _n_ = 1 then do;
animals[1] = a;
animals[2] = b;
animals[3] = c;
end;
else if _n_ = 2 then do;
weights[1] = input(a,best.);
weights[2] = input(b,best.);
weights[3] = input(c,best.);
end;
if last then do;
do i=1 to 3;
animal = animals[i];
weight = weights[i];
output;
end;
end;
drop i animals: weights: a b c;
run;
Read the values into 2 arrays, converting the weights from strings into numbers. Use the _N_ variable to figure out which array to populate. At the end of the data set, output the values in the arrays.
I wouldn't give this as an answer to a homework problem that I actually wanted to get a good grade on (because it's far too advanced, so it's obvious you asked for help); but the hash solution is almost certainly the most flexible and what I'd hope someone doing this in the real world would do (assuming there is a 'don't use proc transpose' real world reason, such as available resources). The problem is somewhat undefined, so this is only moderately fault-tolerant.
data have;
input a $ b $ c $;
datalines;
dog cat camel
9 7 2534
;;;;
run;
data _null_;
set have end=eof;
array charvars _character_;
if _n_ = 1 then do;
length animal $15 weight 8;
declare hash h();
h.defineKey('row');
h.defineData('animal','weight');
h.defineDone();
end;
animal=' ';
weight=.;
do row = 1 to dim(charvars);
rc_f = h.find();
if rc_f ne 0 then do;
animal=charvars[row];
rc_a = h.add();
animal=' ';
end;
else if rc_f eq 0 then do;
weight=input(charvars[row],best12.);
rc_r = h.replace();
end;
end;
if eof then rc_o = h.output(dataset:'want');
run;
Do you always have just two rows, or can the number of rows and columns vary?
If the number of rows and columns is dynamic, the ideal approach is to use the OPEN function and store the number of columns in a macro variable. This will be the number of rows in your new dataset. Then take the number of rows in your original dataset, which will be the number of columns in your new dataset. This must happen before the actual transposition. After that, you can read the data into an array and, using the macro variables as the dimensions, output the values to the new dataset.
Having said all this, why would you want to reinvent the wheel when SAS already provides a ready-made PROC TRANSPOSE?
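The first step described above, getting the dimensions into macro variables, might look roughly like this. It is only a sketch: it assumes the input is a data set named have, and it pairs OPEN with the ATTRN and CLOSE functions, which the answer does not mention by name. The macro variable names dsid, nvars, nobs, and rc are made up for the example.

```sas
/* Hypothetical input, reusing the animals example */
data have;
  input a $ b $ c $;
  datalines;
dog cat camel
9 7 2534
;
run;

%let dsid  = %sysfunc(open(have));          /* open the data set and keep its id */
%let nvars = %sysfunc(attrn(&dsid, nvars)); /* number of variables (columns) */
%let nobs  = %sysfunc(attrn(&dsid, nlobs)); /* number of non-deleted observations (rows) */
%let rc    = %sysfunc(close(&dsid));        /* close the data set again */

%put NOTE: HAVE has &nvars columns and &nobs rows.;
```

The two macro variables can then serve as the array dimensions in the data step that does the actual rearranging.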